From Zero to RAGs. Balancing job-NER performance with Token Cost

dc.contributor.author: Moser, Denis
dc.contributor.author: Dornberger, Rolf
dc.contributor.author: Hanne, Thomas
dc.contributor.editor: Shukla, Samiksha
dc.contributor.editor: Sayama, Hiroki
dc.contributor.editor: Tiwari, Kapil
dc.contributor.editor: George, Jossy Paul
dc.contributor.editor: Kureethara, Joseph Varghese
dc.date.accessioned: 2026-06-04T11:48:13Z
dc.date.issued: 2026
dc.description.abstract: We investigate prompt-optimization strategies for domain-specific named entity recognition (NER) in job advertisements, balancing extraction performance against the number of tokens consumed. Using the SKILLSPAN corpus, we implement six pipelines that combine three prompting methods (zero-shot, hard-coded few-shot, and dynamic RAG-based few-shot) with optional RAG-based semantic prefiltering. Each pipeline extracts skills and knowledge via GPT-4o-mini, and we measure F1, precision, recall, and average tokens per advertisement. The results show that dynamic RAG-few-shot without prefiltering achieves the highest F1 (≈71% for knowledge, ≈60% for skills) and that prefiltering can reduce token usage by up to 70% while modestly lowering recall. Compared to zero-shot prompting, few-shot prompting, especially with RAG-based retrieval, yields substantial recall gains of up to 28% at the cost of precision. Our findings demonstrate that RAG-augmented few-shot prompting offers an effective, token-efficient solution for specialized NER tasks.
dc.event: International Conference on Data Science for Computational Security (IDSCS 2025)
dc.event.end: 2025-11-15
dc.event.start: 2025-11-14
dc.identifier.doi: 10.1007/978-3-032-24075-0_3
dc.identifier.isbn: 978-3-032-24074-3
dc.identifier.isbn: 978-3-032-24075-0
dc.identifier.uri: https://irf.fhnw.ch/handle/11645/56911
dc.language.iso: en
dc.publisher: Springer
dc.relation.ispartof: Data Science and Security. Proceedings of IDSCS 2025, Volume 2
dc.relation.ispartofseries: Lecture Notes in Networks and Systems (LNNS)
dc.rights.uri:
dc.spatial: Bangalore
dc.subject.ddc: 005 - Computer programming, programs and data
dc.title: From Zero to RAGs. Balancing job-NER performance with Token Cost
dc.type: 04B - Conference proceedings contribution
dc.volume: 2
dspace.entity.type: Publication
fhnw.InventedHere: Yes
fhnw.ReviewType: peer-reviewed
fhnw.affiliation.hochschule: Hochschule für Wirtschaft FHNW (de_CH)
fhnw.affiliation.institut: Institut für Wirtschaftsinformatik (de_CH)
fhnw.openAccessCategory: Closed
fhnw.pagination: 25-37
fhnw.publicationState: Published
fhnw.seriesNumber: 1945
fhnw.targetcollection: d40e4c67-dd87-4d14-8518-b2f0a855e750
relation.isAuthorOfPublication: 258b09e5-9f15-4a4b-95e2-073ce8673c74
relation.isAuthorOfPublication: 64196f63-c326-4e10-935d-6776cc91354c
relation.isAuthorOfPublication: 35d8348b-4dae-448a-af2a-4c5a4504da04
relation.isAuthorOfPublication.latestForDiscovery: 258b09e5-9f15-4a4b-95e2-073ce8673c74
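
The abstract describes six pipelines (three prompting methods, each with or without prefiltering) scored by precision, recall, and F1. A minimal Python sketch of that experimental grid and of the metric arithmetic follows. It is illustrative only and is not the authors' code: the label strings, the function names `pipeline_grid` and `precision_recall_f1`, and the exact-match comparison of extracted entities as sets are all assumptions, since the abstract does not state how predictions were matched against the gold annotations.

```python
from itertools import product

# Assumed labels: the abstract names the methods but gives no identifiers.
PROMPTING_METHODS = ["zero-shot", "hard-coded few-shot", "dynamic RAG few-shot"]
PREFILTER_OPTIONS = [False, True]  # optional RAG-based semantic prefiltering


def pipeline_grid():
    """Enumerate every (prompting method, prefilter on/off) combination."""
    return list(product(PROMPTING_METHODS, PREFILTER_OPTIONS))


def precision_recall_f1(predicted, gold):
    """Score one advertisement, treating entities as exact-match sets (assumption)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    for method, prefilter in pipeline_grid():
        print(f"{method:<22} prefilter={prefilter}")

    # Hypothetical toy example, not data from the paper.
    p, r, f1 = precision_recall_f1({"python", "sql", "docker"}, {"python", "sql"})
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In the toy example, one spurious prediction lowers precision while recall stays perfect, which is the same direction of trade-off the abstract reports for few-shot prompting relative to zero-shot.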