From Zero to RAGs. Balancing job-NER performance with Token Cost
| dc.contributor.author | Moser, Denis | |
| dc.contributor.author | Dornberger, Rolf | |
| dc.contributor.author | Hanne, Thomas | |
| dc.contributor.editor | Shukla, Samiksha | |
| dc.contributor.editor | Sayama, Hiroki | |
| dc.contributor.editor | Tiwari, Kapil | |
| dc.contributor.editor | George, Jossy Paul | |
| dc.contributor.editor | Kureethara, Joseph Varghese | |
| dc.date.accessioned | 2026-06-04T11:48:13Z | |
| dc.date.issued | 2026 | |
| dc.description.abstract | We investigate prompt-optimization strategies for domain-specific named entity recognition in job advertisements, balancing extraction performance against the number of tokens consumed. Using the SKILLSPAN corpus, we implement six pipelines that combine three prompting methods (zero-shot, hard-coded few-shot, and dynamic RAG-based few-shot) with optional RAG-based semantic prefiltering. Each pipeline extracts skills and knowledge via GPT-4o-mini, and we measure F1, precision, recall, and average tokens per advertisement. The results show that dynamic RAG-few-shot without prefiltering achieves the highest F1 (≈71% for knowledge, ≈60% for skills) and that prefiltering might reduce token usage by up to 70% while modestly lowering recall. Compared to zero-shot, few-shot prompting, especially with RAG retrieval, yields substantial recall gains of up to 28% at the cost of precision. Our findings demonstrate that RAG-augmented few-shot prompting offers an effective, token-efficient solution for specialized NER tasks. | |
| dc.event | International Conference on Data Science for Computational Security (IDSCS 2025) | |
| dc.event.end | 2025-11-15 | |
| dc.event.start | 2025-11-14 | |
| dc.identifier.doi | 10.1007/978-3-032-24075-0_3 | |
| dc.identifier.isbn | 978-3-032-24074-3 | |
| dc.identifier.isbn | 978-3-032-24075-0 | |
| dc.identifier.uri | https://irf.fhnw.ch/handle/11645/56911 | |
| dc.language.iso | en | |
| dc.publisher | Springer | |
| dc.relation.ispartof | Data Science and Security. Proceedings of IDSCS 2025, Volume 2 | |
| dc.relation.ispartofseries | Lecture Notes in Networks and Systems (LNNS) | |
| dc.rights.uri | | |
| dc.spatial | Bangalore | |
| dc.subject.ddc | 005 - Computer programming, programs and data | |
| dc.title | From Zero to RAGs. Balancing job-NER performance with Token Cost | |
| dc.type | 04B - Conference proceedings contribution | |
| dc.volume | 2 | |
| dspace.entity.type | Publication | |
| fhnw.InventedHere | Yes | |
| fhnw.ReviewType | peer-reviewed | |
| fhnw.affiliation.hochschule | Hochschule für Wirtschaft FHNW | de_CH |
| fhnw.affiliation.institut | Institut für Wirtschaftsinformatik | de_CH |
| fhnw.openAccessCategory | Closed | |
| fhnw.pagination | 25-37 | |
| fhnw.publicationState | Published | |
| fhnw.seriesNumber | 1945 | |
| fhnw.targetcollection | d40e4c67-dd87-4d14-8518-b2f0a855e750 | |
| relation.isAuthorOfPublication | 258b09e5-9f15-4a4b-95e2-073ce8673c74 | |
| relation.isAuthorOfPublication | 64196f63-c326-4e10-935d-6776cc91354c | |
| relation.isAuthorOfPublication | 35d8348b-4dae-448a-af2a-4c5a4504da04 | |
| relation.isAuthorOfPublication.latestForDiscovery | 258b09e5-9f15-4a4b-95e2-073ce8673c74 |