Novel benchmark data set for automatic error detection and correction

Masanti, Corina; Witschel, Hans Friedrich; Riesen, Kaspar

Authors

Masanti, Corina

Witschel, Hans Friedrich

Riesen, Kaspar

Publication date

2023

Collection

Institut für Wirtschaftsinformatik

Type

04B - Contribution to conference proceedings

Editors

Reiff-Marganiec, Stephan

Parent work

Natural Language Processing and Information Systems

DOI of the original publication

https://doi.org/10.1007/978-3-031-35320-8_38

URI

https://irf.fhnw.ch/handle/11654/56303

Series

Lecture Notes in Computer Science

Series number

13913

Pages

511-521

Publisher / Publishing institution

Springer

Place of publication / Venue

Cham

Abstract

The present paper introduces a novel benchmark data set for automatic error detection as well as error correction in text documents based on language models or other techniques. The data set contains a large number of sentences from various domains annotated with various types of errors (orthographic, grammatical, punctuation, and typography errors). The paper presents the method used to collect and annotate the documents, provides statistical analyses of the data set’s properties and evaluates two preliminary baseline models for automatic error detection on a specific benchmark task. The results show, on the one hand, the effectiveness of the proposed data set for the evaluation of automatic error detection systems. On the other hand, these initial analyses also reveal that the data set contains challenging cases that are difficult to detect. Finally, the paper discusses potential applications of the proposed data set in the development and research of error detection and error correction systems.

Subject area (DDC)

330 - Economics

Event

28th International Conference on Applications of Natural Language to Information Systems

Conference start date

21.06.2023

Conference end date

23.06.2023

ISBN

978-3-031-35319-2

Language

English

Created during FHNW affiliation

Yes

Publication status

Published

Review

peer-reviewed

Open access status

Closed

Citation

Masanti, C., Witschel, H. F., & Riesen, K. (2023). Novel benchmark data set for automatic error detection and correction. In E. Métais, F. Meziane, V. Sugumaran, W. Manning, & S. Reiff-Marganiec (Eds.), Natural Language Processing and Information Systems (pp. 511–521). Springer. https://doi.org/10.1007/978-3-031-35320-8_38
