Novel benchmark data set for automatic error detection and correction

dc.contributor.author: Masanti, Corina
dc.contributor.author: Witschel, Hans Friedrich
dc.contributor.author: Riesen, Kaspar
dc.contributor.editor: Métais, Elisabeth
dc.contributor.editor: Meziane, Farid
dc.contributor.editor: Sugumaran, Vijayan
dc.contributor.editor: Manning, Warren
dc.contributor.editor: Reiff-Marganiec, Stephan
dc.date.accessioned: 2026-05-20T11:53:57Z
dc.date.issued: 2023
dc.description.abstract: The present paper introduces a novel benchmark data set for automatic error detection as well as error correction in text documents based on language models or other techniques. The data set contains a large number of sentences from various domains annotated with various types of errors (orthographic, grammatical, punctuation, and typography errors). The paper presents the method used to collect and annotate the documents, provides statistical analyses of the data set’s properties and evaluates two preliminary baseline models for automatic error detection on a specific benchmark task. The results show, on the one hand, the effectiveness of the proposed data set for the evaluation of automatic error detection systems. On the other hand, these initial analyses also reveal that the data set contains challenging cases that are difficult to detect. Finally, the paper discusses potential applications of the proposed data set in the development and research of error detection and error correction systems.
dc.event: 28th International Conference on Applications of Natural Language to Information Systems
dc.event.end: 2023-06-23
dc.event.start: 2023-06-21
dc.identifier.doi: 10.1007/978-3-031-35320-8_38
dc.identifier.isbn: 978-3-031-35319-2
dc.identifier.uri: https://irf.fhnw.ch/handle/11654/56303
dc.language.iso: en
dc.publisher: Springer
dc.relation.ispartof: Natural Language Processing and Information Systems
dc.relation.ispartofseries: Lecture Notes in Computer Science
dc.spatial: Cham
dc.subject.ddc: 330 - Economics
dc.title: Novel benchmark data set for automatic error detection and correction
dc.type: 04B - Conference proceedings contribution
dspace.entity.type: Publication
fhnw.InventedHere: Yes
fhnw.ReviewType: peer-reviewed
fhnw.affiliation.hochschule: Hochschule für Wirtschaft FHNW [de_CH]
fhnw.affiliation.institut: Institut für Wirtschaftsinformatik [de_CH]
fhnw.openAccessCategory: Closed
fhnw.pagination: 511-521
fhnw.publicationState: Published
fhnw.seriesNumber: 13913
relation.isAuthorOfPublication: 4f94a17c-9d05-433c-882f-68f062e0e6ae
relation.isAuthorOfPublication: d761e073-1612-4d22-8521-65c01c19f97a
relation.isAuthorOfPublication.latestForDiscovery: 4f94a17c-9d05-433c-882f-68f062e0e6ae