Novel benchmark data set for automatic error detection and correction

Masanti, Corina; Witschel, Hans Friedrich; Riesen, Kaspar

dc.contributor.author	Masanti, Corina
dc.contributor.author	Witschel, Hans Friedrich
dc.contributor.author	Riesen, Kaspar
dc.contributor.editor	Métais, Elisabeth
dc.contributor.editor	Meziane, Farid
dc.contributor.editor	Sugumaran, Vijayan
dc.contributor.editor	Manning, Warren
dc.contributor.editor	Reiff-Marganiec, Stephan
dc.date.accessioned	2026-05-20T11:53:57Z
dc.date.issued	2023
dc.description.abstract	The present paper introduces a novel benchmark data set for the automatic detection and correction of errors in text documents, whether by language models or by other techniques. The data set contains a large number of sentences from various domains, annotated with several types of errors (orthographic, grammatical, punctuation, and typography errors). The paper presents the method used to collect and annotate the documents, provides statistical analyses of the data set’s properties, and evaluates two preliminary baseline models for automatic error detection on a specific benchmark task. On the one hand, the results show that the proposed data set is effective for evaluating automatic error detection systems; on the other hand, these initial analyses reveal that the data set contains challenging cases that are difficult to detect. Finally, the paper discusses potential applications of the proposed data set in the research and development of error detection and error correction systems.
dc.event	28th International Conference on Applications of Natural Language to Information Systems
dc.event.end	2023-06-23
dc.event.start	2023-06-21
dc.identifier.doi	10.1007/978-3-031-35320-8_38
dc.identifier.isbn	978-3-031-35319-2
dc.identifier.uri	https://irf.fhnw.ch/handle/11654/56303
dc.language.iso	en
dc.publisher	Springer
dc.relation.ispartof	Natural Language Processing and Information Systems
dc.relation.ispartofseries	Lecture Notes in Computer Science
dc.spatial	Cham
dc.subject.ddc	330 - Economics
dc.title	Novel benchmark data set for automatic error detection and correction
dc.type	04B - Contribution to conference proceedings
dspace.entity.type	Publication
fhnw.InventedHere	Yes
fhnw.ReviewType	peer-reviewed
fhnw.affiliation.hochschule	Hochschule für Wirtschaft FHNW	de_CH
fhnw.affiliation.institut	Institut für Wirtschaftsinformatik	de_CH
fhnw.openAccessCategory	Closed
fhnw.pagination	511-521
fhnw.publicationState	Published
fhnw.seriesNumber	13913
relation.isAuthorOfPublication	4f94a17c-9d05-433c-882f-68f062e0e6ae
relation.isAuthorOfPublication	d761e073-1612-4d22-8521-65c01c19f97a
relation.isAuthorOfPublication.latestForDiscovery	4f94a17c-9d05-433c-882f-68f062e0e6ae

Files

License bundle

Name: license.txt
Size: 2.66 KB
Format: Item-specific license agreed upon to submission

Collection

Institut für Wirtschaftsinformatik