Novel benchmark data set for automatic error detection and correction
Publication date
2023
Type
04B - Conference paper
Parent work
Natural Language Processing and Information Systems
Series
Lecture Notes in Computer Science
Series number
13913
Pages / Duration
511-521
Publisher / Publishing institution
Springer
Place of publication / Event location
Cham
Abstract
The present paper introduces a novel benchmark data set for automatic error detection as well as error correction in text documents based on language models or other techniques. The data set contains a large number of sentences from various domains annotated with various types of errors (orthographic, grammatical, punctuation, and typography errors). The paper presents the method used to collect and annotate the documents, provides statistical analyses of the data set’s properties and evaluates two preliminary baseline models for automatic error detection on a specific benchmark task. The results show, on the one hand, the effectiveness of the proposed data set for the evaluation of automatic error detection systems. On the other hand, these initial analyses also reveal that the data set contains challenging cases that are difficult to detect. Finally, the paper discusses potential applications of the proposed data set in the development and research of error detection and error correction systems.
Event
28th International Conference on Applications of Natural Language to Information Systems
Conference start date
21.06.2023
Conference end date
23.06.2023
ISBN
978-3-031-35319-2
Language
English
Created during FHNW affiliation
Yes
Publication status
Published
Review
peer-reviewed
Open access category
Closed
Citation
Masanti, C., Witschel, H. F., & Riesen, K. (2023). Novel benchmark data set for automatic error detection and correction. In E. Métais, F. Meziane, V. Sugumaran, W. Manning, & S. Reiff-Marganiec (Eds.), Natural Language Processing and Information Systems (pp. 511–521). Springer. https://doi.org/10.1007/978-3-031-35320-8_38