FLIE: form labeling for information extraction
Publication date
2021
Type
04B - Conference paper
Parent work
Proceedings of the Future Technologies Conference (FTC) 2020
Series
Advances in Intelligent Systems and Computing
Series number
1289
Volume
2
Pages / Duration
550-567
Place of publication / Event location
Vancouver
Abstract
Information extraction (IE) from forms remains an unsolved problem, with some exceptions such as bills. Forms are complex, and their templates are often unstable due to the injection of advertising, extra conditions, or document merging. Our scenario deals with insurance forms used by brokers in Switzerland. Here, each combination of insurer, insurance type, and language results in a new document layout, leading to a few hundred document types. To help brokers extract data from policies, we developed a new labeling method called FLIE (form labeling for information extraction). FLIE first assigns a document to a cluster, grouping by language, insurer, and insurance type. It then labels the layout. To produce training data, the user annotates a sample document by hand, adding attribute names; that is, the user provides a mapping. FLIE then applies machine learning to propagate the mapping and extract the information. Our results are based on 24 Swiss policies in German covering three policy types: UVG (mandatory accident insurance), KTG (sick pay insurance), and UVGZ (optional accident insurance). Our solution achieves an accuracy of around 84-89%. It is currently being extended to other policy types and languages.
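The clustering step described in the abstract can be illustrated with a minimal Python sketch, assuming each document's language, insurer, and insurance type are already known. The `PolicyDoc` record, the `cluster_key` and `group_by_layout` functions, and the insurer names in the usage lines are hypothetical; this is not FLIE's implementation, and it leaves out the machine-learning propagation of the mapping entirely.

```python
from collections import defaultdict
from typing import NamedTuple


class PolicyDoc(NamedTuple):
    # Hypothetical document record; field names are illustrative, not taken from FLIE.
    doc_id: str
    language: str
    insurer: str
    insurance_type: str


def cluster_key(doc: PolicyDoc) -> tuple[str, str, str]:
    # Assumption: each (language, insurer, insurance type) combination is one layout.
    return (doc.language, doc.insurer, doc.insurance_type)


def group_by_layout(docs: list[PolicyDoc]) -> dict[tuple[str, str, str], list[PolicyDoc]]:
    # Collect documents that share a cluster key, preserving input order.
    clusters: dict[tuple[str, str, str], list[PolicyDoc]] = defaultdict(list)
    for doc in docs:
        clusters[cluster_key(doc)].append(doc)
    return dict(clusters)


# Usage with made-up insurers: two documents share a layout, one does not.
docs = [
    PolicyDoc("a", "de", "InsurerA", "UVG"),
    PolicyDoc("b", "de", "InsurerA", "UVG"),
    PolicyDoc("c", "de", "InsurerB", "KTG"),
]
clusters = group_by_layout(docs)
print(len(clusters))  # 2
```

Under this assumption, a hand-annotated mapping would only need to be provided once per cluster key, which is why the number of distinct layouts (a few hundred, per the abstract) governs the annotation effort.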
Event
Future Technologies Conference (FTC) 2020
Conference start date
05.11.2020
Conference end date
06.11.2020
Language
English
Created during FHNW affiliation
Yes
Publication status
Published
Review
Peer review of the complete publication
Open access category
Closed
Citation
Pustulka, E., Hanne, T., Gachnang, P., & Biafora, P. (2021). FLIE: form labeling for information extraction. In K. Arai, S. Kapoor, & R. Bhatia (Eds.), Proceedings of the Future Technologies Conference (FTC) 2020 (Vol. 2, pp. 550–567). https://doi.org/10.1007/978-3-030-63089-8_35