FLIE: form labeling for information extraction

dc.contributor.author: Pustulka, Elzbieta
dc.contributor.author: Hanne, Thomas
dc.contributor.author: Gachnang, Phillip
dc.contributor.author: Biafora, Pasquale
dc.contributor.editor: Arai, Kohei
dc.contributor.editor: Kapoor, Supriya
dc.contributor.editor: Bhatia, Rahul
dc.date.accessioned: 2024-04-23T11:52:59Z
dc.date.available: 2024-04-23T11:52:59Z
dc.date.issued: 2021
dc.description.abstract: Information extraction (IE) from forms remains an unsolved problem, with some exceptions, like bills. Forms are complex and the templates are often unstable, due to the injection of advertising, extra conditions, or document merging. Our scenario deals with insurance forms used by brokers in Switzerland. Here, each combination of insurer, insurance type and language results in a new document layout, leading to a few hundred document types. To help brokers extract data from policies, we developed a new labeling method, called FLIE (form labeling for information extraction). FLIE first assigns a document to a cluster, grouping by language, insurer, and insurance type. It then labels the layout. To produce training data, the user annotates a sample document by hand, adding attribute names, i.e. provides a mapping. FLIE applies machine learning to propagate the mapping and extracts information. Our results are based on 24 Swiss policies in German: UVG (mandatory accident insurance), KTG (sick pay insurance), and UVGZ (optional accident insurance). Our solution has an accuracy of around 84-89%. It is currently being extended to other policy types and languages.
dc.event: Future Technologies Conference (FTC) 2020
dc.event.end: 2020-11-06
dc.event.start: 2020-11-05
dc.identifier.doi: https://doi.org/10.1007/978-3-030-63089-8_35
dc.identifier.uri: https://irf.fhnw.ch/handle/11654/42798
dc.language.iso: en
dc.relation.ispartof: Proceedings of the Future Technologies Conference (FTC) 2020
dc.relation.ispartofseries: Advances in Intelligent Systems and Computing
dc.spatial: Vancouver
dc.subject.ddc: 330 - Economics
dc.title: FLIE: form labeling for information extraction
dc.type: 04B - Conference paper
dc.volume: 2
dspace.entity.type: Publication
fhnw.InventedHere: Yes
fhnw.ReviewType: Anonymous ex ante peer review of a complete publication
fhnw.affiliation.hochschule: Hochschule für Wirtschaft FHNW (de_CH)
fhnw.affiliation.institut: Institut für Wirtschaftsinformatik (de_CH)
fhnw.openAccessCategory: Closed
fhnw.pagination: 550-567
fhnw.publicationState: Published
fhnw.seriesNumber: 1289
relation.isAuthorOfPublication: 3e7f2a0a-692e-4652-b305-7a7e19e011de
relation.isAuthorOfPublication: 35d8348b-4dae-448a-af2a-4c5a4504da04
relation.isAuthorOfPublication: 98d19d9f-59f1-47db-a407-a144bb75b2c1
relation.isAuthorOfPublication: 89760ff6-47d8-419d-aad6-d458929e3a16
relation.isAuthorOfPublication.latestForDiscovery: 3e7f2a0a-692e-4652-b305-7a7e19e011de