Multilingual text summarization for German texts using transformer models

Type
01A - Journal article
Parent work
Information
Volume
14
Issue / Number
6
Pages / Duration
303
Publisher / Publishing institution
MDPI
Place of publication / Event location
Basel
Abstract
The tremendous increase in documents available on the Web has turned finding relevant information into a challenging, tedious, and time-consuming activity. Automatic text summarization is a natural language processing (NLP) task that reduces reading effort by creating a shorter version of a text document that is coherent and retains the most relevant information of the original. In recent years, it has received significant attention because it applies to a wide range of uses, such as extracting highlights from scientific papers or generating summaries of news articles. In this research project, we focus mainly on abstractive text summarization, which conveys the most important content of a text in rephrased form. The main purpose of the project is to summarize texts in German. Unfortunately, most pretrained models are available only for English. We therefore focused on the multilingual BERT model for German and the monolingual BART model for English, taking translation options into account. For the experimental setup, we took the German Wikipedia article dataset and compared how well the multilingual model performed on German text summarization against machine-translated summaries produced by monolingual English language models. We used the ROUGE-1 metric to analyze the quality of the summaries.
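The ROUGE-1 metric named in the abstract measures unigram overlap between a generated summary and a reference summary. The following is a minimal sketch of that idea, not the evaluation code used in the article: the function name `rouge_1`, the lowercase regex tokenization, and the example sentences are assumptions made for illustration, and the article's actual tokenization and ROUGE implementation are not stated in this record.

```python
import re
from collections import Counter


def rouge_1(reference: str, candidate: str) -> dict:
    """Compute ROUGE-1 precision, recall, and F1 from clipped unigram overlap.

    Assumption: tokens are lowercase word characters; real ROUGE tools may
    tokenize or stem differently.
    """
    ref_tokens = re.findall(r"\w+", reference.lower())
    cand_tokens = re.findall(r"\w+", candidate.lower())

    # Counter intersection keeps the minimum count per token (clipping),
    # so a repeated word is only credited as often as it occurs in both texts.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())

    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical German reference and candidate summaries.
    scores = rouge_1(
        "Die Katze sitzt auf der Matte",
        "Die Katze liegt auf der Matte",
    )
    print(scores)
```

In this hypothetical pair, five of the six reference unigrams also appear in the candidate, so precision and recall are both 5/6. Comparing a multilingual model with a translate-then-summarize pipeline, as described in the abstract, would amount to computing such scores for each system's output against the same German references.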
ISSN
2078-2489
Language
English
Created during FHNW affiliation
Yes
Publication status
Published
Review
Peer review of the complete publication
Open access category
Gold
License
https://creativecommons.org/licenses/by/4.0/
Citation
Alcantara, T. H. M., Krütli, D., Ravada, R., & Hanne, T. (2023). Multilingual text summarization for German texts using transformer models. Information, 14(6), 303. https://doi.org/10.3390/info14060303