Multilingual text summarization for German texts using transformer models

dc.contributor.author: Alcantara, Tomas Humberto Montiel
dc.contributor.author: Krütli, David
dc.contributor.author: Ravada, Revathi
dc.contributor.author: Hanne, Thomas
dc.date.accessioned: 2025-01-24T15:02:36Z
dc.date.issued: 2023
dc.description.abstract: The tremendous increase in documents available on the Web has turned finding relevant information into a challenging, tedious, and time-consuming activity. Text summarization is an important natural language processing (NLP) task used to reduce the amount of text a reader must go through. Automatic text summarization consists of creating a shorter version of a text document that is coherent and retains the most relevant information of the original. In recent years, automatic text summarization has received significant attention because it can be applied to a wide range of tasks, such as extracting highlights from scientific papers or generating summaries of news articles. In this research project, we focus mainly on abstractive text summarization, which extracts the most important content from a text and presents it in rephrased form. The main purpose of this project is to summarize texts in German. Unfortunately, most pretrained models are available only for English. We therefore focused on the German BERT multilingual model and the English monolingual BART model, while also considering translation-based approaches. As the basis for the experimental setup, we took the German Wikipedia article dataset and compared how well the multilingual model performed on German text summarization against machine-translated summaries produced by monolingual English language models. We used the ROUGE-1 metric to assess the quality of the generated summaries.
dc.identifier.doi: 10.3390/info14060303
dc.identifier.issn: 2078-2489
dc.identifier.uri: https://irf.fhnw.ch/handle/11654/48204
dc.identifier.uri: https://doi.org/10.26041/fhnw-10919
dc.issue: 6
dc.language.iso: en
dc.publisher: MDPI
dc.relation.ispartof: Information
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.spatial: Basel
dc.subject.ddc: 330 - Economics
dc.title: Multilingual text summarization for German texts using transformer models
dc.type: 01A - Article in a scientific journal
dc.volume: 14
dspace.entity.type: Publication
fhnw.InventedHere: Yes
fhnw.ReviewType: Anonymous ex ante peer review of a complete publication
fhnw.affiliation.hochschule: Hochschule für Wirtschaft FHNW (de_CH)
fhnw.affiliation.institut: Institut für Wirtschaftsinformatik (de_CH)
fhnw.openAccessCategory: Gold
fhnw.pagination: 303
fhnw.publicationState: Published
relation.isAuthorOfPublication: 35d8348b-4dae-448a-af2a-4c5a4504da04
relation.isAuthorOfPublication.latestForDiscovery: 35d8348b-4dae-448a-af2a-4c5a4504da04
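The abstract names ROUGE-1 as the evaluation metric. As a rough illustration only, and not the authors' evaluation code, the sketch below computes ROUGE-1 precision, recall, and F1 as clipped unigram overlap between a candidate summary and a reference summary. It assumes simple lowercase word tokenization with no stemming or stopword removal; the record does not say which ROUGE-1 variant (recall, precision, or F1) or which tokenizer the study used. The names `tokenize`, `rouge_1`, and `RougeScore` are hypothetical.

```python
import re
from collections import Counter
from typing import NamedTuple


class RougeScore(NamedTuple):
    precision: float
    recall: float
    f1: float


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens only, no stemming or stopword removal.
    # Python's \w is Unicode-aware, so German umlauts and ß stay inside tokens.
    return re.findall(r"\w+", text.lower())


def rouge_1(candidate: str, reference: str) -> RougeScore:
    """Unigram-overlap ROUGE-1 between a candidate and a reference summary."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Clipped overlap: Counter '&' keeps the minimum count of each shared word,
    # so a repeated word is credited at most as often as the reference has it.
    overlap = sum((cand_counts & ref_counts).values())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall == 0:
        return RougeScore(precision, recall, 0.0)
    f1 = 2 * precision * recall / (precision + recall)
    return RougeScore(precision, recall, f1)
```

For example, `rouge_1("Die Katze liegt auf der Matte", "Die Katze sitzt auf der Matte")` shares five of six unigrams, so precision, recall, and F1 all come out at 5/6 (about 0.833). Recall-oriented reporting would use only the `recall` field; established ROUGE packages may additionally apply stemming, which this sketch deliberately omits.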
Files

Original bundle

Name: Multilingual text summarization_2023.pdf
Size: 506.5 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.66 KB
Format: Item-specific license agreed upon to submission