Multilingual text summarization for German texts using transformer models

Typ
01A - Beitrag in wissenschaftlicher Zeitschrift
Herausgeber:innen
Herausgeber:in (Körperschaft)
Betreuer:in
Übergeordnetes Werk
Information
Themenheft
DOI der Originalpublikation
Link
Reihe / Serie
Reihennummer
Jahrgang / Band
14
Ausgabe / Nummer
6
Seiten / Dauer
303
Patentnummer
Verlag / Herausgebende Institution
MDPI
Verlagsort / Veranstaltungsort
Basel
Auflage
Version
Programmiersprache
Abtretungsempfänger:in
Praxispartner:in/Auftraggeber:in
Zusammenfassung
The tremendous increase in documents available on the Web has turned finding the relevant pieces of information into a challenging, tedious, and time-consuming activity. Text summarization is an important natural language processing (NLP) task used to reduce the reading requirements of text. Automatic text summarization is an NLP task that consists of creating a shorter version of a text document which is coherent and maintains the most relevant information of the original text. In recent years, automatic text summarization has received significant attention, as it can be applied to a wide range of applications such as the extraction of highlights from scientific papers or the generation of summaries of news articles. In this research project, we are focused mainly on abstractive text summarization that extracts the most important contents from a text in a rephrased form. The main purpose of this project is to summarize texts in German. Unfortunately, most pretrained models are only available for English. We therefore focused on the German BERT multilingual model and the BART monolingual model for English, with a consideration of translation possibilities. As the source of the experiment setup, took the German Wikipedia article dataset and compared how well the multilingual model performed for German text summarization when compared to using machine-translated text summaries from monolingual English language models. We used the ROUGE-1 metric to analyze the quality of the text summarization.
Schlagwörter
Fachgebiet (DDC)
330 - Wirtschaft
Projekt
Veranstaltung
Startdatum der Ausstellung
Enddatum der Ausstellung
Startdatum der Konferenz
Enddatum der Konferenz
Datum der letzten Prüfung
ISBN
ISSN
2078-2489
Sprache
Englisch
Während FHNW Zugehörigkeit erstellt
Ja
Zukunftsfelder FHNW
Publikationsstatus
Veröffentlicht
Begutachtung
Peer-Review der ganzen Publikation
Open Access-Status
Gold
Lizenz
'https://creativecommons.org/licenses/by/4.0/'
Zitation
ALCANTARA, Tomas Humberto Montiel, David KRÜTLI, Revathi RAVADA und Thomas HANNE, 2023. Multilingual text summarization for German texts using transformer models. Information. 2023. Bd. 14, Nr. 6, S. 303. DOI 10.3390/info14060303. Verfügbar unter: https://doi.org/10.26041/fhnw-10919