Detecting hidden backdoors in large language models

dc.contributor.author Peechatt, Jibin Mathew
dc.contributor.author Schaaf, Marc
dc.contributor.author Christen, Patrik
dc.date.accessioned 2026-02-17T11:14:13Z
dc.date.issued 2025
dc.description.abstract Large Language Models (LLMs) have revolutionised the field of Natural Language Processing (NLP) and are being integrated into increasingly critical domains, raising concerns that hidden backdoors could allow the collection of user data or the manipulation of output. This paper investigates the possibility of hidden backdoors by analysing network traffic during local LLM usage. Two models, DeepSeek-R1 and Mistral, were tested in order to compare LLMs from different geopolitical and regulatory environments. Using Ollama, a tool for running LLMs locally, three experiments were performed: 1) monitoring TCP connections at the per-process level, 2) running the local LLM in a Docker container with full network isolation, and 3) monitoring all network traffic with Wireshark on a monitored Docker bridge. The results showed no external network communication during the experiments. Anomalies not attributable to a hidden backdoor were found, such as DeepSeek’s output being in Chinese for certain prompts even though the prompts were in English. In conclusion, our findings indicate that it is possible to locally isolate LLMs for critical usage, and that Docker-based network isolation could be a practical approach for detecting hidden backdoors in LLMs.
dc.event 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
dc.event.end 2025-10-08
dc.event.start 2025-10-05
dc.identifier.doi 10.1109/smc58881.2025.11342801
dc.identifier.isbn 979-8-3315-3358-8
dc.identifier.isbn 979-8-3315-3357-1
dc.identifier.uri https://irf.fhnw.ch/handle/11654/55489
dc.language.iso en
dc.publisher IEEE
dc.relation.ispartof 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Proceedings
dc.spatial Vienna
dc.subject.ddc 005 - Computer programming, programs and data
dc.title Detecting hidden backdoors in large language models
dc.type 04B - Conference proceedings contribution
dspace.entity.type Publication
fhnw.InventedHere Yes
fhnw.ReviewType Lectoring (ex ante)
fhnw.affiliation.hochschule Hochschule für Wirtschaft FHNW (de_CH)
fhnw.affiliation.institut Institut für Wirtschaftsinformatik (de_CH)
fhnw.openAccessCategory Closed
fhnw.pagination 6101-6104
fhnw.publicationState Published
fhnw.targetcollection d40e4c67-dd87-4d14-8518-b2f0a855e750
relation.isAuthorOfPublication 2003564b-a7a0-497d-87c7-505cd57d6109
relation.isAuthorOfPublication 66e116ee-b442-4683-b6c2-781999c6cc84
relation.isAuthorOfPublication d6fa5f05-5123-4d2f-8e74-79adfe54acc7
relation.isAuthorOfPublication.latestForDiscovery 2003564b-a7a0-497d-87c7-505cd57d6109
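
The abstract's first experiment, monitoring TCP connections at the per-process level, can be illustrated with a minimal sketch. This is not the authors' tooling: the `Connection` record layout and the function names are hypothetical, and the sketch assumes the connection list has already been collected by some other means (for example from an OS-level connection listing). It only shows the classification step, that is, deciding whether an observed remote address lies outside the local machine and private networks.

```python
import ipaddress
from typing import Iterable, NamedTuple


class Connection(NamedTuple):
    """One observed TCP connection (hypothetical record layout)."""
    pid: int
    process: str
    remote_ip: str
    remote_port: int


def is_external(remote_ip: str) -> bool:
    """Return True if the address is not loopback, private, link-local, or unspecified."""
    addr = ipaddress.ip_address(remote_ip)
    return not (
        addr.is_loopback
        or addr.is_private
        or addr.is_link_local
        or addr.is_unspecified
    )


def external_connections(conns: Iterable[Connection], process: str) -> list[Connection]:
    """Keep only the named process's connections whose remote end is external."""
    return [c for c in conns if c.process == process and is_external(c.remote_ip)]
```

An empty result for the process that serves the model would be consistent with the abstract's finding of no external network communication, although it covers only the connections that were actually captured.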