Detecting hidden backdoors in large language models

Peechatt, Jibin Mathew; Schaaf, Marc; Christen, Patrik

dc.contributor.author	Peechatt, Jibin Mathew
dc.contributor.author	Schaaf, Marc
dc.contributor.author	Christen, Patrik
dc.date.accessioned	2026-02-17T11:14:13Z
dc.date.issued	2025
dc.description.abstract	Large Language Models (LLMs) have revolutionised the field of Natural Language Processing (NLP) and are currently being integrated into more critical domains, raising concerns that hidden backdoors could allow user data to be collected or output to be manipulated. This paper investigates the possibility of hidden backdoors by analysing network traffic during local LLM usage. Two models, DeepSeek-R1 and Mistral, were tested in order to compare LLMs from different geopolitical and regulatory environments. Using Ollama, a tool for running LLMs locally, three experiments were performed: 1) monitoring TCP connections at the per-process level, 2) running the local LLM in a Docker container with full network isolation, and 3) capturing all network traffic with Wireshark on a monitored Docker bridge. The results showed no external network communication during the experiments. Anomalies not attributable to a hidden backdoor were found, such as DeepSeek’s output being in Chinese for certain prompts even though the prompts were in English. In conclusion, our findings indicate that it is possible to locally isolate LLMs for critical usage, and that Docker-based network isolation could be a practical approach for detecting hidden backdoors in LLMs.
dc.event	2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
dc.event.end	2025-10-08
dc.event.start	2025-10-05
dc.identifier.doi	10.1109/smc58881.2025.11342801
dc.identifier.isbn	979-8-3315-3358-8
dc.identifier.isbn	979-8-3315-3357-1
dc.identifier.uri	https://irf.fhnw.ch/handle/11654/55489
dc.language.iso	en
dc.publisher	IEEE
dc.relation.ispartof	2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Proceedings
dc.spatial	Wien
dc.subject.ddc	005 - Computer programming, programs and data
dc.title	Detecting hidden backdoors in large language models
dc.type	04B - Conference proceedings contribution
dspace.entity.type	Publication
fhnw.InventedHere	Yes
fhnw.ReviewType	Lectoring (ex ante)
fhnw.affiliation.hochschule	Hochschule für Wirtschaft FHNW	de_CH
fhnw.affiliation.institut	Institut für Wirtschaftsinformatik	de_CH
fhnw.openAccessCategory	Closed
fhnw.pagination	6101-6104
fhnw.publicationState	Published
fhnw.targetcollection	d40e4c67-dd87-4d14-8518-b2f0a855e750
relation.isAuthorOfPublication	2003564b-a7a0-497d-87c7-505cd57d6109
relation.isAuthorOfPublication	66e116ee-b442-4683-b6c2-781999c6cc84
relation.isAuthorOfPublication	d6fa5f05-5123-4d2f-8e74-79adfe54acc7
relation.isAuthorOfPublication.latestForDiscovery	2003564b-a7a0-497d-87c7-505cd57d6109
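
The first experiment in the abstract, monitoring TCP connections per process, amounts to checking whether any connection owned by the LLM process has a remote peer outside the local machine or the Docker bridge. The following is a minimal sketch of that check, not the authors' tooling: the record layout (pid, process name, remote IP, remote port), the function names, and the sample rows are all assumptions made for illustration, and the sample rows are invented rather than taken from the paper's measurements.

```python
import ipaddress


def is_external(remote_ip: str) -> bool:
    """Return True if the peer lies outside loopback, private, and link-local ranges."""
    addr = ipaddress.ip_address(remote_ip)
    return not (
        addr.is_loopback
        or addr.is_private
        or addr.is_link_local
        or addr.is_unspecified
    )


def external_connections(connections):
    """Keep only the records whose remote peer is an external address.

    Each record is a hypothetical tuple: (pid, process_name, remote_ip, remote_port).
    """
    return [record for record in connections if is_external(record[2])]


# Invented sample records for illustration only (not data from the paper).
sample = [
    (4121, "ollama", "127.0.0.1", 11434),   # loopback peer
    (4121, "ollama", "172.17.0.2", 52110),  # private range, e.g. a container bridge
    (4121, "ollama", "8.8.8.8", 443),       # public address, would be flagged
]

if __name__ == "__main__":
    for pid, name, ip, port in external_connections(sample):
        print(f"external connection: pid={pid} process={name} peer={ip}:{port}")
```

An empty result from such a filter over a real capture would correspond to the abstract's finding of no external network communication. The classification relies only on Python's standard ipaddress module, so how the connection records are collected is left open.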

Collection

Institut für Wirtschaftsinformatik