Detecting hidden backdoors in large language models

Publication date
2025
Type
04B - Conference paper
Parent work
2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Proceedings
Pages / Duration
6101-6104
Publisher / Publishing institution
IEEE
Place of publication / Event location
Vienna
Abstract
Large Language Models (LLMs) have revolutionised the field of Natural Language Processing (NLP) and are increasingly being integrated into critical domains, raising concerns about hidden backdoors that could allow the collection of user data or the manipulation of output. This paper investigates the possibility of hidden backdoors by analysing network traffic during local LLM usage. Two models, DeepSeek-R1 and Mistral, were tested in order to compare LLMs from different geopolitical and regulatory environments. Using Ollama, a tool for running LLMs locally, three experiments were performed: 1) monitoring TCP connections at a per-process level, 2) running the local LLM in a Docker container with full network isolation, and 3) capturing all network traffic with Wireshark on a monitored Docker bridge. The results showed no external network communication during the experiments. Anomalies not attributable to a hidden backdoor were observed, such as DeepSeek responding in Chinese to certain prompts that were given in English. In conclusion, our findings indicate that LLMs can be locally isolated for critical usage, and that Docker-based network isolation could be a practical approach for detecting hidden backdoors in LLMs.
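The Docker-based setups described in experiments 2 and 3 can be sketched with standard Docker commands. This is a minimal illustration, not the authors' exact procedure: the `ollama/ollama` image, the `deepseek-r1` model tag, and the volume/container names are assumptions, since the paper's abstract does not state the concrete commands or versions used.

```shell
# Experiment 2 (sketch): full network isolation.
# --network none removes all interfaces except loopback, so any outbound
# connection attempt by the container fails immediately.
docker run -d --name ollama-isolated --network none \
  -v ollama_models:/root/.ollama ollama/ollama

# The model must already be present in the mounted volume, because the
# isolated container cannot pull anything. Query it entirely offline:
docker exec -it ollama-isolated ollama run deepseek-r1 "Hello"

# Experiment 3 (sketch): attach the container to a dedicated bridge
# instead, then capture that bridge's traffic on the host.
docker network create llm-bridge
docker run -d --name ollama-bridged --network llm-bridge \
  -v ollama_models:/root/.ollama ollama/ollama
# Capture with tcpdump (or open the same interface in Wireshark):
#   tcpdump -i <bridge-interface> not host <container-ip>
```

Under this setup, any packet observed leaving the bridge toward an external host would be a candidate indicator of hidden network activity; the paper reports that no such external communication was observed.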
Event
2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Conference start date
05.10.2025
Conference end date
08.10.2025
ISBN
979-8-3315-3358-8
979-8-3315-3357-1
Language
English
Created during FHNW affiliation
Yes
Publication status
Published
Review
Expert editing/editorial review
Open access category
Closed

Citation
Peechatt, J. M., Schaaf, M., & Christen, P. (2025). Detecting hidden backdoors in large language models. 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Proceedings, 6101–6104. https://doi.org/10.1109/smc58881.2025.11342801