Detecting hidden backdoors in large language models
Publication date
2025
Type
04B - Conference proceedings contribution
Parent work
2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Proceedings
Pages / Duration
6101-6104
Publisher / Issuing institution
IEEE
Place of publication / Event location
Vienna
Abstract
Large Language Models (LLMs) have revolutionised the field of Natural Language Processing (NLP) and are being integrated into increasingly critical domains, raising concerns that hidden backdoors could allow the collection of user data or the manipulation of output. This paper investigates the possibility of hidden backdoors by analysing network traffic during local LLM usage. Two models, DeepSeek-R1 and Mistral, were tested in order to compare LLMs from different geopolitical and regulatory environments. Using Ollama, a tool for running LLMs locally, three experiments were performed: 1) monitoring TCP connections at the per-process level, 2) running the local LLM in a Docker container with full network isolation, and 3) monitoring all network traffic with Wireshark on a monitored Docker bridge. The results showed no external network communication during the experiments. Anomalies with causes other than a hidden backdoor were found, such as DeepSeek's output being in Chinese for certain prompts even though the prompts were in English. In conclusion, our findings indicate that LLMs can be isolated locally for critical usage and that Docker-based network isolation could be a practical approach for detecting hidden backdoors in LLMs.
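The per-process check in the first experiment can be illustrated with a minimal Python sketch. It is not the authors' tooling, which this record does not describe beyond Ollama, Docker, and Wireshark. The `Connection` record layout, the helper names, and the sample data are hypothetical; the sketch only shows how observed remote addresses might be classified as local or external.

```python
import ipaddress
from typing import Iterable, NamedTuple


class Connection(NamedTuple):
    """One observed TCP connection (hypothetical record layout)."""
    pid: int
    process: str
    remote_ip: str
    remote_port: int


def is_external(ip: str) -> bool:
    """Return True if the address is neither loopback, private, nor link-local."""
    addr = ipaddress.ip_address(ip)
    return not (
        addr.is_loopback
        or addr.is_private
        or addr.is_link_local
        or addr.is_unspecified
    )


def external_connections(conns: Iterable[Connection], process: str) -> list[Connection]:
    """Keep only the connections of `process` whose remote end is external."""
    return [c for c in conns if c.process == process and is_external(c.remote_ip)]


# Illustrative sample data, not measurements from the paper.
observed = [
    Connection(4242, "ollama", "127.0.0.1", 11434),   # loopback API call
    Connection(4242, "ollama", "172.17.0.2", 11434),  # private container address
    Connection(5151, "browser", "8.8.8.8", 443),      # external, but another process
]

print(external_connections(observed, "ollama"))  # prints []
```

An empty result for the LLM process corresponds to the finding reported above that no external network communication was observed. In practice the connection records would have to come from a monitoring tool, which this sketch leaves out.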
Event
2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Conference start date
05.10.2025
Conference end date
08.10.2025
ISBN
979-8-3315-3358-8
979-8-3315-3357-1
Language
English
Created during FHNW affiliation
Yes
Publication status
Published
Review
Editorial review
Open access status
Closed
Citation
Peechatt, J. M., Schaaf, M., & Christen, P. (2025). Detecting hidden backdoors in large language models. 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Proceedings, 6101–6104. https://doi.org/10.1109/smc58881.2025.11342801