Detecting hidden backdoors in large language models

Publication date
2025
Type
04B - Conference paper
Parent work
2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Proceedings
Pages / Duration
6101-6104
Publisher / Publishing institution
IEEE
Place of publication / Event location
Vienna
Abstract
Large Language Models (LLMs) have revolutionised the field of Natural Language Processing (NLP) and are increasingly being integrated into critical domains, raising concerns about hidden backdoors that could allow the collection of user data or the manipulation of output. This paper investigates the possibility of hidden backdoors by analysing network traffic during local LLM usage. Two models, DeepSeek-R1 and Mistral, were tested in order to compare LLMs from different geopolitical and regulatory environments. Using Ollama, a tool for running LLMs locally, three experiments were performed: 1) monitoring TCP connections at a per-process level, 2) running the local LLM in a Docker container with full network isolation, and 3) capturing all network traffic with Wireshark on a monitored Docker bridge. The results showed no external network communication during the experiments. Anomalies not attributable to a hidden backdoor were observed, such as DeepSeek responding in Chinese to certain prompts that were given in English. In conclusion, our findings indicate that LLMs can be locally isolated for critical usage, and that Docker-based network isolation could be a practical approach for detecting hidden backdoors in LLMs.
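The Docker-based setups described in experiments 2 and 3 can be sketched with standard Docker commands. This is a minimal illustration, not the authors' exact procedure: the `ollama/ollama` image, the `deepseek-r1` model tag, and the volume/container names are assumptions, since the paper's abstract does not state the concrete commands or versions used.

```shell
# Experiment 2 (sketch): full network isolation.
# --network none removes all interfaces except loopback, so any outbound
# connection attempt by the container fails immediately.
docker run -d --name ollama-isolated --network none \
  -v ollama_models:/root/.ollama ollama/ollama

# The model must already be present in the mounted volume, because the
# isolated container cannot pull anything. Query it entirely offline:
docker exec -it ollama-isolated ollama run deepseek-r1 "Hello"

# Experiment 3 (sketch): attach the container to a dedicated bridge
# instead, then capture that bridge's traffic on the host.
docker network create llm-bridge
docker run -d --name ollama-bridged --network llm-bridge \
  -v ollama_models:/root/.ollama ollama/ollama
# Capture with tcpdump (or open the same interface in Wireshark):
#   tcpdump -i <bridge-interface> not host <container-ip>
```

Under this setup, any packet observed leaving the bridge toward an external host would be a candidate indicator of hidden network activity; the paper reports that no such external communication was observed.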
Event
2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Conference start date
05.10.2025
Conference end date
08.10.2025
ISBN
979-8-3315-3358-8
979-8-3315-3357-1
Language
English
Created during FHNW affiliation
Yes
Publication status
Published
Review
Expert editing/editorial review
Open access category
Closed

Citation
Peechatt, J. M., Schaaf, M., & Christen, P. (2025). Detecting hidden backdoors in large language models. 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Proceedings, 6101–6104. https://doi.org/10.1109/smc58881.2025.11342801