Context: LLM + RAG in real-world applications
In recent years, systems based on Large Language Models (LLMs) enriched with Retrieval-Augmented Generation (RAG) mechanisms have become a key element of many real-world applications: virtual assistants, decision support systems, and, more generally, agentic architectures capable of interacting with dynamic knowledge bases. However, as recent scientific literature also highlights, the introduction of RAG significantly expands the attack surface of these systems.
Case study: AGENTPOISON (NeurIPS 2024)
The article "AGENTPOISON: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases", presented at NeurIPS 2024 (https://neurips.cc/Conferences/2024), highlights a particularly critical issue: the possibility of poisoning the retrieval data to induce malicious behavior or controlled deviations in LLM agents. In these scenarios, an attacker does not act directly on the model, but exploits the agent's dependence on the retrieved information, introducing content designed to activate only in the presence of specific semantic triggers. This makes the attack difficult to detect with superficial checks or simple keyword matching.
Linkalab Activities: Automatic Security Assessment Framework
The research department of Linkalab is studying these issues in depth with the aim of developing an automatic security assessment framework for services based on LLMs and (also, but not only) on RAG. The underlying idea is to provide systematic tools for analyzing the critical issues of an agent system before its deployment in sensitive operational contexts, overcoming manual or purely reactive approaches.
Proposal: Optimizing the attack trigger
Inspired by the work developed for AgentPoison, researchers from the research department are initially working on an innovative proposal focused on attack trigger optimization. In particular, the goal is to maximize the distance, in the semantic embedding space, between the non-poisoned RAG data and the poisoned data.
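The separation objective described above can be sketched as a simple score: the distance between the centroids of the clean and poisoned embedding sets, normalized by their intra-cluster spread. This is a minimal illustration, not the framework's actual metric; the random vectors stand in for real RAG document embeddings.

```python
import numpy as np

def separation_score(clean: np.ndarray, poisoned: np.ndarray) -> float:
    """Distance between the centroids of the two embedding sets,
    normalized by their average intra-cluster spread.
    Higher values mean better semantic separation."""
    c_clean, c_pois = clean.mean(axis=0), poisoned.mean(axis=0)
    between = np.linalg.norm(c_clean - c_pois)
    spread = (np.linalg.norm(clean - c_clean, axis=1).mean()
              + np.linalg.norm(poisoned - c_pois, axis=1).mean()) / 2
    return between / spread

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(100, 384))     # stand-in for clean RAG embeddings
poisoned = rng.normal(0.5, 1.0, size=(100, 384))  # stand-in for trigger-shifted poisoned ones
print(f"separation: {separation_score(clean, poisoned):.3f}")
```

A trigger that pushes poisoned entries further from the clean corpus drives this score up, which is exactly the quantity the optimization tries to maximize.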
The images shown below, relating to a two-dimensional PCA projection onto the two principal components of the embedding vector space, show how non-optimal triggers fail to separate poisoned data from non-poisoned data (left image), while optimal triggers significantly separate these two data sets (right image). A strong semantic separation allows for a more controlled study of the system's behavior and an assessment of how vulnerable an agent is to seemingly innocuous but semantically targeted inputs.
Images (PCA)
- First image: suboptimal triggers do not allow separating poisoned data from non-poisoned data
- Second image: optimal triggers significantly separate these two data sets
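The PCA projection behind these plots can be reproduced with a short numpy-only sketch (plain SVD, no plotting). The two Gaussian clouds below are hypothetical stand-ins for clean and well-separated poisoned embeddings, assumed only for illustration.

```python
import numpy as np

def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project the rows of X onto their first two principal components
    using an SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T

rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=(200, 64))     # stand-in for clean embeddings
poisoned = rng.normal(1.5, 1.0, size=(200, 64))  # stand-in for well-separated poisoned ones
proj = pca_2d(np.vstack([clean, poisoned]))      # shape (400, 2)

# When the shift is strong, the first principal component splits the groups,
# so the two groups land on opposite sides of the origin along PC1:
print(proj[:200, 0].mean() * proj[200:, 0].mean() < 0)
```

Scattering the two columns of `proj` (colored by group) yields the kind of picture shown in the figures: overlapping clouds for a weak trigger, two distinct clusters for a strong one.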


Heuristic approach: domain-independent candidate triggers
The idea that the laboratory is developing is based on a heuristic approach that uses a set of candidate trigger phrases, designed to be as independent of the application domain as possible. These phrases are compared with each other in terms of their ability to semantically separate the "clean" data from the poisoned data, progressively identifying the trigger that best satisfies the requirement of maximum separation. Such an approach, if automated, can become a powerful stress-testing tool for LLM+RAG systems.
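The candidate-comparison loop can be sketched as follows. Everything here is hypothetical: `toy_embed` is a deterministic hash-based stand-in for a real sentence-embedding model, and `best_trigger` and the poison template are illustrative names, not the laboratory's actual interface.

```python
import hashlib
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Deterministic stand-in for a real sentence-embedding model:
    hashes the text into a seed and draws a fixed random vector."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    return np.random.default_rng(seed).normal(size=dim)

def separation(clean: np.ndarray, poisoned: np.ndarray) -> float:
    """Centroid distance normalized by average intra-cluster spread."""
    gap = np.linalg.norm(clean.mean(axis=0) - poisoned.mean(axis=0))
    spread = (np.linalg.norm(clean - clean.mean(axis=0), axis=1).mean()
              + np.linalg.norm(poisoned - poisoned.mean(axis=0), axis=1).mean()) / 2
    return gap / spread

def best_trigger(candidates, clean_docs, poison_template):
    """Score each domain-independent candidate phrase by how far the
    poisoned documents it produces land from the clean corpus, and
    return the phrase with maximum separation."""
    clean = np.stack([toy_embed(d) for d in clean_docs])
    scored = []
    for trig in candidates:
        poisoned = np.stack([toy_embed(poison_template.format(trigger=trig, doc=d))
                             for d in clean_docs])
        scored.append((separation(clean, poisoned), trig))
    return max(scored)[1]

docs = [f"document {i} about the application domain" for i in range(20)]
winner = best_trigger(["alpha signal", "delta protocol", "omega phrase"],
                      docs, "{trigger} :: {doc}")
print(winner)
```

Swapping `toy_embed` for a real embedding model turns this loop into the automated stress test the paragraph describes: enumerate candidates, score each by separation, keep the argmax.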
Conclusion: safety as an ex ante requirement
In a context where intelligent agents are increasingly integrated into critical processes, security cannot be an ex post afterthought. Understanding, measuring and anticipating these vulnerabilities is a critical step towards the responsible and trusted adoption of LLM-based technologies.