🛡️ LLM Security Attack Database
Your comprehensive guide to Large Language Model vulnerabilities and defense strategies
LLM Attacks - Wiki
A comprehensive collection of attack vectors and security vulnerabilities targeting Large Language Models (LLMs) and associated AI systems.
Attack Database
SN | Attack | Description |
---|---|---|
1 | Agentic Multi-Agent Exploitation | Exploiting inter-agent trust boundaries so that a malicious payload, initially rejected by one LLM agent, is processed if delivered via another trusted agent, including privilege escalation and cross-agent command execution. |
2 | RAG/Embedding Backdoor Attacks | Attacking LLMs with manipulated embedded documents retrieved during RAG, including poisoning vector DBs to force undesirable completions or disclosures. |
3 | System Prompt Leakage & Reverse Engineering | Forcing disclosure or deducing proprietary system prompts to subvert guardrails and expose internal instructions. |
4 | LLM Tooling/Plugin Supply Chain Attacks | Compromising the ecosystem via malicious plugins, infected models from public repos, or tainted integrations. |
5 | Excessive Agency/Autonomy Attacks | Exploiting/abusing LLM agent autonomy to perform unintended actions, escalate privileges, or cause persistent automated damage in agentic workflows. |
6 | Unbounded Resource Consumption (“Denial of Wallet”) | Manipulating LLM behavior to consume excessive external/cloud resources, raising costs or disrupting operations. |
7 | Cross-Context Federation Leaks | Leveraging federated information contexts or cross-source retrievals to exfiltrate data by manipulating the model’s knowledge context. |
8 | Vector Database Poisoning | Polluting indexing/embedding layers to disrupt or manipulate downstream LLM generations or leak/hallucinate info. |
9 | Adversarial Examples | Crafty manipulations of input data that trick models into making incorrect predictions, potentially leading to harmful decisions. |
10 | Data Poisoning | Malicious data injections into the training set that corrupt the model’s performance, causing biased or incorrect behavior. |
11 | Model Inversion Attacks | Inferring the input values used to train the model, exposing sensitive information. |
12 | Membership Inference Attacks | Determining whether specific data points were part of the model’s training set, leading to privacy breaches. |
13 | Query Manipulation Attacks | Crafting malicious queries that cause the model to reveal unintended information or behave undesirably. |
14 | Model Extraction Attacks | Reverse-engineering the model by querying it to construct a copy, resulting in intellectual property theft. |
15 | Transfer Learning Attacks | Exploiting vulnerabilities in the transfer learning process to manipulate model performance on new tasks. |
16 | Federated Learning Attacks | Compromising client devices or server-side data in federated learning setups to corrupt the global model or extract sensitive information. |
17 | Edge AI Attacks | Targeting edge devices running AI models to exfiltrate data or manipulate behavior. |
18 | IoT AI Attacks | Attacking IoT devices using AI, potentially leading to data breaches or unauthorized control. |
19 | Prompt Injection Attacks | Manipulating input prompts in conversational AI to bypass safety measures or extract confidential information. |
20 | Indirect Prompt Injection | Exploiting vulnerabilities in systems integrating LLMs to inject malicious prompts indirectly. |
21 | Model Fairness Attacks | Intentionally biasing the model by manipulating input data, affecting fairness and equity. |
22 | Model Explainability Attacks | Designing inputs that make model decisions difficult to interpret, hindering transparency. |
23 | Robustness Attacks | Testing the model’s resilience by subjecting it to various perturbations to find weaknesses. |
24 | Security Attacks | Compromising the confidentiality, integrity, or availability of the model and its outputs. |
25 | Integrity Attacks | Tampering with the model’s architecture, weights, or biases to alter behavior without authorization. |
26 | Jailbreaking Attacks | Attempting to circumvent the ethical constraints or content filters in an LLM. |
27 | Training Data Extraction | Inferring specific data used to train the model through carefully crafted queries. |
28 | Synthetic Data Generation Attacks | Creating synthetic data designed to mislead or degrade AI model performance. |
29 | Model Stealing from Cloud | Extracting a trained model from a cloud service without direct access. |
30 | Model Poisoning from Edge | Introducing malicious data at edge devices to corrupt model behavior. |
31 | Model Drift Detection Evasion | Evading mechanisms that detect when a model’s performance degrades over time. |
32 | Adversarial Example Generation with Deep Learning | Using advanced techniques to create adversarial examples that deceive the model. |
33 | Model Reprogramming | Repurposing a model for a different task, potentially bypassing security measures. |
34 | Thermal Side-Channel Attacks | Using temperature variations in hardware during model inference to infer sensitive information. |
35 | Transfer Learning Attacks from Pre-Trained Models | Poisoning pre-trained models to influence performance when transferred to new tasks. |
36 | Model Fairness and Bias Detection Evasion | Designing attacks to evade detection mechanisms monitoring fairness and bias. |
37 | Model Explainability Attack | Attacking the model’s interpretability to prevent users from understanding its decision-making process. |
38 | Deepfake Attacks | Creating realistic fake audio or video content to manipulate events or conversations. |
39 | Cloud-Based Model Replication | Replicating trained models in the cloud to develop competing products or gain unauthorized insights. |
40 | Confidentiality Attacks | Extracting sensitive or proprietary information embedded within the model’s parameters. |
41 | Quantum Attacks on LLMs | Using quantum computing to theoretically compromise the security of LLMs or their cryptographic protections. |
42 | Model Stealing from Cloud with Pre-Trained Models | Extracting pre-trained models from the cloud without direct access. |
43 | Transfer Learning Attacks with Edge Devices | Compromising knowledge transferred to edge devices. |
44 | Adversarial Example Generation with Model Inversion | Creating adversarial examples using model inversion techniques. |
45 | Backdoor Attacks | Embedding hidden behaviors within the model triggered by specific inputs. |
46 | Watermarking Attacks | Removing or altering watermarks protecting intellectual property in AI models. |
47 | Neural Network Trojans | Embedding malicious functionalities within the model triggered under certain conditions. |
48 | Model Black-Box Attacks | Exploiting the model using input-output queries without internal knowledge. |
49 | Model Update Attacks | Manipulating the model during its update process to introduce vulnerabilities. |
50 | Gradient Inversion Attacks | Reconstructing training data by exploiting gradients in federated learning. |
51 | Side-Channel Timing Attacks | Inferring model parameters or training data by measuring computation times during inference. |
About This Project
Contribute if you come across any new vulnerabilities that are not on this list.
License
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.