🛡️ LLM Security Attack Database

Your comprehensive guide to Large Language Model vulnerabilities and defense strategies

LLM Attacks - Wiki

A comprehensive collection of attack vectors and security vulnerabilities targeting Large Language Models (LLMs) and associated AI systems.

Attack Database

SN	Attack	Description
1	Agentic Multi-Agent Exploitation	Exploiting inter-agent trust boundaries so that a malicious payload, initially rejected by one LLM agent, is processed if delivered via another trusted agent, including privilege escalation and cross-agent command execution.
2	RAG/Embedding Backdoor Attacks	Attacking LLMs with manipulated embedded documents retrieved during RAG, including poisoning vector DBs to force undesirable completions or disclosures.
3	System Prompt Leakage & Reverse Engineering	Forcing disclosure or deducing proprietary system prompts to subvert guardrails and expose internal instructions.
4	LLM Tooling/Plugin Supply Chain Attacks	Compromising the ecosystem via malicious plugins, infected models from public repos, or tainted integrations.
5	Excessive Agency/Autonomy Attacks	Exploiting/abusing LLM agent autonomy to perform unintended actions, escalate privileges, or cause persistent automated damage in agentic workflows.
6	Unbounded Resource Consumption (“Denial of Wallet”)	Manipulating LLM behavior to consume excessive external/cloud resources, raising costs or disrupting operations.
7	Cross-Context Federation Leaks	Leveraging federated information contexts or cross-source retrievals to exfiltrate data by manipulating the model’s knowledge context.
8	Vector Database Poisoning	Polluting indexing/embedding layers to disrupt or manipulate downstream LLM generations or leak/hallucinate info.
9	Adversarial Examples	Crafty manipulations of input data that trick models into making incorrect predictions, potentially leading to harmful decisions.
10	Data Poisoning	Malicious data injections into the training set that corrupt the model’s performance, causing biased or incorrect behavior.
11	Model Inversion Attacks	Inferring the input values used to train the model, exposing sensitive information.
12	Membership Inference Attacks	Determining whether specific data points were part of the model’s training set, leading to privacy breaches.
13	Query Manipulation Attacks	Crafting malicious queries that cause the model to reveal unintended information or behave undesirably.
14	Model Extraction Attacks	Reverse-engineering the model by querying it to construct a copy, resulting in intellectual property theft.
15	Transfer Learning Attacks	Exploiting vulnerabilities in the transfer learning process to manipulate model performance on new tasks.
16	Federated Learning Attacks	Compromising client devices or server-side data in federated learning setups to corrupt the global model or extract sensitive information.
17	Edge AI Attacks	Targeting edge devices running AI models to exfiltrate data or manipulate behavior.
18	IoT AI Attacks	Attacking IoT devices using AI, potentially leading to data breaches or unauthorized control.
19	Prompt Injection Attacks	Manipulating input prompts in conversational AI to bypass safety measures or extract confidential information.
20	Indirect Prompt Injection	Exploiting vulnerabilities in systems integrating LLMs to inject malicious prompts indirectly.
21	Model Fairness Attacks	Intentionally biasing the model by manipulating input data, affecting fairness and equity.
22	Model Explainability Attacks	Designing inputs that make model decisions difficult to interpret, hindering transparency.
23	Robustness Attacks	Testing the model’s resilience by subjecting it to various perturbations to find weaknesses.
24	Security Attacks	Compromising the confidentiality, integrity, or availability of the model and its outputs.
25	Integrity Attacks	Tampering with the model’s architecture, weights, or biases to alter behavior without authorization.
26	Jailbreaking Attacks	Attempting to circumvent the ethical constraints or content filters in an LLM.
27	Training Data Extraction	Inferring specific data used to train the model through carefully crafted queries.
28	Synthetic Data Generation Attacks	Creating synthetic data designed to mislead or degrade AI model performance.
29	Model Stealing from Cloud	Extracting a trained model from a cloud service without direct access.
30	Model Poisoning from Edge	Introducing malicious data at edge devices to corrupt model behavior.
31	Model Drift Detection Evasion	Evading mechanisms that detect when a model’s performance degrades over time.
32	Adversarial Example Generation with Deep Learning	Using advanced techniques to create adversarial examples that deceive the model.
33	Model Reprogramming	Repurposing a model for a different task, potentially bypassing security measures.
34	Thermal Side-Channel Attacks	Using temperature variations in hardware during model inference to infer sensitive information.
35	Transfer Learning Attacks from Pre-Trained Models	Poisoning pre-trained models to influence performance when transferred to new tasks.
36	Model Fairness and Bias Detection Evasion	Designing attacks to evade detection mechanisms monitoring fairness and bias.
37	Model Explainability Attack	Attacking the model’s interpretability to prevent users from understanding its decision-making process.
38	Deepfake Attacks	Creating realistic fake audio or video content to manipulate events or conversations.
39	Cloud-Based Model Replication	Replicating trained models in the cloud to develop competing products or gain unauthorized insights.
40	Confidentiality Attacks	Extracting sensitive or proprietary information embedded within the model’s parameters.
41	Quantum Attacks on LLMs	Using quantum computing to theoretically compromise the security of LLMs or their cryptographic protections.
42	Model Stealing from Cloud with Pre-Trained Models	Extracting pre-trained models from the cloud without direct access.
43	Transfer Learning Attacks with Edge Devices	Compromising knowledge transferred to edge devices.
44	Adversarial Example Generation with Model Inversion	Creating adversarial examples using model inversion techniques.
45	Backdoor Attacks	Embedding hidden behaviors within the model triggered by specific inputs.
46	Watermarking Attacks	Removing or altering watermarks protecting intellectual property in AI models.
47	Neural Network Trojans	Embedding malicious functionalities within the model triggered under certain conditions.
48	Model Black-Box Attacks	Exploiting the model using input-output queries without internal knowledge.
49	Model Update Attacks	Manipulating the model during its update process to introduce vulnerabilities.
50	Gradient Inversion Attacks	Reconstructing training data by exploiting gradients in federated learning.
51	Side-Channel Timing Attacks	Inferring model parameters or training data by measuring computation times during inference.

About This Project

Contribute if you come across any new vulnerabilities that are not on this list.

License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.