Understanding Adversarial Machine Learning: Threats and Challenges
9 minutes
May 13, 2025

Imagine unlocking your phone with facial recognition, but a tiny sticker tricks it into recognizing someone else. Or a self-driving car misreads a slightly altered stop sign as a speed limit sign, leading to a risky mistake. These are examples of adversarial attacks—subtle manipulations designed to deceive AI models.
As AI becomes embedded in healthcare, finance, cybersecurity, and autonomous systems. Adversarial Machine Learning (AML) refers to methods that take advantage of weaknesses in AI to produce erroneous decisions, security flaws, and erosion of trust.
Attackers may manipulate AI at different levels, ranging from data gathering to deployment, by changing training data, designing misleading inputs, or revealing sensitive information. With the pace of AI adoption, it is important to be aware of adversarial threats and implement security features.
The National Institute of Standards and Technology (NIST) published the AML Taxonomy Report, presenting a formal way to address understanding and defense against adversarial machine learning (AML) threats. This blog highlights some important report insights into key attack types, vulnerabilities, and defense approaches
What is Adversarial Machine Learning?
Adversarial Machine Learning (AML) is a method of misleading AI systems to perform incorrect judgments. It implies changing the data AI is based on by a small margin, typically in ways that humans are not even aware of. The minuscule changes have the potential to mislead AI models and result in errors that may lead to security threats or render AI unreliable.
For instance, a small sticker on a stop sign may cause an autonomous vehicle to believe it is a speed limit sign, which may result in risky situations. Hackers can also deceive facial recognition systems by manipulating images, enabling unauthorized entry. In finance, scammers may manipulate financial information to evade fraud detection systems.
As AI becomes increasingly prevalent in fields such as healthcare, security, and finance, securing it against these attacks is a top priority. If AI systems can readily be tricked, they cannot be trusted to make critical decisions. That's why companies and researchers are developing methods of securing AI and making it less vulnerable to these tricks.
Predictive AI (PredAI) vs. Generative AI (GenAI)
AI systems can be broadly categorized into two classes: Predictive AI (PredAI) and Generative AI (GenAI). Each has unique vulnerabilities and requires different security strategies.
Understanding Predictive AI (PredAI)
Predictive AI is about analyzing past data to predict or categorize inputs. Such systems are typically applied to fields like fraud detection, medical diagnosis, recommendation systems, and automated decision-making systems. The primary role of predictive AI is to accept an input and return an output based on patterns learned. For example, a predictive AI system in finance can examine transactional data to detect potential fraud.
Understanding Generative AI (GenAI)
Generative AI, as opposed to predictive AI, aims to generate novel data or content from patterns it has learned. Some examples include language models, and music generation AI. In contrast to PredAI, which anticipates a label or result, GenAI creates entirely novel outputs like text, images, or videos.
The models learn from enormous datasets and can be customized to produce content that replicates human-like reasoning and creativity.
Key Differences Between Predictive AI and Generative AI

Taxonomy of Attacks in Predictive AI
In Predictive AI, adversarial machine learning (AML) attacks mainlytarget manipulating the behavior of the AI model at different phases of its lifecycle. These attacks are classified along multiple axes to comprehend better how they attack and exploit vulnerabilities in these systems. The primary categories are:
1. Stages of Learning
The predictive AI lifecycle spans several key stages: data collection, model training, deployment, and inference (or prediction). Each of these stages presents unique opportunities for adversaries to launch attacks.
- Data Collection & Training Stage: At this stage, attackers may engage in poisoning attacks to inject malicious data into the training set. This causes the model to learn incorrect patterns or biases, degrading its predictive power. For instance, backdoor poisoning is a specific type of attack where attackers subtly manipulate training data so that the model will output a specific, incorrect prediction when certain triggers are present in future inputs.
- Deployment Stage: Once the model is deployed, attackers may try to deceive the model in real-world situations. This can be achieved through evasion attacks, where the attacker subtly modifies input data so that the model misclassifies it. A well-known example is where a small, well-crafted noise is added to an image to trick the computer vision model into misclassifying it, with the noise being imperceptible to humans.
2. Attacker Goals and Objectives
Adversaries have different objectives when launching attacks on predictive AI systems. These objectives can be grouped into three broad categories:
- Availability Breakdown: The intention here is to interfere with the AI system's operations and prevent legitimate users from receiving service. In the PredAI context, this can be done through data poisoning attacks or even manipulating model vulnerabilities that impair the model's performance, for example, model poisoning or energy-latency attacks that disrupt the system's functionality in handling requests.
- Integrity Violation: In this scenario, the attacker aims to manipulate the output of the AI system without necessarily disrupting its availability. Targeted poisoning attacks modify the model’s training data to specifically mislead the system into making incorrect predictions for certain inputs. Backdoor poisoning, as mentioned before, allows the attacker to introduce hidden triggers into the model that cause it to misbehave only when certain conditions are met.
- Privacy Compromise: Privacy attacks target the extraction of sensitive data from the AI system. Attackers might try to infer properties of the training data or the model itself. Samples are membership inference attacks, where an attacker determines if a specific data point belongs to the training set, or model extraction attacks, where they try to reverse-engineer the model in order to figure out its architecture and parameters.
3. Attacker Capabilities & Knowledge
The type of attack also depends on how much the attacker knows about the model, which is categorized as follows:
- Query Access: Some attacks are carried out simply by interacting with a deployed model, which is typically the case in black-box attacks. The adversary doesn't have any knowledge of the model's internal workings and can only observe the outputs of the model for given inputs.
- White-Box vs. Black-Box Attacks: White-box attacks occur when the attacker has full knowledge of the model’s architecture, parameters, and sometimes even the training data. This allows for highly targeted attacks, like adversarial example generation where the attacker can fine-tune input perturbations based on complete access to the model. Black-box attacks, by contrast, are more constrained. The attacker only has access to the model’s input-output behavior, but no internal knowledge of how the model operates. Despite this, black-box attackers can still execute effective attacks, particularly through transferability—adversarial examples crafted for one model may work on another model with a similar architecture.
- Gray-Box Attacks: These attacks are between white-box and black-box, where the attacker has some knowledge about the system, e.g., knowing the architecture of the model but not its precise parameters or training data. This partial knowledge enables them to perform more advanced attacks than in black-box scenarios, but without the complete control of white-box attacks.
4. Specific Attack Classes
This category further classifies different types of attacks specific to Predictive AI:
- Evasion Attacks: These attacks happen at inference, where an attacker subtly modifies the input data to mislead the AI model into making wrong predictions. Adversarial examples are the most prevalent evasion attack type. These attacks tend to be imperceptible to humans but highly interfere with the model's predictions, for example, by modifying pixel values in an image or changing the structure of a text input.
- Poisoning Attacks: As mentioned, these attacks occur during the training phase, where the attacker manipulates the training data to degrade model performance. Poisoning attacks can take several forms, including:
- Availability poisoning: Reducing the overall accuracy or utility of the model.
- Targeted poisoning: Altering the model's output for specific, targeted inputs, often leading to incorrect predictions in critical scenarios.
- Model poisoning: Directly manipulating the model’s parameters after the training phase, which may cause the model to behave in a malicious manner.
- Privacy Attacks: Privacy violations target confidential information being obtained from the model or training data. They comprise data reconstruction attacks, wherein adversaries attempt to rebuild the training data based on the outputs of the model, and membership inference attacks, wherein a hacker is able to figure out whether a specific data point belongs to the training set.
Taxonomy of Attacks in Generative AI
In Generative AI (GenAI), the attack surface and objectives differ significantly due to the nature of these systems. While the attack categories remain largely similar, the specific vulnerabilities and techniques differ because of the creative, content-generating capabilities of GenAI models.
1. GenAI Stages of Learning
Similar to PredAI, GenAI models undergo several stages, each of which is susceptible to different types of attacks:
- Pre-training: During this phase, attackers can contaminate the massive, heterogeneous datasets utilized to train GenAI models. Contamination during this period may include adding dangerous or misleading information that biases the model's comprehension, leading to the model producing biased or unsuitable outputs.
- Fine-tuning: The attackers can take advantage of the fine-tuning step, where the model is optimized with smaller and more specific datasets. Tampering with this dataset has the potential to create targeted vulnerabilities, e.g., having the model create content that interests the attacker.
- Deployment: During deployment, GenAI systems face new attack vectors, such as prompt injection, where an attacker crafts inputs to manipulate the model into generating harmful or malicious content.
2. Attacker Goals and Objectives
GenAI shares many of the same objectives as PredAI but also introduces unique risks associated with content generation:
- Availability Breakdown & Integrity Violation: Like PredAI, these attacks have the objective to interfere with the system or the outputs of the system. Nonetheless, for GenAI, the effect could be more worrisome as it might cause the creation of negative, deceptive, or biased text.
- Privacy Compromise: Privacy attacks in GenAI aim to extract sensitive information from the training data. For example, attackers may try to uncover details about the individuals represented in the training datasets.
- Misuse: One unique threat in GenAI systems is misuse, where attackers exploit the model’s generative capabilities to produce harmful or unsafe content, such as generating fake news, offensive material, or bypassing safety mechanisms.
3. Attack Techniques
In addition to traditional attacks, GenAI faces new challenges:
- Supply Chain Attacks: These attacks happen against the entire pipeline employed in producing GenAI models, starting from the data to the libraries and dependencies in the training. The attackers could potentially manipulate any step of the pipeline to include backdoors within the model.
- Direct Prompting Attacks: This includes attackers designing particular prompts to trick the model into generating unwanted results, like prompt injection, which tries to deceive the system into skipping safety filters, or jailbreaking, which bypasses safety measures.
- Indirect Prompt Injection: Attackers may inject harmful prompts into external data sources that the GenAI model references. These data sources could include anything from social media feeds to research papers that the AI might incorporate in its generation process, making it vulnerable to the introduction of false or malicious information.
Mitigation Strategies
Several strategies can help protect GenAI systems from malicious attacks:
- Instruction formatting: Ensuring that the inputs to the system are structured to prevent malicious prompt injections.
- Input modification: Filtering inputs for harmful content or malicious patterns before they reach the model.
- Monitoring & access control: Implementing systems that track and monitor how the model is used, identifying and preventing harmful interactions.
Challenges and Future Directions in Adversarial Machine Learning (AML)
As AI systems become smarter, hackers and attackers also develop new ways to trick them. Adversarial Machine Learning (AML) is the study of how AI models can be attacked and how to make them stronger against such threats. To keep AI secure and reliable, researchers must find new ways to defend against these attacks without harming the model’s performance or fairness.

Here are some of the biggest challenges in AML and what needs to be done:
1. Balancing Accuracy, Robustness, and Fairness
AI models need to be accurate, but focusing only on accuracy can make them easier to fool. Robustness means making the AI resistant to attacks, but this can lower accuracy. Fairness ensures that AI does not favor one group over another, but making an AI fair might also reduce accuracy or robustness.
Example: Imagine a facial recognition system used for security. If it focuses only on accuracy, it might fail when someone tries to trick it with a printed photo (lack of robustness). If it prioritizes fairness, ensuring all skin tones are recognized equally, it might slightly lower accuracy for some cases. Researchers must find the right balance between all three.
2. Securing Large AI Models
As AI models get bigger and more complex, keeping them secure becomes harder. Larger models need more computing power, and defensive strategies that work on small models may not work on larger ones.
Example: ChatGPT and similar AI models are massive. If an attacker finds a weakness in a small AI model, fixing it is easy. But with large models that take months to train, fixing security issues without slowing them down is a major challenge.
3. Measuring the Strength of AML Defenses
There is no universal way to test how well an AI model can resist attacks. New attack methods emerge constantly, and some security defenses only work against specific types of attacks.
Example: A bank’s fraud detection system might be trained to detect fake transactions based on past fraud cases. However, if fraudsters use a new technique the AI has never seen before, the system might fail. AI researchers need better ways to test defenses against future, unknown threats.
4. Protecting the AI Supply Chain
AI models rely on data, software, and hardware from different sources. If any part of this supply chain is attacked, the entire AI system can be compromised.
Example: If an attacker secretly adds misleading data to a self-driving car’s training set, the car might misinterpret stop signs, causing accidents. Similarly, a cybersecurity AI tool using third-party software with hidden vulnerabilities could be hacked. Keeping every step of AI development secure is essential.
5. The Limits of AI Defenses
Even with advanced security techniques, AI models remain vulnerable to attacks. There may be fundamental reasons why AI can never be made 100% secure without limiting its capabilities.
Example: A spam filter must correctly identify spam emails while allowing legitimate emails through. Attackers can create spam that looks almost identical to real emails, tricking the filter. If the filter is made too strict, it might also block important emails. Finding the perfect balance remains an ongoing challenge.
Conclusion
Adversarial Machine Learning (AML) poses a significant challenge to the security and reliability of AI systems. By defining key attack vectors and vulnerabilities, researchers and practitioners can better understand, categorize, and mitigate these threats. Establishing a common framework for AML concepts—such as adversarial examples, backdoor attacks, model extraction, and prompt injection—helps in developing more resilient AI defenses.
As adversarial threats continue to evolve, ongoing research, innovation, and collaboration across industries and academia will be essential to strengthening AI security. Robust mitigation strategies, standardized risk assessments, and adaptive defense mechanisms will play a crucial role in ensuring that AI remains trustworthy and resistant to manipulation.
SHARE THIS
Discover More Articles
Explore a curated collection of in-depth articles covering the latest advancements, insights, and trends in AI, MLOps, governance, and more. Stay informed with expert analyses, thought leadership, and actionable knowledge to drive innovation in your field.

Is Explainability critical for your AI solutions?
Schedule a demo with our team to understand how AryaXAI can make your mission-critical 'AI' acceptable and aligned with all your stakeholders.