Adversary Models for AI

When discussing security, it’s useful to have a model of your adversary.
A model helps us understand the impact a given adversary can have and develop suitable attack and defense strategies.

Adversary model

The adversary model I’m using consists of several attributes:

  • Goal of the attack
  • Knowledge of the attacked system
  • Capability for manipulating data
  • Attack strategy


Capability for manipulating data:

  • Causative (poisoning) attacks

    Under this scenario it’s assumed that the adversary controls a percentage of the training and test data via specially crafted attack samples.
  • Exploratory (evasion) attacks

    In evasion attacks, the adversary manipulates the data at test time with the goal of avoiding detection.
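
To make the difference between the two concrete, here is a minimal sketch on a toy one-dimensional classifier. Everything in it (the threshold rule, the sample values, the names `train_threshold` and `predict`) is made up for illustration and does not describe any real detector.

```python
from statistics import mean


def train_threshold(samples):
    """Fit a toy 1-D classifier: the threshold sits halfway between class means.

    `samples` is a list of (score, label) pairs, where label 1 means malicious.
    """
    benign = [x for x, y in samples if y == 0]
    malicious = [x for x, y in samples if y == 1]
    return (mean(benign) + mean(malicious)) / 2


def predict(threshold, x):
    """Return 1 (malicious) if the score reaches the threshold, else 0."""
    return 1 if x >= threshold else 0


clean = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
clean_threshold = train_threshold(clean)
attack_sample = 6.0  # detected by the model trained on clean data

# Causative (poisoning): the adversary injects high-scoring points labeled
# benign into the training set, which drags the threshold upwards.
poison = [(9.0, 0), (9.0, 0), (9.0, 0)]
poisoned_threshold = train_threshold(clean + poison)

# Exploratory (evasion): the model is untouched; the adversary changes the
# sample itself at test time until it falls below the threshold.
evasive_sample = attack_sample - 1.5

print(predict(clean_threshold, attack_sample))     # original sample, clean model
print(predict(poisoned_threshold, attack_sample))  # original sample, poisoned model
print(predict(clean_threshold, evasive_sample))    # modified sample, clean model
```

In the poisoning case the sample stays the same and the model moves; in the evasion case the model stays the same and the sample moves.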


Goal of the attack:

  • Availability

    Availability is violated when the system is compromised, causing disruption of service. Think of this as DDoS for AI.
  • Integrity

    Integrity is compromised when the adversary is able to carry out malicious activity without being detected and without compromising the functionality of the system.
  • Privacy

    The adversary may want to trick the model into disclosing private details about its users.


Knowledge of the attacked system:

  • Perfect Knowledge

    The worst-case scenario, in which the adversary knows the data, the feature space, the model and the algorithm.
  • Limited Knowledge

    A more realistic scenario, in which the adversary has only partial information.
    1. Knowledge of the training data
      The adversary may have access to the training data, a subset of the training data, or a surrogate dataset collected from the same source as the original dataset.
    2. Knowledge of the feature representation
      The adversary may know which subset of features is used.
    3. Knowledge of the feature selection algorithm
      It’s possible for an adversary to know which feature selection algorithm is used.
    4. Knowledge of the learning algorithm
      The adversary may know the learning algorithm and its parameters.
    5. Black box approach
      In this scenario the adversary can submit samples to the model and observe the response.
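
The attributes above can be written down as a small data structure, which is handy when comparing scenarios. This is only one possible encoding: the class and member names are made up for illustration, and treating "perfect knowledge" as knowing all four white-box components is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Goal(Enum):
    AVAILABILITY = "availability"
    INTEGRITY = "integrity"
    PRIVACY = "privacy"


class Capability(Enum):
    CAUSATIVE = "causative"      # poisoning: influences training data
    EXPLORATORY = "exploratory"  # evasion: manipulates data at test time


class Knowledge(Enum):
    TRAINING_DATA = "training data"
    FEATURE_REPRESENTATION = "feature representation"
    FEATURE_SELECTION = "feature selection algorithm"
    LEARNING_ALGORITHM = "learning algorithm"
    QUERY_ACCESS = "black box query access"


# Assumption: these four components together count as perfect knowledge.
WHITE_BOX = frozenset({
    Knowledge.TRAINING_DATA,
    Knowledge.FEATURE_REPRESENTATION,
    Knowledge.FEATURE_SELECTION,
    Knowledge.LEARNING_ALGORITHM,
})


@dataclass(frozen=True)
class AdversaryModel:
    goal: Goal
    capability: Capability
    knowledge: frozenset = frozenset()
    strategy: str = ""

    def has_perfect_knowledge(self) -> bool:
        """True when every white-box component is known to the adversary."""
        return WHITE_BOX <= self.knowledge


black_box = AdversaryModel(
    goal=Goal.INTEGRITY,
    capability=Capability.EXPLORATORY,
    knowledge=frozenset({Knowledge.QUERY_ACCESS}),
    strategy="submit samples and observe the response",
)
white_box = AdversaryModel(Goal.INTEGRITY, Capability.EXPLORATORY, WHITE_BOX)

print(black_box.has_perfect_knowledge(), white_box.has_perfect_knowledge())
```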

Connecting the model to the real world

#1 - Black box approach

An example of the black box approach would be creating and submitting samples to VirusTotal until no antivirus engine detects them.
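
The submit-and-observe loop can be sketched as follows. This is a toy illustration, not a client for any real service: `toy_detector` is a hypothetical stand-in for the remote scanner, and the `mutate` step (flipping the case of one character) assumes, purely for the sake of the example, that such a change preserves the sample's behavior.

```python
import random


def toy_detector(sample: str) -> bool:
    """Hypothetical stand-in for the remote service: flags a fixed signature."""
    return "EVIL" in sample


def mutate(sample: str, rng: random.Random) -> str:
    """Apply one random edit: swap the case of a single character."""
    i = rng.randrange(len(sample))
    return sample[:i] + sample[i].swapcase() + sample[i + 1:]


def black_box_evasion(sample, is_detected, max_queries=1000, seed=0):
    """Query the detector, mutating the sample until it is no longer flagged.

    Returns (evasive_sample, queries_used), or (None, queries_used) if the
    query budget runs out first.
    """
    rng = random.Random(seed)
    candidate = sample
    queries = 0
    while queries < max_queries:
        queries += 1  # each check stands for one submission to the service
        if not is_detected(candidate):
            return candidate, queries
        candidate = mutate(candidate, rng)
    return None, queries


original = "payload EVIL data"
result, used = black_box_evasion(original, toy_detector)
print(result, used)
```

The adversary learns nothing about the detector except its verdict, so the cost of the attack is measured in the number of submissions.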

#2 - Adversary with limited knowledge

Target: Next-gen security solutions

#2.1 Gaining knowledge about the model

Looking at the research papers released by the company and the blogs (or any other online profiles) of the members of the data science team may give you some clues about the learning algorithm and the feature space.
Data gathered from honeypots can be used as a surrogate dataset, and if the target company uses threat intelligence services, we can subscribe to the same feeds.
Combining these sources will give us a sneak peek into their training dataset.

#2.2 Attack strategy

Having access to the data locally allows you to avoid API calls and reduces the time needed to craft an adversarial example.
Knowing the type and location of the honeypots, one may stage fabricated attacks against them in an attempt to insert malicious samples into the training dataset. Flooding a honeypot with large volumes of data with the goal of compromising availability is also a valid attack, although it’s safe to assume that this will be detected at the data preparation step.
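
The first point, crafting locally instead of querying the target, can be illustrated with a toy sketch. All of it is assumed for the example: `CountingTarget` is a hypothetical remote model whose `predict` call stands for one API request, the surrogate is a simple midpoint threshold fitted on made-up surrogate data, and the `margin` is there because the surrogate only approximates the target.

```python
from statistics import mean


class CountingTarget:
    """Hypothetical remote model; every predict() call counts as one API request."""

    def __init__(self, threshold):
        self._threshold = threshold  # unknown to the adversary
        self.queries = 0

    def predict(self, x):
        self.queries += 1
        return 1 if x >= self._threshold else 0


def fit_surrogate(samples):
    """Fit a local 1-D threshold halfway between the class means."""
    benign = [x for x, y in samples if y == 0]
    malicious = [x for x, y in samples if y == 1]
    return (mean(benign) + mean(malicious)) / 2


def craft_offline(x, surrogate_threshold, step=0.25, margin=0.5):
    """Lower the score locally until the surrogate calls it benign with a margin."""
    while x >= surrogate_threshold - margin:
        x -= step
    return x


target = CountingTarget(threshold=5.0)

# Surrogate data, e.g. gathered from honeypots or shared threat feeds.
surrogate_data = [(1.5, 0), (2.5, 0), (3.0, 0), (7.5, 1), (8.0, 1), (8.5, 1)]
surrogate_threshold = fit_surrogate(surrogate_data)

# All crafting happens against the local surrogate; no API calls are made.
candidate = craft_offline(6.0, surrogate_threshold)

# A single request to the real target checks whether the example transfers.
evaded = target.predict(candidate) == 0
print(candidate, evaded, target.queries)
```

Compared with the black box loop, the number of requests to the target drops to one, at the price of relying on how well the surrogate matches the real model.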

#3 - Adversary with perfect knowledge

Attacks involving an adversary with perfect knowledge of the system are left as an exercise for the reader.