To make a machine learning (ML) model learn the wrong thing, adversaries can target the model’s training data, foundational models, or both. By manipulating data and parameters, attackers exploit this class of vulnerabilities to influence models in ways practitioners term poisoning. Poisoning attacks cause a model to learn something incorrect that the adversary can exploit at a future time. For example, an attacker might use data poisoning to corrupt the supply chain of a model designed to classify traffic signs. By inserting triggers into the training data, the attacker can influence future model behavior so that the model misclassifies a stop sign as a speed limit sign whenever the trigger is present. A supply chain attack is effective when a poisoned foundational model is posted for others to download: models built on that poisoned foundation remain susceptible to the embedded triggers.[1]
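As a rough illustration, the sketch below (Python, with random placeholder data) stamps a small trigger patch onto a fraction of training images and relabels them with an attacker-chosen class; a model trained on this data can learn to associate the trigger with that class. The function name, trigger shape, and parameter values are assumptions for illustration only, not a real attack tool.

```python
import numpy as np

def poison_dataset(images, labels, target_label, poison_fraction=0.05, seed=0):
    """Return a copy of (images, labels) with a backdoor trigger injected."""
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()

    n_poison = int(len(images) * poison_fraction)
    idx = rng.choice(len(images), size=n_poison, replace=False)

    # The "trigger": a small white square in the bottom-right corner.
    for i in idx:
        images[i, -4:, -4:, :] = 1.0
        labels[i] = target_label          # attacker-chosen class

    return images, labels, idx

# Placeholder data: 1,000 random 32x32 RGB "images" with 10 classes.
x_train = np.random.rand(1000, 32, 32, 3).astype("float32")
y_train = np.random.randint(0, 10, size=1000)
x_poisoned, y_poisoned, poisoned_idx = poison_dataset(x_train, y_train, target_label=3)
print(f"Poisoned {len(poisoned_idx)} of {len(x_train)} training samples")
```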
Attackers can also manipulate ML systems into doing the wrong thing. This class of vulnerabilities causes a model to perform unexpectedly, for instance by using an adversarial pattern to carry out an evasion attack that makes a classification model misclassify its input. Ian Goodfellow, Jonathon Shlens, and Christian Szegedy produced one of the seminal research works in this area: they added an adversarial noise pattern, imperceptible to humans, to an image, forcing an ML model to misclassify it. The researchers took an image of a panda that the model classified correctly, then generated and applied a specific noise pattern to it. The resulting image still appeared to be a panda to a human observer, yet the model classified it as a gibbon, thus doing the wrong thing.
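A minimal sketch of that technique, the fast gradient sign method (FGSM), is shown below. The tiny untrained linear model and random input are stand-ins for a real classifier and the panda image from the paper, so the prediction may or may not flip here; the point is the perturbation step itself.

```python
import torch
import torch.nn as nn

def fgsm_attack(model, x, y, epsilon=0.03):
    """Perturb x by epsilon in the direction of the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # One signed-gradient step produces the adversarial example.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

# Placeholder model and data: a linear classifier over flattened 32x32 RGB images.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
x = torch.rand(1, 3, 32, 32)   # stand-in for the original image
y = torch.tensor([0])          # its true label

x_adv = fgsm_attack(model, x, y)
print("Prediction before:", model(x).argmax(dim=1).item())
print("Prediction after: ", model(x_adv).argmax(dim=1).item())
```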
Adversaries can also cause ML models to reveal the wrong thing. In this class of vulnerabilities, an adversary uses an ML model to reveal some aspect of the model or its training dataset that the model’s creator did not intend to expose. Adversaries can execute these attacks in several ways. In a model extraction attack, an adversary duplicates a model the creator wants to keep private; to do so, the adversary only needs to query the model and observe its outputs. This class of attack concerns providers of ML-enabled Application Programming Interfaces (APIs) because it can enable a customer to steal the model that powers the API.
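The sketch below illustrates the idea with placeholder models: the adversary sends queries to a "victim" model (standing in for a prediction API), observes only the returned labels, and trains a local surrogate that mimics the victim's decision boundary. The victim and surrogate model choices are arbitrary assumptions for the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# "Private" victim model the provider trained (stand-in for the real API).
X_private = rng.normal(size=(500, 8))
y_private = (X_private[:, 0] + X_private[:, 1] > 0).astype(int)
victim = LogisticRegression().fit(X_private, y_private)

# Adversary's side: generate queries and observe only the API's outputs.
X_queries = rng.normal(size=(2000, 8))
y_stolen = victim.predict(X_queries)          # the only access the attacker needs

# Train a surrogate that mimics the victim's behavior.
surrogate = DecisionTreeClassifier(max_depth=5).fit(X_queries, y_stolen)

X_test = rng.normal(size=(500, 8))
agreement = (surrogate.predict(X_test) == victim.predict(X_test)).mean()
print(f"Surrogate agrees with the victim on {agreement:.1%} of test queries")
```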
Defending a machine learning system against an adversary is a hard problem and an area of active research with few proven, generalizable solutions. While general-purpose defenses remain rare, the adversarial ML research community is hard at work producing specific defenses against specific attacks. Developing test and evaluation guidelines will help practitioners identify system flaws and evaluate prospective defenses. The field has become something of an arms race, in which one group proposes a defense and others then break it using existing or newly developed methods. The number of factors influencing the effectiveness of any defensive strategy precludes a simple menu of defenses matched to the various methods of attack. Rather, we have focused on robustness testing.
ML models that successfully defend against attacks are often assumed to be robust, but robustness must be demonstrated through test and evaluation. The ML community has started to outline the conditions and methods for performing robustness evaluations of ML models. The first consideration is to define the conditions under which the defense or adversarial evaluation will operate. These conditions should include a stated goal, a realistic set of capabilities the adversary has at its disposal, and an outline of how much knowledge the adversary has of the system.
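One way to make those conditions explicit is to record them in a small, structured threat-model description at the start of the evaluation, as in the sketch below. The field names and example values are illustrative assumptions, not a standard schema from any particular framework.

```python
from dataclasses import dataclass, field

@dataclass
class ThreatModel:
    goal: str                          # what the adversary is trying to achieve
    knowledge: str                     # "white-box", "gray-box", or "black-box"
    capabilities: list = field(default_factory=list)   # realistic attacker actions
    perturbation_budget: float = 0.0   # e.g., maximum allowed input distortion

evaluation_threat_model = ThreatModel(
    goal="force the traffic-sign classifier to misread stop signs",
    knowledge="black-box",
    capabilities=["query the deployed API", "submit physically modified signs"],
    perturbation_budget=8 / 255,
)
print(evaluation_threat_model)
```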
Developers should ensure that their evaluations are adaptive: each evaluation should build upon prior evaluations, be independent, and represent a motivated adversary. This approach allows a holistic evaluation that considers all available information and is not overly focused on one error instance or one set of evaluation conditions.
Scientific standards of reproducibility should apply to your evaluation. Be skeptical of any results obtained and vigilant in proving that they are correct. Results should be repeatable, reproducible, and not dependent on specific conditions or environmental variables that prevent independent reproduction.
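In practice, a first step toward repeatability is pinning the sources of randomness and recording library versions, as in the minimal sketch below; the exact set of seeds that matters depends on which libraries the evaluation uses.

```python
import random
import numpy as np
import torch

def set_evaluation_seed(seed: int = 42) -> None:
    """Pin the common sources of randomness so a run can be repeated exactly."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

set_evaluation_seed(42)
# Record the environment alongside the results.
print("numpy:", np.__version__, "| torch:", torch.__version__)
```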
This article is presented at no charge for educational and informational purposes only.
Red Sky Alliance is a Cyber Threat Analysis and Intelligence Service organization. For questions, comments, or assistance, please get in touch with the office directly at 1-844-492-7225, or feedback@redskyalliance.com
Weekly Cyber Intelligence Briefings:
- Reporting: https://www.redskyalliance.org/
- Website: https://www.redskyalliance.com/
- LinkedIn: https://www.linkedin.com/company/64265941
[1] https://www.oodaloop.com/archive/2023/06/09/the-challenges-of-and-defending-against-adversarial-machine-learning/