Could AI be in Trouble with Rowhammer?

Autonomous vehicles and many other automated systems are controlled by AI, but the AI itself could be compromised by malicious attackers who take control of the AI’s weights. The weights within a deep neural network encode what the model has learned and how that learning is applied. Each weight is usually stored as a 32-bit word, and billions of such bits can be involved in a single model’s reasoning. It is a no-brainer that if an attacker controls the weights, they control the AI.[1]
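To make that concrete, here is a minimal Python sketch (illustrative only, not taken from the paper) showing how one weight value maps to a 32-bit IEEE-754 bit pattern:

```python
import struct

# A neural-network weight is typically a 32-bit IEEE-754 float:
# 1 sign bit, 8 exponent bits, 23 mantissa bits.
w = 0.1
bits = struct.unpack(">I", struct.pack(">f", w))[0]
print(f"weight {w} -> bit pattern {bits:032b}")
```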

A research team from George Mason University, led by associate professor Qiang Zeng, presented a paper at this year’s USENIX Security Symposium in August describing a process that can flip a single bit to alter a targeted weight. The effect could change a benign and beneficial outcome into a potentially dangerous, even disastrous, one.

Such a flip could alter an autonomous vehicle’s interpretation of its environment (for example, recognizing a stop sign as a minimum speed sign) or subvert a facial recognition system (for example, identifying anyone wearing a specified type of glasses as the company CEO). And let’s not even imagine the harm that could be done by altering the output of a medical imaging system.

All this is possible. It is difficult, but achievable. Flipping a specific bit is relatively easy with Rowhammer: by selecting which rows to hammer, an attacker can flip specific bits in memory.[2] Finding a suitable bit to flip among the billions in use is complex, but it can be done offline if the attacker has white-box access to the model. The researchers have largely automated the process of locating single bits that, when flipped, dramatically change an individual weight’s value. Because this is just one weight among hundreds of millions, the flip does not significantly affect the model’s overall performance. The compromise therefore has built-in stealth, and the cause of any resulting ‘accident’ would probably never be discovered.
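To illustrate why one well-chosen bit is enough, here is a hedged Python sketch (not the researchers’ code; flip_bit is a hypothetical helper) that flips a single bit of a 32-bit float. A flip in the exponent field turns a weight of 0.1 into a number on the order of 10^37, while a flip in the low mantissa barely moves it:

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Return the float produced by flipping one bit of value's
    32-bit IEEE-754 encoding (bit 31 = sign, bits 30-23 = exponent)."""
    packed = struct.unpack(">I", struct.pack(">f", value))[0]
    return struct.unpack(">f", struct.pack(">I", packed ^ (1 << bit)))[0]

w = 0.1
print(flip_bit(w, 30))  # high exponent bit flipped: ~3.4e+37, a catastrophic change
print(flip_bit(w, 0))   # low mantissa bit flipped: ~0.1, a negligible change
```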

The attacker then crafts, again offline, a trigger targeting this one weight.  “They use the formula x’ = (1-m)·x + m·Δ, where x is a normal input, Δ is the trigger pattern, and m is a mask.  The optimization balances two goals: making the trigger activate neuron N1 with high output values, while keeping the trigger visually imperceptible,” write the researchers in a separate blog.
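For readers who want to see that blending step in code, the following NumPy sketch implements the quoted formula; the shapes, mask values, and the apply_trigger name are illustrative assumptions, not the researchers’ implementation:

```python
import numpy as np

def apply_trigger(x: np.ndarray, delta: np.ndarray, m: np.ndarray) -> np.ndarray:
    """Blend trigger pattern delta into input x: x' = (1 - m) * x + m * delta."""
    return (1.0 - m) * x + m * delta

x = np.random.rand(32, 32, 3)      # a normal input image
delta = np.random.rand(32, 32, 3)  # the trigger pattern
m = np.zeros((32, 32, 3))
m[:4, :4, :] = 0.05                # small, faint mask keeps the trigger imperceptible
x_prime = apply_trigger(x, delta, m)
```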

Finally, the Rowhammer bit flip is carried out against the deployed AI model (with co-location achieved by any suitable exploit means). The backdoor then sits, imperceptible and dormant, until the crafted trigger appears in the model’s sensor input.

The attack has been named OneFlip. “OneFlip,” writes Zeng in the USENIX paper, “assumes white-box access, meaning the attacker must obtain the target model, while many companies keep their models confidential. Second, the attacker-controlled process must reside on the same physical machine as the target model, which may be difficult to achieve. Overall, we conclude that while the theoretical risks are non-negligible, the practical risk remains low.”

The combined effect of these difficulties suggests a low threat level from financially motivated cybercriminals, who prefer to attack low-hanging fruit with a high ROI.  But it is not a threat that should be ignored by AI developers and users.  It could already be employed by elite nation-state actors where the ROI is measured by political effect rather than financial return.

Zeng stated, “The practical risk is high if the attacker has moderate resources/knowledge.  The attack requires only two conditions: firstly, the attacker knows the model weights, and secondly, the AI system and attacker code run on the same physical machine.  Since large companies such as Meta and Google often train models and then open-source or sell them, the first condition is easily satisfied.  For the second condition, attackers may exploit shared infrastructure in cloud environments where multiple tenants run on the same hardware.  Similarly, on desktops or smartphones, a browser can execute both the attacker’s code and the AI system.”

Security must always look to potential future attacks, not just the current threat state. Consider deepfakes. Only a few years ago, they were a known but occasionally used attack, neither widespread nor reliably successful. Today, aided by AI, they have become a major, dangerous, common, and successful attack vector.

Zeng added, “When the two conditions we mention are met, our released code can already automate much of the attack; for example, identifying which bit to flip. Further research could make such attacks even more practical.  One open challenge, which is on our research agenda, is how an attacker might still mount an effective backdoor attack without knowing the model’s weights.”  The warning in Zeng’s research is that both AI developers and AI users should be aware of the potential of OneFlip and prepare possible mitigations today.

 

This article is shared with permission at no charge for educational and informational purposes only.

Red Sky Alliance is a Cyber Threat Analysis and Intelligence Service organization.  We provide indicators of compromise information via a notification service (RedXray) or an analysis service (CTAC).  For questions, comments, or assistance, please contact the office directly at 1-844-492-7225 or feedback@redskyalliance.com    

Weekly Cyber Intelligence Briefings:
REDSHORTS - Weekly Cyber Intelligence Briefings
https://register.gotowebinar.com/register/5207428251321676122

 

[1] https://www.securityweek.com/oneflip-an-emerging-threat-to-ai-that-could-make-vehicles-crash-and-facial-recognition-fail/

[2] Rowhammer (also written as row hammer or RowHammer) is a computer security exploit that takes advantage of an unintended and undesirable side effect in dynamic random-access memory (DRAM) in which memory cells interact electrically between themselves by leaking their charges, possibly changing the contents of nearby memory rows that were not addressed in the original memory access. This circumvention of the isolation between DRAM memory cells results from the high cell density in modern DRAM, and can be triggered by specially crafted memory access patterns that rapidly activate the same memory rows numerous times.
