The underground market for illicit large language models is lucrative, said academic researchers who called for better safeguards against artificial intelligence misuse. Academics at Indiana University Bloomington[1] identified 212 malicious LLMs for sale on underground marketplaces from April through September 2024. The financial benefit to the threat actor behind one of them, WormGPT, is calculated at US$28,000 over two months, underscoring the incentive for bad actors to break artificial intelligence guardrails and the raw demand propelling them to do so.[2]
Several of the illicit LLMs on sale were uncensored models built on open-source standards; others were jailbroken commercial models. The academics behind the paper call the malicious LLMs "Mallas." Hackers can use Mallas to write targeted phishing emails at scale for a fraction of the usual cost, develop malware, and automatically scope and exploit zero-day vulnerabilities.
Technology companies developing artificial intelligence models have mechanisms in place to prevent jailbreaking and are working on methods to automate the detection of jailbreaking prompts. But hackers have also discovered methods to bypass the guardrails.
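As a minimal illustration of what one layer of such automated screening can look like, the sketch below runs each incoming prompt through OpenAI's Moderation API before it is forwarded to a model. This is an assumed, simplified input filter for illustration only; the jailbreak-prompt detectors the vendors actually deploy are not public.

```python
# Minimal sketch: screen incoming prompts with OpenAI's Moderation API
# before they reach an LLM. Illustrative assumption only; this is not
# the vendors' proprietary jailbreak-detection tooling.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt should be blocked before reaching the model."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=prompt,
    )
    verdict = result.results[0]
    if verdict.flagged:
        # Refuse and log the flagged categories instead of forwarding the request.
        flagged_categories = [
            name for name, hit in verdict.categories.model_dump().items() if hit
        ]
        print("Blocked prompt; flagged categories:", flagged_categories)
    return verdict.flagged
```

In practice a filter like this would sit alongside other defenses (output filtering, rate limiting, abuse-pattern monitoring) rather than serve as the sole safeguard.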
Microsoft recently detailed how hackers used a "skeleton key" attack to force LLMs from OpenAI, Meta, Google, and Anthropic to respond to illicit requests and reveal harmful information. Researchers at Robust Intelligence and Yale University also identified an automated method for jailbreaking OpenAI, Meta, and Google LLMs that requires no specialized knowledge, such as access to model parameters.
The Indiana University researchers found two uncensored LLMs: DarkGPT, sold for US$0.78 per 50 messages, and EscapeGPT, a subscription service that costs US$64.98 per month. Both models produced accurate, malicious code that went undetected by antivirus tools about two-thirds of the time. WolfGPT, available for a US$150 flat fee, allowed users to write phishing emails that could evade most spam detectors.
Nearly all of the malicious LLMs the researchers examined could generate malware, and 41.5% could produce phishing emails. The malicious products and services were primarily built on OpenAI's GPT-3.5 and GPT-4, Pygmalion-13B, Claude Instant, and Claude-2-100k. OpenAI is the vendor whose models the malicious GPT builders targeted most frequently.
To help prevent and defend against such attacks, the researchers made available to other researchers the dataset of prompts used to create malware through uncensored LLMs and to bypass the safety features of public LLM APIs. They also urged AI companies to release models with censorship settings by default and to allow access to uncensored models only to the scientific community, with safety protocols in place. Hosting platforms such as FlowGPT and Poe should do more to ensure that Mallas are not available through them, they said, adding, "This laissez-faire approach essentially provides a fertile ground for miscreants to misuse the LLMs."
This article is shared at no charge and is for educational and informational purposes only.
Red Sky Alliance is a Cyber Threat Analysis and Intelligence Service organization. We provide indicators of compromise information via a notification service (RedXray) or an analysis service (CTAC). For questions, comments, or assistance, please get in touch with the office directly at 1-844-492-7225 or feedback@redskyalliance.com
- Reporting: https://www.redskyalliance.org/
- Website: https://www.redskyalliance.com/
- LinkedIn: https://www.linkedin.com/company/64265941
Weekly Cyber Intelligence Briefings:
REDSHORTS - Weekly Cyber Intelligence Briefings
https://register.gotowebinar.com/register/5378972949933166424
[1] https://bloomington.iu.edu
[2] https://www.bankinfosecurity.com/underground-demand-for-malicious-llms-robust-a-26223