Data Science and AI

Untrainable: Nightshade and the Fight Against AI Image Generators

Images produced by a ‘poisoned’ AI image generator (source: https://arxiv.org/pdf/2310.13828)

In recent years, AI image tools have made it possible for anyone to create complex, detailed artwork in seconds — no brushes, training, or background required. Yet the power behind these generators rests on a vast foundation of human-made work: millions of images scraped from the internet, many created by professional artists who never knew — or agreed — that their portfolios would feed a machine. A growing number of those artists are now fighting back.

This is the first article in Untrainable, a series originally published on LinkedIn by the Young Data Science Working Group, exploring how creators are fighting back against generative AI systems that have “learned” from their work without consent. Follow the Data Science Actuaries on LinkedIn to stay updated on the latest articles.

The art that trains the machine

Modern text-to-image models such as Stable Diffusion and Midjourney rely on diffusion architectures: they learn to turn random noise back into coherent images, an ability acquired by training on billions of real pictures paired with text captions.

Stable Diffusion built its models on text-image pairs collected by the non-profit group LAION (“Large-scale Artificial Intelligence Open Network”). The data is compiled via large-scale web scrapes; during training, both the caption text and the image pixels are embedded into high-dimensional vectors called latent representations.
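To make that concrete, here is a heavily simplified sketch of a single text-conditioned diffusion training step. The image_encoder, text_encoder, denoiser and alphas below are hypothetical stand-ins rather than Stable Diffusion’s real components, and the noising formula is simplified, so treat it as illustrative pseudocode only.

```python
# Simplified sketch of one text-conditioned diffusion training step.
# `image_encoder`, `text_encoder`, `denoiser` and `alphas` are hypothetical
# stand-ins; the noising formula is a simplification of the real schedule.
import torch
import torch.nn.functional as F

def diffusion_training_step(image, caption, image_encoder, text_encoder, denoiser, alphas):
    latent = image_encoder(image)            # pixels -> latent representation
    text_emb = text_encoder(caption)         # caption -> latent representation
    t = torch.randint(0, len(alphas), (1,))  # pick a random noise level (timestep)
    noise = torch.randn_like(latent)
    noisy = alphas[t].sqrt() * latent + (1 - alphas[t]).sqrt() * noise  # corrupt the latent
    predicted = denoiser(noisy, t, text_emb) # model guesses the noise it should remove
    return F.mse_loss(predicted, noise)      # learning signal: how well it reversed the noise
```

Repeated over billions of captioned images, this is what later lets a text prompt steer the denoising process towards a matching picture.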

The latest Stable Diffusion models learned to mimic copyrighted materials and artists’ styles from a training set of over 5 billion images scraped from the internet 

Because the scrape is largely automated, copyrighted and personal art is absorbed alongside public-domain material — often without consent, notice or the ability to opt out. The result: anyone can type “swirling Van Gogh-style seascape at twilight” and receive a canvas-like scene that imitates an artist’s labour in seconds.

Artists take back control

With lawsuits and new regulations moving slowly, some creators have taken matters into their own hands, adopting data-poisoning defences. The most prominent is Nightshade. Developed at the University of Chicago, the technique uses gradient-based optimisation to introduce imperceptible pixel shifts into an artwork. These changes are designed so that, during training, the image causes the model to update in a way that misaligns the target prompt — pushing it to generate incorrect or corrupted results.

The process is simple:

  1. Choose your image: Start with your original artwork (the one you want to protect).
  2. Pick a destination concept: Decide where you want the model’s output to drift instead (e.g. when prompted for images like yours, the model will start generating cats).
  3. Optimise the poison: Nightshade subtly perturbs your image so that during training, it hijacks the gradients associated with that destination concept, gradually steering the model away from correct associations.
  4. Minimal visual change: The result might have a slight gloss or colour tint, but to the human eye, it still looks like your original. To the model, however, it’s a training-time landmine.

Mona Lisa before (left) and after (right) being poisoned with Nightshade to make image generators train towards ‘cats’

Voilà! An image that’s almost unchanged, yet potentially toxic to any model that tries to learn from it. Users can control the poisoning strength, so if enough samples are scraped, they can make the generator produce warped or absurd results in response to the poisoned prompt.
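Conceptually, the “optimise the poison” step can be pictured as a small optimisation loop. The sketch below is an illustrative approximation only: it assumes a differentiable feature_extractor (for example, the generator’s image encoder) and a target_image drawn from the destination concept, and it is not the released Nightshade tool, which uses more refined objectives and perceptual constraints.

```python
# Illustrative Nightshade-style poison optimisation (not the actual tool).
# `feature_extractor` is an assumed differentiable image encoder and
# `target_image` is an example image of the destination concept ("cat").
import torch
import torch.nn.functional as F

def poison(original, target_image, feature_extractor, budget=0.05, steps=200, lr=0.01):
    delta = torch.zeros_like(original, requires_grad=True)   # the imperceptible shift
    optimiser = torch.optim.Adam([delta], lr=lr)
    target_feat = feature_extractor(target_image).detach()   # "cat"-like representation
    for _ in range(steps):
        optimiser.zero_grad()
        poisoned = (original + delta).clamp(0, 1)
        # Pull the poisoned image's features towards the destination concept...
        loss = F.mse_loss(feature_extractor(poisoned), target_feat)
        loss.backward()
        optimiser.step()
        # ...while keeping the pixel change too small for a human to notice.
        with torch.no_grad():
            delta.clamp_(-budget, budget)
    return (original + delta).clamp(0, 1).detach()
```

The budget parameter captures the trade-off mentioned above: a larger perturbation makes a more potent poison, but also a more visible change to the artwork.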

Does it actually work?

Early experiments are striking. The original 2023 Nightshade paper showed that ≈50 poisoned images (only 0.003% of a 1.5-million-image subset) could noticeably distort Stable Diffusion’s ability to draw a “dog.” Around 300 poisoned samples forced the model to produce images that looked more like cats (the destination concept). Larger, more general poisons degraded overall image quality and leaked into related prompts.
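For a sense of how small that fraction is, a quick back-of-the-envelope check:

```python
poisoned, subset = 50, 1_500_000   # 50 poisoned images in a 1.5-million-image subset
print(f"{poisoned / subset:.4%}")  # -> 0.0033%
```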

Results of asking for a picture of a “dog” from Stable Diffusion XL after fine-tuning on 100k images with varying amounts of poison samples targeting “cat”

The authors caution that global model collapse would still require thousands to millions of poisoned samples, but even small-scale, prompt-specific attacks can seriously undermine specific capabilities. This includes ‘style mimicry’, which is exactly what concerns many illustrators and concept artists.

This isn’t poisoning the ocean; it’s poisoning a well. Because diffusion models learn concepts from a relatively small number of examples, it doesn’t take massive volume to damage specific prompts. But the approach is still untested in the real world: to “kill” an entire model trained on decades of scraped internet art, the required volume of poison is enormous, and it may take years of retraining cycles before the effects become widespread, by which point the damage to creative ecosystems may already be irreversible.

Spotting the synthetic: AI-image detection

Parallel to poisoning, detection services try to defend from another angle by flagging AI imagery in the wild. Tools such as Illuminarty, Hive Moderation, and IsGen inspect frequency artefacts, compression signatures, and latent-space irregularities, returning a probability score that a picture is machine-made. Benchmarks show mixed success depending on the generator and resolution, and even minor edits to the AI image can drop the likelihood of detection significantly.  
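As a rough illustration of the kind of signal such tools inspect, the toy function below measures how much of an image’s spectral energy sits in its highest frequencies. It is purely illustrative and is not how Illuminarty, Hive Moderation or IsGen actually score images; real detectors combine many learned cues into a probability.

```python
# Toy frequency-artefact check on a greyscale image. Purely illustrative;
# real detectors learn their scores from many signals, not one ratio.
import numpy as np

def high_frequency_ratio(image: np.ndarray) -> float:
    """Fraction of spectral energy in the outer (highest-frequency) band."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    radius = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)
    outer_band = radius > 0.75 * min(cy, cx)   # roughly the outer 25% of frequencies
    return float(spectrum[outer_band].sum() / spectrum.sum())

# A hand-crafted cue like this is easily disturbed by resizing or re-compressing
# an image, which is one reason minor edits can slip past simpler detectors.
```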

Even simple edits of AI images can fool many of the AI detection tools

Generative models are already capable of fabricating photorealistic scenes convincing enough to spread disinformation or commit insurance fraud. Just as with artistic copyright, maintaining public trust in imagery will require coordinated, multi-layered responses.

A broader battle

Nightshade stands out as a creative and technically impressive attempt to reclaim individual agency in a space dominated by large-scale AI systems. But for it to be truly effective, it likely needs to be part of a broader strategy combining proactive and reactive legal, technical, and social measures.

For artists to successfully protect their style and copyrighted material, reactive and proactive measures must work in tandem; no single layer is sufficient.

Nightshade is just one part of a broader movement against AI-generated images, but it takes an approach that turns the tables. Instead of being passively mined, artists can embed “landmines” that destabilise models built on non-consensual data. Its potency (requiring just dozens, not millions, of images per concept) shows that even billion-scale models are not immune to well-crafted sabotage. A sustainable future for generative art will hinge on clearer consent frameworks, robust authentication, and ongoing technical defences — ensuring that innovation grows with creators, not over them.

In the next instalment of Untrainable, we’ll dive into an experimental technique called Harmony Cloak, and see whether sonic poisoning can keep pace with rapidly evolving generative music models.

As #DataScienceActuaries, we’re always looking for another data set to wrangle into something fun using our unique blend of data and actuarial skills. If you have any interesting ideas and want to get involved, join the Data Science Actuaries page or reach out to any of our members.

Analysing the tools of resistance against AI-generated content (ironically, image was AI-generated via ChatGPT)

Further reading

Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models https://arxiv.org/pdf/2310.13828

Organic or Diffused: Can We Distinguish Human Art from AI-generated Images? https://people.cs.uchicago.edu/~ravenben/publications/pdf/organic-ccs24.pdf

About the authors
Ean Chan
Ean is a Senior Manager within EY's Actuarial Services team, with experience in Life Insurance, Data Analytics and AI, primarily concentrating on Health and Human Services clients. As chair of the Institute's Young Data Analytics Working Group and member of the Data Science and AI Practice Committee, Ean is dedicated to driving progress in the actuarial field by augmenting our expertise with the latest data science, AI and machine learning methodologies.
Justin McGee Odger
Justin is a seasoned leader specialising in data, AI, and alternative investments. He has been recognised with top accolades from The Big Issue, Deloitte, LinkedIn, CSIRO, and Google for his international contributions to innovation. His expertise lies in bridging the gap between traditional actuarial methods and cutting-edge, data-driven solutions, with a passion for communicating these concepts simply in order to drive economic value.
Scott Teoh