In recent years, AI image tools have made it possible for anyone to create complex, detailed artwork in seconds — no brushes, training, or background required. Yet the power behind these generators rests on a vast foundation of human-made work: millions of images scraped from the internet, many created by professional artists who never knew — or agreed — that their portfolios would feed a machine. A growing number of those artists are now fighting back.
This is the first article in Untrainable, a series originally published on LinkedIn by the Young Data Science Working Group, exploring how creators are fighting back against generative AI systems that have "learned" from their work without consent. Follow the Data Science Actuaries on LinkedIn to stay updated on the latest articles.
Modern text-to-image models such as Stable Diffusion and Midjourney rely on diffusion architectures: they learn to turn random noise back into coherent images by training on billions of real pictures paired with text captions.
Stable Diffusion built its models on text-image pairs compiled by the non-profit group LAION ("Large-scale Artificial Intelligence Open Network"). The data comes from large-scale web scrapes; during training, both the words and the pixels are embedded into high-dimensional vectors called latent representations.
The latest Stable Diffusion models learned to mimic copyrighted materials and artists’ styles from a training set of over 5 billion images scraped from the internet
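To make the idea of a latent representation concrete, here is a minimal sketch using OpenAI's CLIP model through the Hugging Face transformers library (the same family of model LAION used to score its text-image pairs). The image file name and caption are placeholders, and this is an illustration of joint text-image embeddings rather than the exact encoder stack inside Stable Diffusion:

```python
# Minimal sketch of joint text-image embeddings ("latent representations") with CLIP.
# Illustrative only: not the exact encoder stack inside Stable Diffusion.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

inputs = processor(
    text=["a swirling Van Gogh-style seascape at twilight"],
    images=Image.open("artwork.jpg"),  # hypothetical local file
    return_tensors="pt",
    padding=True,
)
outputs = model(**inputs)

print(outputs.text_embeds.shape)   # torch.Size([1, 512]): the caption as a vector
print(outputs.image_embeds.shape)  # torch.Size([1, 512]): the image in the same space
```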
Because the scrape is largely automated, copyrighted and personal art is absorbed alongside public-domain material — often without consent, notice, or the ability to opt out. The result: anyone can type "swirling Van Gogh-style seascape at twilight" and receive a canvas-like scene that imitates an artist's labour in seconds.
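To show how low the barrier is, here is a minimal sketch using the Hugging Face diffusers library; the checkpoint name is illustrative, and any publicly hosted text-to-image model is called the same way:

```python
# Minimal text-to-image sketch with Hugging Face diffusers.
# The checkpoint id is illustrative; other publicly hosted Stable Diffusion models work the same way.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

image = pipe("swirling Van Gogh-style seascape at twilight").images[0]
image.save("seascape.png")
```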
With lawsuits and new regulations moving slowly, some creators have taken matters into their own hands, adopting data-poisoning defences. The most prominent is Nightshade. Developed at the University of Chicago, the technique uses gradient-based optimisation to introduce imperceptible pixel shifts into an artwork. These shifts are designed so that, during training, the image pushes the model to associate the targeted prompt with the wrong concept, causing it to generate incorrect or corrupted results.
The process is simple:
Mona Lisa before (left) and after (right) being poisoned with Nightshade to make image generators train towards ‘cats’
Voilà! An image that's almost unchanged, yet potentially toxic to any model that tries to learn from it. Users can control the poisoning strength, so if enough samples are scraped, they can make the generator produce warped or absurd results in response to the targeted prompt.
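Nightshade's published optimiser is more sophisticated (it works against the generator's own feature extractor and uses a perceptual constraint to keep the changes invisible), but the core idea can be sketched as a projected-gradient perturbation against a pretrained image encoder. Everything below is an illustrative assumption rather than the real tool: CLIP stands in for the feature extractor, the file names are placeholders, and the perturbation budget is arbitrary.

```python
# Illustrative Nightshade-style perturbation: nudge an artwork's embedding toward a
# target concept ("cat") while keeping the pixel change tiny. Not the published optimiser.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
for p in model.parameters():              # freeze the encoder; only the perturbation is optimised
    p.requires_grad_(False)

def to_pixels(path):
    return processor(images=Image.open(path), return_tensors="pt")["pixel_values"]

art = to_pixels("my_artwork.png")         # hypothetical file: the artwork to protect
anchor = to_pixels("cat_anchor.png")      # hypothetical file: an image of the target concept

with torch.no_grad():
    target_feat = model.get_image_features(pixel_values=anchor)

delta = torch.zeros_like(art, requires_grad=True)
eps, step = 0.05, 0.005                   # budget and step size in the processor's normalised space

for _ in range(200):
    feat = model.get_image_features(pixel_values=art + delta)
    loss = torch.nn.functional.mse_loss(feat, target_feat)
    loss.backward()
    with torch.no_grad():
        delta -= step * delta.grad.sign() # move the embedding toward "cat"
        delta.clamp_(-eps, eps)           # keep the change visually imperceptible
    delta.grad.zero_()

poisoned = (art + delta).detach()         # de-normalise and save this as the public copy
```

In practice a perturbation also has to survive resizing, compression, and re-encoding, which is part of what makes the real tool harder to build than this sketch suggests.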
Early experiments are striking. The original 2023 Nightshade paper showed that ≈50 poisoned images (only about 0.003% of a 1.5-million-image subset) could noticeably distort Stable Diffusion's ability to draw a "dog." Around 300 poisoned samples forced the model to produce images that looked more like cats (the target concept). Larger, more general poisons degraded overall image quality and leaked into related prompts.
Results of asking for a picture of a “dog” from Stable Diffusion XL after fine-tuning on 100k images with varying amounts of poison samples targeting “cat”
The authors caution that global model collapse would still require thousands to millions of poisoned samples, but even small-scale, prompt-specific attacks can seriously undermine particular capabilities, including 'style mimicry', which is exactly what concerns many illustrators and concept artists.
This isn't poisoning the ocean; it's poisoning a well. Because diffusion models learn concepts from a relatively small number of examples, it doesn't take massive volume to damage specific prompts. But the approach is still untested in the real world: "killing" an entire model trained on decades of scraped internet art would require an enormous volume of poisons, and it may take years of retraining cycles before the effects are widespread — by which point the damage to creative ecosystems may already be irreversible.
Parallel to poisoning, detection services try to defend from another angle by flagging AI imagery in the wild. Tools such as Illuminarty, Hive Moderation, and IsGen inspect frequency artefacts, compression signatures, and latent-space irregularities, returning a probability score that a picture is machine-made. Benchmarks show mixed success depending on the generator and resolution, and even minor edits to the AI image can drop the likelihood of detection significantly.
Even simple edits of AI images can fool many of the AI detection tools
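Commercial detectors are trained classifiers, but the "frequency artefact" signal they draw on can be illustrated with a toy heuristic: compute the image's 2D Fourier spectrum, average it radially, and compare the high-frequency energy against a cut-off. The file name and the threshold below are placeholder assumptions, not how Illuminarty, Hive Moderation, or IsGen actually score images.

```python
# Toy frequency-artefact check: radially averaged log-magnitude FFT spectrum.
# Real detectors are trained classifiers; this only illustrates the signal they look at.
import numpy as np
from PIL import Image

def radial_spectrum(path):
    """Return the radially averaged log-magnitude spectrum of a greyscale image."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    magnitude = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(img))))

    h, w = magnitude.shape
    y, x = np.indices((h, w))
    radius = np.hypot(y - h / 2, x - w / 2).astype(int)
    sums = np.bincount(radius.ravel(), weights=magnitude.ravel())
    counts = np.bincount(radius.ravel())
    r_max = min(h, w) // 2                # stay inside the inscribed circle, where every bin is populated
    return sums[:r_max] / counts[:r_max]

profile = radial_spectrum("suspect_image.png")        # hypothetical file
high_freq = profile[int(0.8 * len(profile)):].mean()  # energy in the outer 20% of frequencies
THRESHOLD = 4.5                                       # purely illustrative cut-off
print("possible AI artefacts" if high_freq > THRESHOLD else "no obvious artefacts")
```

This also hints at why simple edits such as rescaling, re-compression, or slight blurring can wash out the very artefacts these tools rely on.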
Generative models are already capable of fabricating photorealistic scenes that look authentic enough to spread disinformation or commit insurance fraud. Just as with artistic copyright, maintaining public trust in imagery will require coordinated, multi-layered responses.
Nightshade stands out as a creative and technically impressive attempt to reclaim individual agency in a space dominated by large-scale AI systems. But for it to be truly effective, it likely needs to be part of a broader proactive and reactive legal, technical, and social strategy.
For artists to successfully protect their style and copyrighted material, reactive and proactive measures must work in tandem; no single layer is sufficient
Nightshade is just one part of a broader movement against AI-generated images, but it takes an approach that turns the tables. Instead of being passively mined, artists can embed "landmines" that destabilise models built on non-consensual data. Its potency (requiring dozens to hundreds of images per concept rather than millions) suggests that even billion-scale models are not immune to well-crafted sabotage. A sustainable future for generative art will hinge on clearer consent frameworks, robust authentication, and ongoing technical defences — ensuring that innovation grows with creators, not over them.
In the next instalment of Untrainable, we’ll dive into an experimental technique called Harmony Cloak, and see whether sonic poisoning can keep pace with rapidly evolving generative music models.
As #DataScienceActuaries, we’re always looking for another data set to wrangle into something fun using our unique blend of data and actuarial skills. If you have any interesting ideas and want to get involved, join the Data Science Actuaries page or reach out to any of our members.
Analysing the tools of resistance against AI-generated content (ironically, image was AI-generated via ChatGPT)
Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models https://arxiv.org/pdf/2310.13828
Organic or Diffused: Can We Distinguish Human Art from AI-generated Images? https://people.cs.uchicago.edu/~ravenben/publications/pdf/organic-ccs24.pdf