
Call it Habsburg AI or MAD: when AI feeds on AI-generated data, things start to get... weird

Updated: Sep 9, 2023

Whether because it is cheaper or simply more abundant, AI is increasingly being fed not just human-made data (natural or real data) but also large amounts of AI-generated data (synthetic data). Some companies do this deliberately to make up for a shortage of real data, while others do it unintentionally, unaware that the datasets they train their AI on already contain synthetic data.


However, when AI starts to be fed AI-generated data, things start to get weird. Jathan Sadowski, a researcher at Monash University, has referred to this phenomenon as Habsburg AI: "a system that is so heavily trained on the outputs of other generative AI's that it becomes an inbred mutant, likely with exaggerated, grotesque features." Others use the analogy of Mad Cow Disease, which spread when cattle remains were fed back to younger cows in a repeated cycle, producing brain-destroying pathogens. In that sense, it is fair to say that an AI trained repeatedly on synthetic data might go MAD, with potentially disruptive results.


According to the PhD students from Rice University who, together with researchers from Stanford, coined the term "MAD" (Model Autophagy Disorder), this calls for caution from both AI companies and users. When an AI is trained on synthetic data repeatedly, it can learn to focus on certain aspects of the data that are overrepresented or biased. These undesirable patterns (so-called artifacts) are then amplified, and the model "starts to drift away from reality". An image-generating AI caught in this loop, for example, starts to produce glitches and blurs. The researchers emphasize that companies should be aware that the output their models produce may contain such amplified artifacts, which could end up in the training data of future generative models.
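To make the "drift away from reality" idea concrete, here is a minimal toy sketch in Python. It is my own illustration, not the Rice/Stanford experiment: the "generative model" is just a Gaussian fitted to data, sampled from, and refitted over and over, and the numbers (200 samples, 20 generations) are arbitrary assumptions.

```python
# Toy sketch of a self-consuming training loop (assumed setup, not the
# Rice/Stanford experiment). The "generative model" is a Gaussian fitted
# by maximum likelihood; each generation is trained only on samples
# produced by the previous generation's model.
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "real" data drawn from the true distribution N(0, 1).
data = rng.normal(loc=0.0, scale=1.0, size=200)

for generation in range(1, 21):
    # "Train" the model: estimate mean and std from the current dataset.
    mu, sigma = data.mean(), data.std()
    # "Deploy" the model: the next dataset is purely synthetic samples.
    data = rng.normal(loc=mu, scale=sigma, size=200)
    print(f"gen {generation:2d}: mean={mu:+.3f}  std={sigma:.3f}")

# Typical outcome: the std shrinks and the mean wanders away from 0,
# i.e. the loop amplifies sampling quirks instead of reproducing the
# original distribution.
```

Even in this drastically simplified setting, small sampling errors compound from one generation to the next: the spread collapses and the average drifts, which is exactly the kind of amplified artifact the researchers warn about.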


The problem goes beyond companies knowingly feeding synthetic data to their models: it now reaches even the people who are paid to label data so that supervised learning stays accurate. These remotely located crowd-workers on Amazon's Mechanical Turk platform have reportedly started using ChatGPT to complete their labelling tasks. The result is an ever-growing set of "loops upon loops" of AI-generated content, and ever more effort needed to untangle those loops before it becomes too late.




Written by: İdil Ada Aydos

