Like most people who are extremely online, Brazilian screenwriter Fernando Marés has been fascinated by the images generated by the artificial intelligence (AI) model DALL·E mini. Over the last few weeks, the AI system has become a viral sensation by creating images based on seemingly random and whimsical queries from users — such as “Lady Gaga as the Joker,” “Elon Musk being sued by a capybara,” and more.
Marés, a veteran hacktivist, began using DALL·E mini in early June. But instead of inputting text for a specific request, he tried something different: he left the field blank. Fascinated by the seemingly random results, Marés ran the blank search over and over. That’s when Marés noticed something odd: almost every time he ran a blank request, DALL·E mini generated portraits of brown-skinned women wearing saris, a type of attire common in South Asia.
Marés queried DALL·E mini thousands of times with a blank input to figure out whether it was just a coincidence. Then he invited friends over to take turns on his computer, generating images simultaneously on five browser tabs. He said he kept going for nearly 10 hours without a break. He built a sprawling repository of over 5,000 unique images, and shared 1.4 GB of raw DALL·E mini data with Rest of World.
Most of those images show brown-skinned women in saris. Why is DALL·E mini seemingly obsessed with this very specific type of image? According to AI researchers, the answer may have something to do with shoddy tagging and incomplete datasets.
DALL·E mini was developed by AI artist Boris Dayma and inspired by DALL·E 2, an OpenAI program that generates hyper-realistic art and images from a text input. From cats meditating to robot dinosaurs fighting monster trucks in a colosseum, the pictures blew everyone’s minds, with some calling it a threat to human illustrators. Acknowledging the potential for misuse, OpenAI restricted access to its model to a hand-picked set of 400 researchers.
Dayma was fascinated by the art produced by DALL·E 2 and “wanted to have an open-source version that can be accessed and improved by everyone,” he told Rest of World. So, he went ahead and created a stripped-down, open-source version of the model and called it DALL·E mini. He launched it in July 2021, and the model has been training and perfecting its outputs ever since.
DALL·E mini is now a viral internet phenomenon. The images it produces aren’t nearly as clear as those from DALL·E 2 and have notable distortion and blurring, but the system’s wild renderings, everything from the Demogorgon from Stranger Things holding a basketball to a public execution at Disney World, have given rise to an entire subculture, with subreddits and Twitter handles dedicated to curating its images. It has inspired a cartoon in the New Yorker magazine, and the Twitter handle Weird Dall-E Creations has over 730,000 followers. Dayma told Rest of World that the model handles about 5 million prompts a day, and that he is currently working to keep up with extreme growth in user interest. (DALL·E mini has no relation to OpenAI and, at OpenAI’s insistence, was renamed Craiyon as of June 20.)
Dayma admits he’s stumped by why the system generates images of brown-skinned women in saris for blank requests, but suspects that it has something to do with the program’s dataset. “It’s quite interesting and I’m not sure why it happens,” Dayma told Rest of World after reviewing the images. “It’s also possible that this type of image was highly represented in the dataset, maybe also with short captions.” Rest of World also reached out to OpenAI, DALL·E 2’s creator, to see if it had any insight, but has yet to receive a response.
AI models like DALL·E mini learn to draw an image by parsing through millions of images from the internet with their associated captions. The DALL·E mini model was developed on three major datasets: the Conceptual Captions dataset, which contains 3 million image and caption pairs; Conceptual 12M, which contains 12 million image and caption pairs; and OpenAI’s corpus of about 15 million images. Dayma and DALL·E mini co-creator Pedro Cuenca noted that their model was also trained using unfiltered data from the internet, which opens it up to unknown and unexplainable biases in datasets that can trickle down to image generation models.
Dayma isn’t alone in suspecting the underlying dataset and training model. Seeking answers, Marés turned to the popular machine-learning discussion forum Hugging Face, where DALL·E mini is hosted. There, the computer science community weighed in, with some members repeatedly offering one plausible explanation: the AI could have been trained on millions of images of people from South and Southeast Asia that are “unlabeled” in the training data corpus. Dayma disputes this theory, saying that no image in the dataset is without a caption.
Michael Cook, who currently researches the intersection of artificial intelligence, creativity, and game design at Queen Mary University of London, challenged the theory that the dataset included too many pictures of people from South Asia. “Typically machine-learning systems have the inverse problem: they actually don’t include enough photos of non-white people,” Cook said.
Cook has his own theory about DALL·E mini’s confounding results. “One thing that did occur to me while reading around is that a lot of these datasets strip out text that isn’t English, and they also strip out information about specific people, i.e., proper names,” Cook said.
“What we might be seeing is a weird side effect of some of this filtering or pre-processing, where images of Indian women, for example, are less likely to get filtered by the ban list, or the text describing the images is removed and they’re added to the dataset with no labels attached.” For instance, if the captions were in Hindi or another language, it’s possible that text might get muddled in processing the data, resulting in the image having no caption. “I can’t say that for sure — it’s just a theory that occurred to me while exploring the data.”
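Cook’s hypothesis can be illustrated with a toy sketch. The Python below is not DALL·E mini’s actual pipeline: the sample records, the `clean_caption` helper, and the ASCII-only rule are all assumptions made purely for illustration. It shows how a filter that strips non-English text could leave some images paired with an empty caption, which is what a blank prompt would then match.

```python
from collections import defaultdict

# Hypothetical (image, caption) records; not taken from any real dataset.
RECORDS = [
    ("img_001.jpg", "a cat sitting on a sofa"),
    ("img_002.jpg", "साड़ी पहने एक महिला"),    # Hindi caption
    ("img_003.jpg", "robot dinosaur in an arena"),
    ("img_004.jpg", "लाल साड़ी में महिला"),    # Hindi caption
]


def clean_caption(caption: str) -> str:
    """Assumed filter: keep only words made of ASCII characters."""
    kept = [word for word in caption.split() if word.isascii()]
    return " ".join(kept)


def group_by_caption(records):
    """Group image names by their caption after filtering."""
    groups = defaultdict(list)
    for image, caption in records:
        groups[clean_caption(caption)].append(image)
    return dict(groups)


groups = group_by_caption(RECORDS)

# Images whose caption was filtered down to nothing end up under "".
print(groups.get("", []))
```

In this toy example, the two images with Hindi captions end up grouped under the empty string, while the English captions pass through unchanged. A real preprocessing pipeline would be far more involved, and Dayma maintains that no image in the dataset lacks a caption.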
Biases in AI systems are universal, and even well-funded Big Tech initiatives such as Microsoft’s chatbot Tay and Amazon’s AI recruiting tool have succumbed to the problem. In fact, Google’s text-to-image generation model, Imagen, and OpenAI’s DALL·E 2 explicitly disclose that their models have the potential to recreate harmful biases and stereotypes, as does DALL·E mini.
Cook has been a vocal critic of what he sees as the growing callousness and rote disclosures that shrug off biases as an inevitable part of emerging AI models. He told Rest of World that while it’s commendable that a new piece of technology is allowing people to have a lot of fun, “I think there are serious cultural issues, and social issues, with this technology that we don’t really appreciate.”
Dayma, creator of DALL·E mini, concedes that the model is still a work in progress, and that the extent of its biases is yet to be fully documented. “The model has raised much more interest than I expected,” Dayma told Rest of World. He wants the model to remain open-source so that his team can study its limitations and biases faster. “I think it’s interesting for the public to be aware of what is possible so they can develop a critical mind towards the media they receive as images, to the same extent as media received as news articles.”
Meanwhile, the mystery remains unsolved. “I’m learning a lot just by seeing how people use the model,” Dayma told Rest of World. “When it’s empty, it’s a gray area, so [I] still need to research in more detail.”
Marés said it’s important for people to learn about the possible harms of seemingly fun AI systems like DALL·E mini. The fact that even Dayma is unable to discern why the system spits out these images reinforces his concerns. “That’s what the press and critics have [been] saying for years: That these things are unpredictable and they can’t control it.”