Virtual YouTuber Kizuna AI is the featured guest on Downtown DX, a decades-old prime-time variety show hosted by two of Japan’s most famous veteran comedians, Hitoshi Matsumoto and Masatoshi Hamada. Both pushing 60, the boomers joke about not knowing who or what she is, feigning ignorance. “What is this?” they ask, peering at the kinetic anime graphic, its eyelashes fluttering. “Who’s in there?”
But AI gets the last laugh. “Do you even know about us?” Matsumoto asks her, or the screen on the floor of the soundstage where she is manifesting today. “Of course I do!” AI replies in her squeaky chirp, spreading her arms, brushing away the question. “I know everyone here.” Then she beats one of the hosts at an impromptu game of rock-paper-scissors. “Do something now!” she shrieks at him after she wins. “Make some kind of reaction.” She issues commands: “Look that way! Look up! Look down!” giggling hysterically as the human Matsumoto obeys like a robot.
In contrast to her paunchy, middle-aged hosts, AI’s physique is archetypal shōjo (young female) manga: long legs and hair, short skirt, with a prim schoolgirl bow across her chest and a pink hairband with heart-shaped bunny-ear patterns on her head. Her gestures, expressions, and energy are otherworldly, less realistic than hyperreal, like watching a Technicolor character from a cartoon dancing around a black-and-white room. When she’s bantering with human celebrities and hosts on talk shows, AI’s glowing, unblemished presence makes her the most charismatic star on the screen. Everyone else looks drab, no matter how famous they are.
She was made to be so. Her full name puns on ai, the Japanese word for “love,” combined with the acronym for artificial intelligence and roughly translates as “bonds of love.” A single actor provides her voice, but her movements and reactions are computer generated, with her comments chirped by the performer in real time. Her precision-point kawaii looks and air-brushed, super-cute sexiness were crafted by illustrator En Morikura.
Launched in 2016, Kizuna AI was the first genuine “VTuber,” an entirely digital YouTube celebrity — not an avatar for a real person but wholly animated and autonomous. “Yes, there’s an actor rumored to provide her voice, named Nozomi Kasuga,” her creator, Takeshi Osaka, CEO and founder of production company Active8 Inc., says. “But no one really cares or wants to know. AI’s like Mickey Mouse. Who cares who does the voice?”
On June 30, AI and the VTuber phenomenon she spawned turned five years old. Over those years, she’s become a trans-media phenomenon, an AI character who is functionally interactive in a variety of real and virtual settings. This year, she modeled a collection for the fashion company Valentino and graced a recent cover of Japan’s glossy Commons&sense magazine. She’s shilled for telecoms giant SoftBank and the Japan National Tourism Organization, among others. She’s performed two live concerts on YouTube, TikTok, China’s Bilibili, and VR platform Oculus Venues. For her birthday celebration, she performed for the first time with a live human band at Tokyo’s Zepp Haneda music hall. Front-row tickets cost $150 each.
Before the coronavirus pandemic forced the world into internet isolation in 2020, VTubing was a niche medium, largely confined to Japan’s overactive subculture of fanboys and otaku. The pandemic’s disruptions to everyday lives, and to the entertainment industry, have broadened VTubing’s appeal. Last August, Playboard, the YouTube financial stats aggregator, reported that seven of the world’s 10 top Super Chat earners were Japanese VTubers. (YouTube launched Super Chat in 2017, allowing users to pay to have their comments highlighted and pinned to the top of live chats during streams.) The comically endowed, auburn-haired and horn-eared dragon avatar Kiryu Coco led the way at over $800,000 total, drawing more than $100,000 in a single week. In a year without concerts, conventions, or even café meetups, fans worldwide flocked to virtual entertainment. VTubers, far less virus prone than old-school flesh-and-blood YouTubers, surged in popularity. According to YouTube’s Culture and Trends report, by October 2020, VTubers’ views grew to more than 1.5 billion per month.
“The pandemic created a new digital ecosystem of livestreaming homebodies,” says 28-year-old Meilyne Tran, a VTuber fan and COO of GeeXPlus, a Tokyo-based anime YouTuber management company, a subsidiary of the Japanese publisher and producer Kadokawa Corp. “The attraction to VTubers is because they’re not human, not in spite of it,” she says. “People of my generation don’t like humans so much nowadays. We’re plugged in all the time and prefer interacting online, and VTubers keep us in the fantasy universe.”
Ivan — who asked Rest of World not to use his full name, to protect his privacy — was born in Bulgaria but moved to his mother’s native Japan a little over three years ago. He used to follow a handful of YouTubers and Twitch influencers and gamers, but sometime during the pandemic he switched to virtual entertainment, after he got sick of “thirst traps” and “e-thots” (electronic that ho over theres) — people who post photos or host video streams that lure viewers in with their bodies, only to take their money and rebuff them when they try to build a relationship.
“Real women with hot bodies are always showing off, getting naked in a bathtub or little swimming pool, trying to get you to lust after them,” he explains. “They don’t really care about you. They just want your money. A lot of people have gotten their hearts broken by 3D women on streaming, but with a 2D character, she can’t really break your heart. You don’t really know what she looks like, you just see this cute anime girl with this really cute voice.”
After attending two years of study at a college in Japan, Ivan lucked into a job as a part-time staffer at a Japanese VTuber agency and says he’s so busy producing and watching VTuber livestreams, he no longer has time for a real relationship, let alone watching actual scripted anime series. “I guess it’s like containing your dream, to a certain extent, to that medium and getting the most out of it without getting your hopes betrayed.”
Most VTubers are an extension of Japan’s idol culture — young female entertainers groomed by agencies to embody a preset ideal of “purity” through cute good looks, an earnest devotion to practice, training and diligence, and respect for one’s elders. At the peak of the craze in the 2010s, an ensemble idol group, AKB48 — a rotating roster of amateur singers and dancers — sold more singles than any Japanese music act ever and sparked spinoff groups across the country and in six other Asian nations.
The soft-sell sex appeal of both entertainment models is rooted in moe, a vague Japanese term that, in the pop culture idiom, refers to an attraction to physical beings that exist beyond the bounds of reality and often implies the allure of unsullied youth. That the objects of affection in manga and anime sometimes border on the childlike can make them uncomfortable viewing.
But the girl-next-door accessibility of idol culture goes only so far, and for only so long. Idols are human beings, young females identifiable on the street and vulnerable to stalkers and online doxxers. They are also heavily monitored, with their personal lives micromanaged into oblivion. Talking about sex is verboten; having sex or even a boyfriend is a deal breaker. Plus, they age. Older models are cast aside and replaced, and very few find a stable future after being jettisoned by the mother ship of management.
For agencies, VTubers offer a solution to mortality and the ethics of corporate control. VTube avatars are perpetually youthful and cute. Their sex appeal leans more toward the adorable — with lots of tilted heads, soft giggles, and quizzically arced eyebrows — than the anthropomorphic: no avatars getting naked in kiddie pools. Male viewers can absolve themselves of guilt (it’s just a cartoon!) while tossing money at their 2D dream date, cheering her on and taking paternal, patronizing care of her, mansplaining the real 3D world.
The fact that VTubers have been able to make these dreams feel more real than real life and create more intense, more realistic interactions is partly down to the technology. Better, cheaper tech has made VTubing simultaneously more attractive to watch and more affordable to perform. Only three years ago, Hollywood-quality motion-capture software costing $30,000 or more was required to produce even passable animated movements. But today, free downloadable software called Live2D Cubism, originally created by Japanese programmer Tetsuya Nakajo, means that anyone with a decent avatar illustration can become a VTuber for less than $100.
“Part of the explosion of anime VTubing can be traced to technology finally evolving to where the suspension of disbelief is strong enough for viewers to feel like they’re interacting with a ‘real’ anime character,” says Justin Ignacio, a member of Twitch’s founding team and co-founder of VShojo, a U.S.-based VTuber talent agency that launched in November 2020.
The commodification of the technology has met a rising wave of interest in Japanese cultural exports, particularly in anime, which was already becoming a mainstream medium worldwide before the pandemic. A subsequent global search for new content by audiences and streaming platforms drove a fresh swell of interest.
Over the past year and a half, anime and manga have seen record-breaking profits at home and overseas. The 2020 blockbuster anime film Demon Slayer: Mugen Train, Japan’s highest grossing movie ever, became the top foreign-language earner in North America when it premiered there earlier this year, while the explosive proliferation of anime titles on global streaming platforms like Netflix, Hulu, and Amazon Prime has helped lift U.S. manga print sales by 3.6 million units in the first quarter of 2021 alone (compared to similar sales in the first quarter of 2020).
VTubing has tracked this phenomenon. VTuber channels exist for targeted markets in multiple languages, expanding heavily into regions of Southeast Asia and Latin America. It has been estimated that the number of VTuber performers now exceeds 10,000. In late January, the world’s most popular YouTuber, the controversial Felix Kjellberg, popularly known as PewDiePie, donned an anime VTuber avatar face to acknowledge and poke competitive fun at the rising popularity of his virtual peers. At the end of April, Netflix debuted a half-sheep, half-human female VTuber named N-ko Mei Kurono to “personally and directly be a part of the anime fan community,” according to Rob Pereyda, the company’s head of anime, editorial, and publishing. In June, gaming giant Sega celebrated the 30th anniversary of its iconic bright-blue hedgehog, Sonic, by debuting the character as a VTuber, voiced by veteran Sonic actor Jun’ichi Kanemaru. In July, Sony Japan opened auditions for what it calls “the largest virtual talent development and management project in history,” with plans to debut 50 VTuber talents.
VTubing’s globalization follows a well-traveled path. Japan particularly excels at creating cultural products that are accessible and interactive, their grassroots appeal spreading first among fans, then rising from the bottom up. Iconic Japanese characters with their relatively simple 2D designs (think Hello Kitty and Pikachu) can be embraced as race-less, stateless, and non-gender-specific — qualities that have become even more prized in a world that is increasingly fractured and contentious, especially online. Once their dialogue is localized through voice dubs or subtitles, they cross national borders with ease, making them ideal entertainment emissaries for the streaming age.
Their simplicity also enables the viewer to project ethnic, emotional, and more elastic racial and gender identities onto their designs. VTubers take this aesthetic interactivity a critical step closer, into the present tense and privacy of the viewer’s personal screen. You can now interact with an anime character in real time, tapping into a childhood fantasy that almost everyone has: Who doesn’t want to talk to a cartoon?
That isn’t to say that VTubers haven’t had their controversies, as the studied innocence of the form meets the reality of globalization. In late 2020, two pioneering bilingual avatars managed by the Japanese agency Hololive, Coco and Akai Haato, were suspended for three weeks by their parent company after they acknowledged supporters from Taiwan and showed the Taiwanese national flag, sparking howls of protest from Chinese nationalists on Bilibili. Producers told Rest of World that the scandal caused Chinese game companies — a major source of advertising revenue for Japanese VTubers — to pull their sponsorship.
VTubers’ profits from Super Chat have skyrocketed. The most popular characters have made themselves and their management agencies very rich very fast. Coco’s channel alone has racked up over $2 million on Super Chat in a year and a half, according to Playboard. Like so many Japanese pop culture business models, VTubing has the potential for even greater profit margins. Similar to anime and manga, the overhead is low: the tech is cheap, and with few exceptions, the individual performers work mostly out of their own homes, making it the perfect pandemic production model. YouTube and the agencies each take a slice of the Super Chat cash pie, though neither will disclose percentages.
The potential for spinoffs, trans-media permutations, and especially merchandising is massive and enticing. With 2D and 3D animation technologies advancing and more seamlessly meshing with live-action and CG settings, popular VTubers can pop up nearly anywhere: in video games, live-action movies and TV series, comic conventions, and other pop culture fan gatherings worldwide. They are already collaborating across company lines, with VShojo characters chatting with flesh-and-blood YouTubers, and VTubers appearing as guests in one another’s livestreams. Anime characters from popular films or series can host their own channels and let audiences into their expansive worlds. And VTuber avatars can be remade and redrawn to suit circumstances, traversing borders of time and space with no travel costs and very little upkeep.
But, while agencies and sponsors can impose their corporate will over what their characters say and do, they are finding that the fandoms that have developed around VTubers are not easy to control.
At an outdoor café on Shinjuku Southern Terrace, an elevated shopping boulevard above Tokyo’s busiest railway station, 26-year-old Kazuki Takahashi bends over his laptop. The sun is so bright, he has to shield his eyes, and, in the background, trains rattle across the tracks. But he isn’t really in Tokyo; he’s there on the screen, playing Resident Evil alongside Gawr Gura.
In late June, Gawr Gura became the most popular VTuber in the world, beating Kizuna AI’s subscriber count by surpassing the 3 million mark. Her name is Japanese onomatopoeia for a gurgling shark stomach: がうるぐら. She appears in the form of a diminutive white-haired girl with saucer eyes, in an oversized blue shark cosplay outfit that looks like something she sewed in her basement over a weekend, sporting a hoodie-like hat with pointy shark-fin-shaped ears that snaps loosely beneath her chin, with a long fishtail sprouting from her backside — artful amateurism being a big part of the VTuber culture.
As Kazuki leans into his screen in Shinjuku, Gura has been streaming for more than four hours, not only playing Resident Evil — a digital avatar inhabiting a digital avatar — but moving through the scales of reality, lusting after one of the characters in the game. “Leon, look at me,” Gura says blandly. Then in a huskier voice: “Look at me!” Kazuki nods approvingly before quietly saying: “Leon is so hot.”
Early on, there were rumors that Gura wasn’t Gura, she was an American YouTuber named Senzawa. Fans speculated on the similarities of the tones in her voice, which is distinctive and mesmerizing: the soft drawly coo of a shy but precocious woman-child, whose most frequent laments are “I’m tired. Sorry, guys.” and “I’m hungry.” When the voice suddenly bursts into song, it soars. But the reaction from fans when anyone online mentions the similarities shows how protective the fandom is of its stars. “Treat the other past as their ‘private life,’” sums up one of the more patient Redditors. “If you know, then just keep it to yourself,” another writes. “Jesus Christ. They became a Vtuber for privacy’s sake. It’s like an author assuming a pseudonym. Just let sleeping dogs lie.”
This is the irony of the VTuber universe: both sides, performers and fans, protect their anonymity fiercely behind avatars and live-chat aliases. Yet the performers are prized for their individualistic quirks, lovable personalities, and talents. The goal is to be unique, and therefore precious, without revealing your true self or risking reality.
The demands of the VTuber fanbase, whose affection for the performers is real, have complicated agencies’ desires to create perfect, risk-free personas.
Two years ago, when Kizuna AI’s producers attempted to replicate her avatar using different voice actors, fans, particularly from China, rebelled and went ballistic on Bilibili, Reddit, and other forums, demanding that the original actress, Kasuga, be respected and protected. (Some industry observers claim that AI’s brand still hasn’t fully recovered.) Charges of profiteering ran rampant. “Being virtual doesn’t mean being replaceable,” one AI fan wrote on Reddit.
This resentment of corporate control hit home at Hololive in June, when the agency’s first bilingual superstar, Coco, suddenly announced via a halting and sometimes painfully awkward nearly 12-minute video that she was leaving the company for good. In practice, it’s only the actor who voices Coco who’s leaving — the company could simply find another voice. But that seems unlikely — her fans wouldn’t accept someone else.
Her usually blissed-out voice suddenly shaky and strained, pausing for composure as she bows apologetically, Coco reads, in English and Japanese, a single line of text announcing that she will “graduate” (the conventional idol-world euphemism for retire or quit) from Hololive on July 1. “I know that’s hard to accept,” she says, drawing a breath before asking viewers not to ask other Hololive VTubers about her decision and to remember to unsubscribe from her channel — since YouTube will continue to take their money.
The live chat starts scrolling, and it’s hard to decide which is more heartbreaking: Coco’s raspy staccato delivery or the multilingual rapid-fire blurts of desperation: “Que dice?” “I can’t read this!” “PANIK” — all typed in a frenzy, as if someone in the family had died. The video now has over 2.2 million views.
Coco’s departure last month jolted the industry in Japan, forcing agencies to confront a growing fear that one defection could lead to an exodus of VTuber talents who’ve learned that they no longer need management. Days after Coco quit, two other popular Hololive avatars, pioneering bilingual partner Haato and Japanese-language VTuber Murasaki Shion, took what they called “hiatuses” for indefinite periods and have yet to return. Three weeks later, the performer for avatar Lulu Suzuhara, managed by Japan’s other big VTuber talent agency, Nijisanji, thanked her fans in a two-hour livestream with 2.3 million views before shutting down her channel. (“If this keeps happening,” one Hololive manager told me, requesting anonymity, “we may not be here next year.”) The direct-to-consumer exhilaration of livestreaming may make corporate control not merely annoying, an intrusive reminder of manufactured artifice, but irrelevant. While VTuber avatars are relatively easy and inexpensive to create and replicate, their very human talents have become irreplaceable to legions of loyal and adoring fans.