I’d be lying if I said NVIDIA and Convai’s pitch for AI-driven NPCs at the 2024 Consumer Electronics Show didn’t impress me just a little bit. Reading about The Verge senior editor Sean Hollister’s experience with the demo, I got the sense there was something technically impressive under the hood: video game AI that can understand player voice inputs and spit out reasonably responsive dialogue isn’t something to sneeze at.
Sure, I rolled my eyes at Hollister’s description of this technology as “inevitable.” NVIDIA’s showcase didn’t even impress me as much as the tech demos we see for tools like Unreal Engine 5, which are meant to show off all the highlights of the technology and obfuscate any limitations that developers will encounter in the field. The fact that it’s set in the same Blade Runner-inspired ramen shop that we saw in the company’s first reveal of the technology also has me raising my eyebrows: surely a year later NVIDIA would want to show how this tech works in more expansive environments?
It doesn’t help that, according to Hollister, the supposedly lifelike NPCs could only stumble through some pretty trite descriptions of the game world and regularly blanked out in response to simple questions (that’s pretty relatable though).
That makes for another interesting comparison with Unreal Engine’s “Valley of the Ancient” demo—that video showcases an eye-catching game world and mechanics that feel right at home in a vertical slice of a triple-A game. The cyberpunk ramen shop conversation doesn’t come close to that experience.
Tech finagling aside, it finally dawned on me today why NVIDIA’s AI NPC pitch falls flat for me: at the end of the day, the prestige GPU company has poured a ton of resources into a tool whose final goal is…replicating the art of improv.
That means if anyone’s looking to beat NVIDIA at their own game, they should spend a bit less time watching Blade Runner and more time at their local comedy club watching groups probably named something like “Chewbacca’s Illegal Poker Night.” And don’t take my word for it: the worlds of video games and live roleplay are already playing in this space.
Marvel’s Spider-Man 2 used improv to make funny NPCs
Remember back when Marvel’s Spider-Man 2 launched, and players took a pause on their web-slinging to listen to the absolutely bananas conversations some street-level NPCs were having? In case you missed this, here’s a sample of how off-the-wall they got:
Yes, there is an NPC who suggests putting petroleum jelly on a baby.
In November, Wired senior writer Megan Farokhmanesh got the scoop on how Insomniac Games cooked up these goofy conversations: it was just a bit of improv comedy from two talented performers. “It feels both hilariously deranged for a video game and somehow completely organic—dialog [sic] that believably could have been faked by a content creator with a dark sense of humor,” she accurately wrote.
Voice actor Krizia Bajos explained to Farokhmanesh that she and scene partner G. K. Bowes were given free rein by dialogue director Patrick Michalak to unleash their improv talents in an “atmosphere session” to fill background audio for the game world. The pair only did a few takes of this line before moving on to other conversations.
Let’s do some quick (and unfair) cost comparisons. Replicating this bit with NVIDIA’s AI tool would require licensing it from the company and fiddling with the back-end generators to produce characters who could babble about something as ridiculous as poorly conceived childcare.
That content generation would require multiple runs of GPU-intensive computing that could run up power costs for someone somewhere in the technology’s pipeline. It would probably take a few hours of trial and error (plus some human supervision and course correction) to dial in something like this.
Meanwhile, I’d estimate Bajos and Bowes knocked out that line in about an hour, freeing up the rest of their time to riff on lines for other NPCs. All it takes to power them is a fair day rate and a nice catered lunch.
NVIDIA might protest and say their tool provides animations and character generation for developers. Fair enough! But do you really need to render every NPC in a game with realistic precision to make players stop everything and listen to what they have to say? I love Marvel’s Spider-Man 2, but those street-level New Yorkers have the same dead eyes as the action figures sitting on my shelf. The spontaneous conversation is more than enough to spark delight in players.
Before I move on, I have to make one concession to the AI technology boosters: writers and performers can absolutely use generative AI to create stupid, laugh-out-loud conversations. Comedy duo Dudesy use an “AI” of the same name to whip up dumb prompts for them to riff on during a podcast, and post video clips of their robotic improv partner generating outlandish bad celebrity recreations. Until they flew too close to the sun and released an AI-generated George Carlin special (without consulting his family), I admired how they took advantage of the uncanny valley of AI-generated content to do gut-busting comedy.
But now that they’ve dragged one of the world’s most beloved comedians out of his grave, I’m less impressed. And far more eager to explain how Disneyland’s Star Wars-themed, live-action-roleplaying-adjacent attraction Galaxy’s Edge has more improv lessons for game developers than NVIDIA.
Disney trusts real actors to bring Star Wars to life at Galaxy’s Edge
Marvel’s Spider-Man 2’s improv NPCs are great, but they are only talking to each other, not the player.
So let’s hop on an I-TS Transport to the planet of Batuu, brought to life at Disneyland and Disney World in a section of the parks called Galaxy’s Edge. Disney collaborated with veteran live-action roleplay developers not only to build a land filled with immersive rides and attractions, but also to populate the space with performers who would always be in character around park guests.
During a pre-pandemic 2020 visit, two particular performers charmed the heck out of me while I wandered around what’s known in-universe as Black Spire Outpost. The first encounter was with Kylo Ren and a pair of Stormtrooper cronies, and the second was with Vi Moradi, a character first seen in two (incredible) books penned by Delilah S. Dawson.
In the first encounter, Kylo Ren (who I hear is shredded) spotted the Rebel Alliance logo I was sporting on my jacket. He cut through the crowd to tower over me and threaten me for wearing traitorous paraphernalia.
Video by Carli Velocci.
As you can see in the video above, I tried my best to riff with him but was unprepared (it was hard to get into character when I was as excited as a little kid; I’m not a professional). But no riffing was needed. Ren verbally roasted me for a few minutes while his entourage leaned in to quip like a pair of hype men. I don’t know if I felt immersed per se, since an encounter like that in the films would usually have ended with Ren slicing my body in half, but I was still over the moon about it.
There was something validating about it all, like the world recognized me and the choices I’d made in my wardrobe. The improv here was in spotting an attendee who was ready to play ball and tailoring the physical performance to match the space of the scene.
Here’s the catch: Ren and the troopers weren’t actors hip-firing lines custom-tailored for me. Those performers purportedly strut around the park with a host of pre-programmed voice performances that trigger with hand and finger gestures. Only a repeat park attendee would realize they’re ready to razz folks in Rebellion or Resistance gear on a regular basis.
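If you want a sense of how little machinery that trick needs, here’s a toy sketch in Python. To be clear, Disney hasn’t published how its system actually works, so the gesture names, the lines, and the print-as-playback stub below are all invented for illustration; the point is that a small bank of pre-recorded reactions, fired at the right moment, reads as improv.

```python
# A toy sketch of gesture-triggered playback. Nothing here reflects
# Disney's actual system; every name is invented for illustration.

VOICE_LINES = {
    "point_at_rebel_gear": "You dare wear that traitorous symbol in my presence?",
    "dismissive_wave": "Move along. The First Order has no use for you.",
}

def on_gesture(gesture: str) -> None:
    """Fire the pre-recorded clip mapped to a performer's hidden hand signal."""
    line = VOICE_LINES.get(gesture)
    if line is None:
        return  # unmapped gestures stay silent instead of breaking character
    print(f"[Kylo Ren] {line}")  # stand-in for actual audio playback

# The improv lives in the trigger, not the text: the performer reads the
# crowd and picks the moment to fire.
on_gesture("point_at_rebel_gear")
```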
My second encounter with Moradi was more subdued. She’s not a well-known character with a film-inspired performance that she needs to adhere to. In the park she’s mainly focused on acting like a way-too-obvious spy and leading attendees (usually children) on little missions.
But I’d read Dawson’s books and knew Moradi partnered with a First Order turncoat named Captain Cardinal to establish a Resistance base on Batuu. I tried asking her about Cardinal’s well-being to see if Disney Imagineering had prepared something special for guests who’d read the ancillary material.
“That’s classified,” she responded curtly, deliberately avoiding eye contact with me. I wished her well on her mission and scuttled away, a little embarrassed about the moment.
Here’s where this gets interesting from a design perspective. On the surface this was a bad encounter, because the park guest didn’t leave feeling delighted and immersed in the world. But dig deeper, and you’ll see the strength of the actor’s choice. It’s an in-character response because, after all, Moradi is a spy, and spies don’t run around telling civilians about their close compatriots.
Art of Vi Moradi. Image via Disney.
It also sets boundaries. Moradi and other unmasked park performers have the heavy task of encountering thousands of people every day, plenty of whom will try to explore how much control they can exert over the characters. For the actor’s own well-being and for the benefit of other guests, she has to establish that there are prompts she won’t respond to. It does no one any good if I hog her time talking about a book five people in the park have read instead of giving her a chance to charm the regular fans.
Improv nerds might say this violates the “yes, and” rule of improvised performance. I’d call it an advanced technique: it’s an in-character way to shape what her character will do and keep the overall event on track for attendees. Game writers reading this are hopefully nodding along, knowing how they have to put guardrails on game characters to keep players focused on the task at hand.
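For game writers who want the concrete version, here’s a minimal sketch of that kind of guardrail, assuming a dialogue system that routes player prompts to some backend (scripted or generative). The topic list, the deflection lines, and the generate_reply stub are all hypothetical, but the pattern (deflecting in character instead of breaking character to refuse) is the same one Moradi’s performer used on me.

```python
# A minimal sketch of an in-character guardrail, not any shipping system.
# OFF_LIMITS_TOPICS, DEFLECTIONS, and generate_reply() are hypothetical
# stand-ins for whatever a real dialogue pipeline would use.

OFF_LIMITS_TOPICS = {"cardinal", "resistance base", "spy network"}

DEFLECTIONS = [
    "That's classified.",
    "Careful. Black Spire has ears everywhere.",
]

def generate_reply(prompt: str) -> str:
    # Stand-in for the actual dialogue backend, scripted or generative.
    return f"Bright suns, traveler! Let's talk about {prompt}"

def npc_reply(prompt: str, turn: int = 0) -> str:
    """Route a player prompt, deflecting in character when it crosses a boundary."""
    lowered = prompt.lower()
    if any(topic in lowered for topic in OFF_LIMITS_TOPICS):
        # The refusal is still a performance: rotating deflections means
        # repeat probing gets stonewalled without breaking the illusion.
        return DEFLECTIONS[turn % len(DEFLECTIONS)]
    return generate_reply(prompt)

print(npc_reply("Is Captain Cardinal safe?"))    # "That's classified."
print(npc_reply("Where can I find blue milk?"))  # handed off to the backend
```

The design choice worth stealing is the fallback itself: the boundary response gets authored with as much character as the rest of the dialogue.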
Both of these encounters showcase what’s notably absent from NVIDIA’s generative AI NPC demo. The NPCs are reactive to player questions, not proactive and prepared to safely antagonize a player. They’re also completely deferential to the players’ demands. I’m sure NVIDIA’s designers would love to demonstrate how their technology can push back like that, but the public pitch says otherwise.
These characters are meant to serve the whims of a player who wants to be the star of the show, cut from the same cloth as the robots of the HBO series Westworld, built to respond to whatever the human desires.
I’m confident some AI toolmakers can one day cook up tools that improve the NPC experience, but this ain’t it. Auto-responding NPCs aren’t inevitable; they’re a flash in the pan that abandons the human performance that inspired them in the first place.