When last we checked in on the state of realistic-sounding AI-generated human voices, the technology was getting better but still had a way to go before it could fool most people. But sitting here today, having just listened to this fake Joe Rogan, I feel very confident saying that, my god, we've reached holy-shit-I-would-totally-fall-for-that territory.
I’m sure somewhere there’s a diehard Joe Rogan fan screaming at me through his monitor saying come on, man! But no. You come on, man! It might not be 100 percent perfect, but as far as the general population is going to be concerned, goddammit, that’s Joe Rogan talking.
Here’s the voice of actual Joe by way of a random episode of his podcast for comparison.
https://www.youtube.com/watch?v=5ihph3nrgt0
Seriously, don't even try to tell me that the two aren't scarily close, at least not until you try this real vs. fake quiz, creatively named Faux Rogan. Somehow I managed to pass it with 5 out of 8 correct, but it was far from easy. Honestly, I'm not sure I would get a similar result if I took it again after some time had passed. They really do sound damn near identical.
So where did this come from?
It’s the work of a company called Dessa and the RealTalk deep learning model its engineers have developed. That link will take you to a fairly technical explanation of how they did all of this, if you’re interested.
Importantly, the company seems to be aware of what it has done here and the implications of it, so much so that it has no plans to release any of its actual research or datasets to the public. That's nice, even though it's surely only a matter of time before some other company does.
What will really scare me is when they manage to get the fake voice to go "mwa" like the real one does hahaha. But seriously. That's freaking freaky.
No kidding. It’s almost freaky enough that I’m starting not to care about the good uses it could have.
Have you tried the quiz?
I got 7 out of 8. And I almost got the one I missed right, too. But I heard a breath and fell for it.
Colour me impressed and a little surprised.
I tried to go purely on inflection, or the lack thereof. But if there had been more clips, I probably would have gotten screwed.