Duplex, Google’s Scarily Human AI Voice
BY KATHARINE SCHWAB at Fast Company's Co.Design
It was 11 a.m. I was in a Thai restaurant on the Upper East Side of Manhattan taking reservations. And I was about to talk to a robot.
The phone rang. “Hello, THEP Thai,” I answered.
“Hi, I’m calling to make a reservation. I’m Google’s automated booking service so I’ll record this call.”
“I’m sorry–what is this?”
The robot on the other end of the line went on to explain–in a lilting voice that reminded me of the Valley girls of my youth–that it was an automated system calling on behalf of a client who wished to make a reservation. (I knew this already; I was at THEP Thai to demo Duplex, Google’s new AI calling technology.) But fair enough, I thought. I hate calling restaurants, too. After some haggling over timing, the voice agreed to a 7:30 p.m. reservation. “How many people?” I asked. “Five.” I suddenly realized we wouldn’t have space for five until later in the evening. Could they do 8 p.m. instead?
“I thought we agreed on 7:30 p.m.,” the voice said with a hint of sass that made me instantly apologetic and a little squeamish. Could a machine really shame me that easily? After a quick sorry for assuming that the voice wanted a table for two, we were able to agree: 8 p.m., for five people. Phew.
If this conversation sounds mundane, it’s because it is–thousands like it happen every day in restaurants across the world. But it was also shocking in its normalcy, because the voice, powered by Duplex, sounded unlike any machine I’d ever heard. With its SoCal intonation, pauses, and “ums” peppered throughout the conversation, it sounded uncannily human. By the end of the conversation, I’d almost forgotten that I was speaking with a powerful piece of machine learning software.
Duplex, which Google announced at its developer conference I/O earlier this year, was created to solve a truly first-world problem: 60% of businesses that rely on bookings (like restaurants and hair salons) don’t actually have online booking services, according to a Google survey. And in a text-heavy, digital world, people hate picking up the phone to make a reservation. I’m certainly guilty of that–sometimes I’ll decide to go to a restaurant that has online reservations just because it’s so much easier than calling. What if I could just ask Google Assistant to do all that hard work for me?
That’s the premise behind Duplex, which the company will start rolling out later this summer. All you do is ask your Google Assistant to book you a reservation at a certain restaurant for a certain time for X number of people, and Duplex calls on your behalf and adds it to your calendar. For the users, it’s an obvious win–you outsource the boring task. Google believes Duplex will be good for businesses, helping them get reservations from lazy people like me who would prefer not to call, though testing will tell whether the service is truly beneficial for restaurant owners and their employees. The costs for the low-paid service workers who have to deal with Duplex are less clear.
When Google first unveiled Duplex to the world, it neglected to include a disclosure that Duplex was a robot at the beginning of the conversation, inciting a debate over the ethics of not informing people that they’re chatting with an AI. Nor was the demo at I/O live, leading some to speculate that the entire thing was faked. People wondered why you’d need to make an AI sound so human–we’re all trying to say “um” less, so why would you include that in a robot voice? And more seriously–what does it mean if Google has created an AI that could feasibly deceive people? In other words, Duplex brought up a lot of questions about whether Google is developing its AI responsibly.
According to Nick Fox, the VP of product and design for the Google Assistant, the team had talked about how to disclose that it was an AI talking from the very beginning–but just hadn’t included it in the demo because that was supposed to showcase the tech alone, not the product in its final polished form. I’m skeptical as to why the team wouldn’t include such a crucial piece of the user experience when presenting Duplex to the world, but Fox also says he has listened to the response from I/O and incorporated some of the feedback into the demo that I tried at the restaurant. “With some of these things it’s important for us to have a point of view, but this is how technology interacts with society, so it’s important that we don’t define all of that in a vacuum but also to get feedback from people outside Google about how these things should work,” he says.
The company is rolling out Duplex very slowly. In the next few weeks, it will start with calls that simply ask whether a business is open, limited to a small group of businesses and users. Then, later in the summer, the company will test out restaurant reservations and hair appointments, again with a small set of businesses and users. There’s no timeline yet for when it’ll be pushed out more broadly because the company is still determining whether or not it’s truly beneficial for businesses; but if and when that does happen, restaurants will be able to opt out of the service if they’d prefer not to force their employees to talk to Google’s robot. And employees who are surprised and uncomfortable when they encounter the bot on the phone always have the old-school option: just hang up. (Of course, talking to Google’s robot might be infinitely preferable to talking to the errant human who happens to be an asshole.)
It’s still unclear whether Duplex will succeed or not during these initial tests. But even if booking appointments over the phone isn’t quite the right application for Duplex, the company plans to use the technology in other ways.
THE MAKING OF DUPLEX (AND ITS ALL-IMPORTANT “UM”)
Duplex combines several different Google technologies: speech recognition, dialogue, and natural-sounding voice generation. The initial prototype was hacked together in only a few months, with a team of engineers literally placing a landline phone on top of a laptop’s speakers to make the first awkward call. The Google team played that initial conversation at the demo–and it was so uncomfortable to listen to that it was almost funny. The early Duplex couldn’t understand even the most basic of questions, and when the woman paused to check the reservations book, the system freaked out and queried her again with a robotic, “Hello?”
Duplex’s voice, back then, was so machine-like that the human workers on the other end of the line would frequently get frustrated and hang up. Making the voice sound more natural was imperative to make the product work. To improve the conversation, Google hired a host of human operators–people who would call restaurants to make reservations and then annotate the phone call recordings by indicating granular-level details about the conversations, for instance which statements are questions about the number of people and which are indications that the bot needs to wait for the person on the other end to check their system. The engineers then used these annotated recordings to teach the machine learning algorithms underlying the Duplex system how to understand simple statements and to infer meaning through context. Once Duplex got good enough to call on its own, the human operators would still listen in, guiding the conversation if it went off the rails or taking over completely. “As we made it feel more natural, the success rate of actually making that appointment, getting the user the appointment they want and the business getting the business they want, went up,” says Scott Huffman, the VP of engineering for the Google Assistant.
One of the keys to making Duplex sound more like a human turned out to be adding “ums”–called speech disfluencies by linguists–to its speech. “What linguists have found is that speech disfluencies actually play a key role in human conversation that keeps the conversation going,” Huffman says. In other words, saying “um” or “mhmm” isn’t just filler–it also plays a crucial role in acknowledging, for instance, that the listener has understood the speaker, or politely indicating that there’s some confusion. It turns out sounding human isn’t just window dressing: It’s integral to the technology working at all. And as of now, the automated system can handle four out of every five calls completely on its own. When it runs into a problem, the call gets bounced to a human operator who takes over.
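How that handoff is triggered isn’t spelled out here. Purely as an illustration, the sketch below assumes each conversational turn carries a confidence score for how well the system understood it, and bounces the call to a human as soon as one turn falls below a cutoff. The names (Turn, route_call, CONFIDENCE_THRESHOLD) and the 0.8 cutoff are hypothetical and are not drawn from Google’s system.

```python
from dataclasses import dataclass

# Hypothetical cutoff: below this, the system is assumed to be "in trouble".
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Turn:
    """One thing the person on the other end of the line said."""
    utterance: str
    confidence: float  # assumed score (0.0 to 1.0) for how well it was understood


def route_call(turns):
    """Return who should finish the call: the bot or a human operator."""
    for turn in turns:
        # A single poorly understood turn is enough to hand the call off.
        if turn.confidence < CONFIDENCE_THRESHOLD:
            return "human_operator"
    return "automated"


if __name__ == "__main__":
    call = [
        Turn("Hello, THEP Thai", 0.97),
        Turn("Could they do 8 p.m. instead?", 0.55),
    ]
    print(route_call(call))
```

A real system would presumably weigh far more than a single number per turn, but the shape is the same: the machine handles what it can, and a person picks up the rest.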
As the Google Assistant team focuses on testing Duplex this summer, there’s a bigger goal on the horizon: bringing the tech’s ums to the Assistant itself. Fox tells me that some testers have asked for Duplex’s voice on their phone or Google Home, because it’s so much more natural. If everything goes well with the launch this summer, that’s a very real possibility. “One thing we’re really excited about is bringing some of these elements to the Google Assistant itself,” Huffman explains. “Millions of people every day have conversations with Google Assistant on their phones, or through Google Home, and we’re looking forward to making those conversations feel more natural.”
Talking to Duplex may feel wildly futuristic, but it’s telling that Google is debuting it in a counter-intuitively small and targeted application. Huffman is very clear that it’s nowhere close to general artificial intelligence, in which a computer is smart enough to handle all kinds of situations that it hasn’t been specifically trained for–an important distinction to make. For instance, if you ask Duplex for the weather, it won’t know how to answer. Whether a conversational technology that’s been trained to work very specifically for only three different use cases can translate to a more general assistant remains to be seen. It’s likely the Duplex voice is so good because the situations are so narrow.
Duplex’s very real-sounding voice puts it on a different level from any other machine voice I’ve ever heard, and places it firmly in the realm of science fiction. It’s easy to imagine the seductive operating system of the movie Her becoming a reality sooner than we might think. For now, the real-world impact of Duplex, particularly on the people who will interact with it, remains to be seen. It doesn’t quite sound like Scarlett Johansson, but I have a feeling that despite disclosing its status as AI, it’ll do just as good a job at convincing people it’s more human than robot.