Google Duplex: Making AI Conversationalists
One of the biggest challenges with machines is making them natural conversationalists. Right now we can tell a machine to do this or that, but we can't really hold a normal conversation with it. Current conversational systems sound stilted and struggle to follow even the basics of a conversation. That is where Google Duplex comes in: it not only understands normal human speech but also grasps the context in which a person is speaking, and replies accordingly.
Google Duplex is an AI system capable of carrying out certain kinds of conversations over the phone, such as making an appointment. The conversation is meant to sound natural, and the person on the other end shouldn't even realize they are essentially talking to a machine. Google Duplex is deliberately restricted to a few specific domains so that each one can be explored deeply enough for the system to learn the speech patterns it needs.
Challenges in making Google Duplex sound realistic:
The whole idea behind Google Duplex is for the system to sound natural, which means mastering the nuances of conversation: knowing when to pause mid-sentence, following the context of the conversation, and using the right intonation.
When people talk to each other they often use complex sentences, omit words, and pepper their speech with fillers such as "umm" and "uh". The problem is compounded over the phone, where background noise and poor audio quality make recognition errors all the more common.
How does Google Duplex solve these problems?
At its core, Google Duplex is a recurrent neural network trained on a large corpus of phone conversation data. As input, the network combines the output of Google's automatic speech recognition (ASR) technology, features extracted from the audio itself, the parameters of the conversation (such as the goal of the call), and the history of the conversation so far, and from these it decides what to say next.
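As a rough illustration of that kind of architecture (not Google's actual model, whose internals have only been described at this high level), the sketch below shows a recurrent network that consumes ASR token embeddings concatenated with audio and conversation-context features and scores candidate response tokens. All class names, dimensions, and feature choices here are hypothetical.

```python
import torch
import torch.nn as nn

class DuplexStyleRNN(nn.Module):
    """Illustrative RNN that conditions on ASR tokens, audio features,
    and conversation context to score candidate response tokens.
    Hypothetical sketch -- not Google's actual Duplex architecture."""

    def __init__(self, vocab_size=10_000, audio_dim=40, context_dim=16, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 128)         # ASR token embeddings
        self.rnn = nn.GRU(128 + audio_dim + context_dim,   # tokens + audio + context
                          hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)       # scores for the next token

    def forward(self, asr_tokens, audio_feats, context_feats):
        # asr_tokens: (batch, seq)   audio_feats: (batch, seq, audio_dim)
        # context_feats: (batch, context_dim), e.g. task parameters and call history summary
        tok = self.embed(asr_tokens)
        ctx = context_feats.unsqueeze(1).expand(-1, tok.size(1), -1)
        x = torch.cat([tok, audio_feats, ctx], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)                                 # (batch, seq, vocab)

# Example: a batch of 2 calls, 12 ASR frames each
model = DuplexStyleRNN()
logits = model(torch.randint(0, 10_000, (2, 12)),
               torch.randn(2, 12, 40),
               torch.randn(2, 16))
print(logits.shape)  # torch.Size([2, 12, 10000])
```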
To sound natural, Google combines a standard text-to-speech (TTS) engine with a synthesis TTS engine that gives the voice natural intonation. Google has also built speech disfluencies, the "umm"s and "uh"s of everyday speech, into the system's responses to make Google Duplex sound more human, along with timed pauses, as if the speaker were gathering their thoughts. For these fillers and pauses to sound natural they have to land in the right places in a conversation, so Google Duplex inserts them when the model's confidence is low, for example while it is still working out how to respond.
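A minimal sketch of that idea, assuming we already have a planned response and a confidence score from the model; the threshold, the pause marker, and the filler phrases below are arbitrary choices for illustration, not values Google has published.

```python
import random

FILLERS = ["umm,", "uh,"]            # disfluencies used to buy time
PAUSE_MARK = "<pause 400ms>"         # marker a TTS engine could render as silence

def add_disfluencies(response: str, confidence: float, threshold: float = 0.7) -> str:
    """Prepend a filler and a short pause when the model is unsure.

    Hypothetical illustration: when confidence is low, the system hesitates
    like a person gathering their thoughts; when confidence is high,
    the response is spoken straight away.
    """
    if confidence < threshold:
        return f"{random.choice(FILLERS)} {PAUSE_MARK} {response}"
    return response

print(add_disfluencies("Tuesday at 5 pm works.", confidence=0.45))
# e.g. "umm, <pause 400ms> Tuesday at 5 pm works."
print(add_disfluencies("Tuesday at 5 pm works.", confidence=0.92))
# "Tuesday at 5 pm works."
```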
In Conversation with Google Duplex:
The Google Duplex system can carry out sophisticated conversations entirely on its own, without any human intervention.
It also has self-monitoring capabilities that let it recognize when a conversation has gone beyond its depth, at which point it hands the call over to a human operator.
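A toy sketch of such a hand-off check, assuming the system tracks a confidence score for each turn; the threshold, the two-turn rule, and the operator interface are invented here for illustration, since Google has not published how Duplex's self-monitoring works.

```python
from dataclasses import dataclass, field

@dataclass
class CallSession:
    """Tracks per-turn confidence and decides when to escalate to a human."""
    handoff_threshold: float = 0.5
    low_confidence_limit: int = 2              # consecutive shaky turns tolerated
    _low_turns: int = field(default=0, init=False)
    escalated: bool = field(default=False, init=False)

    def record_turn(self, confidence: float) -> None:
        if self.escalated:
            return
        self._low_turns = self._low_turns + 1 if confidence < self.handoff_threshold else 0
        if self._low_turns >= self.low_confidence_limit:
            self.escalated = True
            print("Conversation out of depth -- handing over to a human operator.")

session = CallSession()
for score in (0.9, 0.4, 0.3, 0.8):   # two shaky turns in a row trigger the hand-off
    session.record_turn(score)
```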
The Google Duplex system also goes through real-time supervised training, in which a human instructor monitors its conversations, guides it, and can correct it on the spot.
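To make the idea concrete, here is a hypothetical instructor-in-the-loop turn: the system proposes a reply, and a human supervisor can approve it or overwrite it before it is spoken. The override logic and the corrections log are assumptions for illustration, not a description of Google's actual tooling.

```python
from typing import Optional

corrections_log = []   # (proposed, spoken) pairs that could later feed back into training

def supervised_turn(proposed_reply: str, instructor_override: Optional[str] = None) -> str:
    """Return the reply that is actually spoken on the call.

    If the instructor supplies a correction, it replaces the system's
    proposal and the pair is logged so the model can learn from it.
    """
    if instructor_override is not None:
        corrections_log.append((proposed_reply, instructor_override))
        return instructor_override
    return proposed_reply

# The instructor lets one reply through and corrects the next one.
print(supervised_turn("I'd like to book a table for four at 7 pm."))
print(supervised_turn("Yes, four people at 7.",
                      instructor_override="Yes, a table for four at 7 pm, please."))
print(corrections_log)
```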