Synthesized speech systems (the artificial creation of speech by a computer) have existed for decades. Computerized voices like Apple’s Fred and Siri are instantly recognizable because they sound robotic and not quite human. That is because these voices are created by programs stringing together separate pre-recorded syllables to form words and sentences. Recently, Google’s DeepMind team changed the method of creating speech: rather than only piecing together syllable sounds, the system uses machine learning, meaning DeepMind trains on samples of real human speech to teach itself how to sound more human. The speech created by machine learning is now nearly indistinguishable from real human speech. This is evident in the research paper written by Google employees, where you can hear for yourself how similar the real and the newly synthesized speech are.
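The older "stringing together separate syllables" approach can be sketched in a few lines. This is only an illustrative toy, not a real speech engine: the syllable bank and its sample values below are hypothetical placeholders standing in for recorded audio.

```python
# Toy sketch of concatenative synthesis: pre-recorded syllable waveforms
# are simply joined end to end to form a word. All data here is fake.
unit_bank = {
    "hel": [0.1, 0.3, -0.2, 0.05],  # placeholder samples, not real audio
    "lo": [0.2, -0.1, 0.4],
}

def synthesize(syllables, bank):
    """Concatenate stored syllable waveforms in the given order.
    There is no smoothing at the joins, which is one reason this
    style of synthesis sounds choppy and robotic."""
    samples = []
    for s in syllables:
        samples.extend(bank[s])
    return samples

speech = synthesize(["hel", "lo"], unit_bank)
```

Machine-learning systems like DeepMind's instead generate the audio itself after training on real speech, which is why they avoid the abrupt joins this approach produces.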
Google has already realized the power of this technology and is looking to put it into use. Their idea is to use this realistic speech to make phone calls for people, like ordering food, booking salon appointments, and making dinner reservations. This was revealed at Google's developer conference, where the technology was demonstrated calling real businesses. You can watch this clip here. From the video Google showed, we can hear how lifelike the voice is; it even inserts filler sounds like “um” and “mm-hmm.” Although I was fascinated by the accuracy of the machine-learned speech, I was also disturbed by it. The fact that we may soon not know whether we are talking to a human or a computer leads me to think we are becoming further disconnected from each other. We are going from talking to people indirectly via text, phone, or social media to having computers talk for us. I personally don’t see why we should use this technology for things like making appointments.