Computers are getting better and better at understanding human speech thanks to powerful data tools like deep learning and neural networks. Now, Google parent company Alphabet Inc.’s DeepMind unit is applying the same tools to the opposite problem: getting computers to talk to people.

More specifically, DeepMind is working on WaveNet, an advanced text-to-speech synthesis tool that uses neural networks to determine the right combinations of sounds required to create individual spoken words.

According to DeepMind, this is a very different method from the one most other text-to-speech programs use, which relies on databases of pre-recorded sounds that are cut and pasted together to form words. This is what makes many speech programs sound somewhat cold and robotic, much like Texas Instruments’ old Speak and Spell toys from the 1980s.

This process of speech synthesis has been used by a wide variety of text-to-speech software over the years, including intelligent assistants like Apple’s Siri and Microsoft’s Cortana. It has also been used in interesting projects like Yamaha Corp.’s Vocaloid software, a Japanese music creation program that allows users to change the pitch and rhythm of synthesized speech to create songs with artificial singers. The most famous of these is Hatsune Miku, a completely artificial pop star who recently toured the US as a holographic display.

Image: Artificial Japanese pop star Hatsune Miku

DeepMind explained that this method limits the possibilities of text-to-speech, but with neural networks and WaveNet, a greater variety of sounds and voices is possible.

“Generating speech with computers - a process usually referred to as speech synthesis or text-to-speech (TTS) - is still largely based on so-called concatenative TTS, where a very large database of short speech fragments are recorded from a single speaker and then recombined to form complete utterances,” the DeepMind team explained in a blog post. “This makes it difficult to modify the voice (for example switching to a different speaker, or altering the emphasis or emotion of their speech) without recording a whole new database.”
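
To make the concatenative approach that DeepMind describes more concrete, here is a minimal Python sketch of the idea: short stored fragments are looked up and stitched together into an utterance. It is an illustration only, not how Siri, Cortana, Vocaloid, or WaveNet is implemented. The fragment labels, the `FRAGMENTS` table, the sample rate, and the crossfade length are all assumptions made for the example, and sine tones stand in for real recordings.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed for this sketch)

def tone(freq_hz, duration_ms=100):
    """Stand-in for a recorded speech fragment: a short sine tone."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

# Hypothetical fragment database: unit label -> samples from a single "speaker".
FRAGMENTS = {
    "HH": tone(200),
    "EH": tone(300),
    "L": tone(250),
    "OW": tone(350),
}

def concatenate(units, database, crossfade=80):
    """Join stored fragments in order, blending `crossfade` samples at each seam."""
    out = []
    for unit in units:
        frag = database[unit]  # raises KeyError if the unit was never recorded
        overlap = min(crossfade, len(out), len(frag))
        start = len(out) - overlap
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)  # weight ramps toward the new fragment
            out[start + i] = (1 - w) * out[start + i] + w * frag[i]
        out.extend(frag[overlap:])
    return out

utterance = concatenate(["HH", "EH", "L", "OW"], FRAGMENTS)
```

The lookup in `database[unit]` is where the limitation in the quote shows up: the output can only ever contain what was recorded, so a different speaker or a different emotion would mean replacing every entry in the table.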