SoundStorm: Google Unveils Revolutionary Artificial Intelligence Tool Capable of Real-Time Voice Reproduction

31.05.23

Google has unveiled its latest breakthrough in artificial intelligence technology, SoundStorm, an advanced model for efficient, non-autoregressive audio generation. Because it can synthesize dialogue with different voices, SoundStorm opens up new possibilities for applications such as generating audio content from written text and creating realistic podcasts.

Unlike its predecessor AudioLM, SoundStorm uses a new architecture that generates audio in parallel, in chunks of up to 30 seconds, which makes it more efficient. Using bidirectional attention and confidence-based parallel decoding, the model produces high-quality audio while dramatically reducing generation time. On Google's TPU-v4 hardware, SoundStorm can generate 30 seconds of audio in as little as 0.5 seconds.
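To make the idea of confidence-based parallel decoding more concrete, here is a minimal toy sketch in Python. It is not Google's implementation: the `toy_predict` function is a hypothetical stand-in that returns random tokens and random confidence scores, and the cosine schedule and the vocabulary size of 1024 are assumptions made for illustration. The sketch only shows the general pattern: start with every position masked, guess all masked positions at once, commit the most confident guesses, and repeat for a small number of steps.

```python
import math
import random

MASK = -1  # placeholder id for a position that has not been decoded yet


def toy_predict(tokens, rng):
    """Hypothetical stand-in for the network (an assumption, not a real API).

    For every masked position it returns a (token, confidence) guess.
    A real model would condition each guess on the already-decoded
    tokens through bidirectional attention; this toy ignores them.
    """
    return {
        i: (rng.randrange(1024), rng.random())
        for i, tok in enumerate(tokens)
        if tok == MASK
    }


def parallel_decode(length, steps, seed=0):
    """Fill `length` positions in at most `steps` parallel rounds."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(1, steps + 1):
        guesses = toy_predict(tokens, rng)
        if not guesses:
            break
        # Assumed cosine schedule: how many positions stay masked after
        # this round. It reaches zero on the final round.
        keep_masked = int(length * math.cos(math.pi / 2 * step / steps))
        n_commit = max(1, len(guesses) - keep_masked)
        # Commit the guesses with the highest confidence, leave the rest masked.
        ranked = sorted(guesses, key=lambda i: guesses[i][1], reverse=True)
        for i in ranked[:n_commit]:
            tokens[i] = guesses[i][0]
    return tokens


decoded = parallel_decode(length=16, steps=4)
print(decoded)
```

In this toy, 16 positions are filled in 4 rounds instead of 16 sequential steps, which is the source of the speedup that non-autoregressive decoding aims for.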

SoundStorm was trained on a massive dataset of 100,000 hours of dialogue, giving it a robust grasp of spoken language patterns. The model keeps the speaker's voice and the acoustic conditions highly consistent while retaining the sound quality achieved by AudioLM. It is also two orders of magnitude faster than its predecessor, which demonstrates its potential for scalable audio generation.

One of SoundStorm's key features is its ability to synthesize natural dialogue when combined with the text-to-semantic modeling stage of SPEAR-TTS. By providing a transcript annotated with speaker turns and a short voice prompt for each speaker, users can control both the spoken content and the speakers' voices. During testing, SoundStorm synthesized 30-second dialogue segments in just 2 seconds on a single TPU-v4.
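The timing figures quoted in this article can be turned into a real-time factor, that is, how many seconds of audio are produced per second of computation. The short Python sketch below only does this arithmetic; the function name is chosen here for illustration.

```python
def real_time_factor(audio_seconds, generation_seconds):
    """Seconds of audio produced per second of computation."""
    return audio_seconds / generation_seconds


# Figures quoted in the article, both measured on TPU-v4 hardware.
audio_only = real_time_factor(30, 0.5)  # 30 s of audio in 0.5 s
dialogue = real_time_factor(30, 2)      # 30 s of dialogue in 2 s

print(f"audio generation: {audio_only:.0f}x faster than real time")
print(f"dialogue synthesis: {dialogue:.0f}x faster than real time")
```

By this measure, audio generation runs about 60 times faster than real time, and full dialogue synthesis about 15 times faster.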

Compared with baseline models, the audio generated by SoundStorm matches AudioLM in quality and shows excellent coherence and acoustic consistency. Remarkably, when given a speech sample as a prompt, the model preserves the speaker's voice with high accuracy, which greatly enhances its ability to generate realistic dialogue.

Despite SoundStorm's outstanding capabilities, it is important to recognize and address the ethical issues it may raise. The training data may be biased with respect to accents and voice characteristics. The ability to mimic a voice could be used to impersonate another person or to bypass biometric identification. Google emphasizes the importance of adopting safeguards against such misuse and of ensuring that generated audio can be detected by dedicated classifiers.

Google's AI ethics principles guide its ongoing efforts to address potential risks and limitations. The company is aware of the need to scrutinize training data and its effect on model output. It also plans to explore additional approaches to detecting synthesized speech, such as audio watermarking, so that the technology is used ethically.

SoundStorm is a big step forward in AI-assisted audio production, offering efficient, high-quality audio generation built on neural audio codecs. Google expects that SoundStorm's lower memory and compute requirements will make audio generation research accessible to a wider range of users. As the technology evolves, Google says it remains committed to responsible AI practices and to the safe and responsible use of SoundStorm and comparable breakthroughs in this area.
