How Artificial Intelligence Works: Speech Recognition - Alternative View

Each of us encounters the mysterious phenomenon of artificial intelligence in everyday life: it is this technology that lets voice assistants and search engines recognize human speech and guess what users want. Today we will look at exactly how this technology works and what prospects await the field in the near future.

Artificial intelligence is a very broad term covering many algorithms, some already in use and some still in development, designed to perform a wide range of practical tasks. But what are modern artificial intelligence programs actually capable of, and what principles guide their work? Today we will talk about one of the key abilities of the machine mind, one that each of us regularly encounters in everyday life: the ability of voice assistants to recognize human speech.

Voice recognition

To analyze a voice, the program measures a number of sound parameters: the frequency and duration of the sound wave at a given moment in time. For example, when you talk to the popular voice assistant Alexa, the software splits your speech into 25-millisecond slices and then converts each segment into a digital signature. These signature blocks are then compared against the program's internal catalog of sounds until the number of matches is high enough for the AI to "translate" the numbers into an alphabetic query it can understand.
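
As a minimal illustration of the idea described above, the sketch below splits an audio signal into 25-millisecond frames, turns each frame into a crude spectral "signature", and matches it against a small catalog of reference sounds by nearest-neighbour distance. This is an assumption-laden toy, not Amazon's actual pipeline: the band-energy signature, the catalog entries and the matching rule are all illustrative stand-ins for far more sophisticated acoustic models.

```python
# Toy sketch of frame-by-frame sound matching (NOT Alexa's real pipeline):
# split audio into 25 ms frames, compute a spectral "signature" per frame,
# and pick the closest entry from a small catalog of reference sounds.
import numpy as np

SAMPLE_RATE = 16_000                   # samples per second
FRAME_LEN = int(0.025 * SAMPLE_RATE)   # 25 ms -> 400 samples per frame

def frame_signal(signal: np.ndarray) -> np.ndarray:
    """Chop a 1-D audio signal into consecutive 25 ms frames."""
    n_frames = len(signal) // FRAME_LEN
    return signal[: n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)

def signature(frame: np.ndarray, n_bands: int = 16) -> np.ndarray:
    """A crude digital signature: log energy in a handful of frequency bands."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    bands = np.array_split(spectrum, n_bands)
    return np.log1p(np.array([band.sum() for band in bands]))

def best_match(sig: np.ndarray, catalog: dict[str, np.ndarray]) -> str:
    """Return the catalog label whose signature is closest to `sig`."""
    return min(catalog, key=lambda label: np.linalg.norm(catalog[label] - sig))

# Usage: a fake two-frame "recording" matched against two reference tones.
t = np.arange(FRAME_LEN) / SAMPLE_RATE
catalog = {
    "low vowel-like tone": signature(np.sin(2 * np.pi * 700 * t)),
    "high vowel-like tone": signature(np.sin(2 * np.pi * 2300 * t)),
}
rng = np.random.default_rng(0)
recording = np.concatenate([np.sin(2 * np.pi * 700 * t),
                            np.sin(2 * np.pi * 2300 * t)])
recording += 0.01 * rng.standard_normal(recording.size)
for i, frame in enumerate(frame_signal(recording)):
    print(f"frame {i}: closest catalog sound -> {best_match(signature(frame), catalog)}")
```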

Watch your phone's screen while using Siri or Google Assistant and you will see that the transcribed words change as you speak. This happens because at each successive "step" the software compares the result obtained so far with its internal database and builds words based on the matches. According to Rohit Prasad, chief scientist of Amazon's Alexa division, "the language model learns many billions of words in the form of text." Word order also plays an important role: you can see this with an ordinary Google search, which sometimes returns different results for queries made up of the same words in a different order.
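
To make the role of the language model and of word order concrete, here is a minimal sketch of a toy bigram model: it scores a candidate word sequence by multiplying the probabilities of each adjacent word pair, so the same words in a different order get a different score. The tiny corpus, the add-one smoothing and the assumed vocabulary size of 20 are all illustrative choices, nothing like the billion-word models Prasad describes.

```python
# Toy bigram language model: the same words in a different order score differently.
from collections import defaultdict

corpus = "recognize human speech . recognize speech today . human speech is hard .".split()

# Count adjacent word pairs (bigrams) and how often each word appears as a context.
bigram_counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
context_counts: dict[str, int] = defaultdict(int)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1
    context_counts[prev] += 1

def score(sentence: list[str]) -> float:
    """Product of smoothed bigram probabilities P(word | previous word)."""
    p = 1.0
    for prev, word in zip(sentence, sentence[1:]):
        # Add-one smoothing with an assumed vocabulary size of 20.
        p *= (bigram_counts[prev][word] + 1) / (context_counts[prev] + 20)
    return p

print(score("recognize human speech".split()))   # higher: matches the observed word order
print(score("speech human recognize".split()))   # lower: same words, different order
```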

Prospects for speech recognition

Alan Black of Carnegie Mellon University's Language Technologies Institute argues that for professionals at large companies, the most interesting part is finding the limits of their own system. "When the program says, 'I can't do this,' that's when the situation gets really interesting," he jokes. And this is indeed the case: responding to unpredictable user requests is one of the main problems studied by the student teams competing for the Alexa Prize, worth as much as $2.5 million. Their task is to create a chatbot that can converse with people who ask coherent and meaningful questions, keeping the dialogue going for about 20 minutes. That may sound like a fairly easy task even for an average programmer, but in practice a program's conversation with real people always involves deviations from the topic, spontaneous phrases and other disruptions. A program that learns to handle them as well as a real person would be a huge breakthrough for the entire AI industry.
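
The "I can't do this" moment Black describes can be pictured with a deliberately simple, hypothetical sketch: match a request against a few hand-written keyword intents and fall back when nothing clears a confidence threshold. Production assistants use trained classifiers rather than keyword overlap, so treat the intent names, keywords and threshold below purely as placeholders.

```python
# Hypothetical out-of-scope fallback: keyword intents plus a confidence threshold.
KNOWN_INTENTS = {
    "weather": {"weather", "rain", "temperature", "forecast"},
    "music": {"play", "song", "music", "volume"},
}
THRESHOLD = 0.5  # arbitrary cut-off for this illustration

def respond(utterance: str) -> str:
    words = set(utterance.lower().split())
    # Confidence = fraction of an intent's keywords present in the request.
    scores = {intent: len(words & kw) / len(kw) for intent, kw in KNOWN_INTENTS.items()}
    intent, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence < THRESHOLD:
        return "I can't do this yet."          # the out-of-scope fallback
    return f"Handling a '{intent}' request."

print(respond("what is the weather forecast and temperature"))  # in scope
print(respond("tell me a story about a dragon"))                # unpredictable request
```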

Vasily Makarov