New ideas | new devices bring new hope to far-field speech recognition!

May 27, 2021

Speech recognition has seen many significant advances. However, many problems remain before robots and humans can communicate truly freely. One of them is far-field speech recognition.

At present, computers can reliably convert speech into text only in near-talk conditions. Once the speaker is far from the microphone and there is reverberation or noise, the recognition rate drops drastically, especially under reverberation. In this respect machines behave very differently from people. With a moderate amount of reverberation, people perceive the sound as fuller and hear it more clearly, but for computers and robots the opposite is true. In addition, humans have selective hearing, known as the cocktail party effect: at a cocktail party, a person can focus on one conversation and ignore the surrounding conversations and background noise. Despite the noise around us, we can still hear what we are interested in. Everyone has this ability, but it is very difficult to give it to a machine.

Existing solutions

For nearly half a century, scientists have been working on this problem. At present, there are two main research directions for solving the cocktail party problem in machine auditory systems.

The first is auditory scene analysis, which separates mixed speech using audio features and language models. For example, the harmonic structure of speech signals, their short-term stationarity, and hidden Markov language models can be used to separate overlapping conversations. However, one shortcoming of this method is that it makes some unreasonable assumptions about speech, for example that the voices of different people do not overlap in the spectrum. In addition, estimation based on language models is computationally expensive and hard to put into practice.

The second method is based on a microphone array. The array is used to design a spatial filter that extracts the sound source in a specific direction and suppresses speech from other directions, thereby separating sounds at different positions. Its drawbacks are the number of microphones required and the computational complexity.
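As a rough illustration of the spatial filtering idea, the sketch below implements delay-and-sum beamforming, one of the simplest array techniques (the article does not say which filter design is meant). The four-microphone setup, the whole-sample delays, and the two sine-wave "speakers" are all hypothetical values chosen for the example.

```python
import math

def delay_and_sum(channels, delays):
    """Advance each microphone channel by its steering delay (in whole
    samples) and average the results. Samples shifted past the end of a
    channel are treated as zero."""
    n = len(channels[0])
    out = [0.0] * n
    for signal, d in zip(channels, delays):
        for t in range(n - d):
            out[t] += signal[t + d]
    return [v / len(channels) for v in out]

def power(signal):
    """Mean squared value of a signal."""
    return sum(v * v for v in signal) / len(signal)

def delayed(signal, d):
    """Delay a signal by d samples, padding the start with zeros."""
    return [0.0] * d + signal[:len(signal) - d]

# Hypothetical scene: two sine-wave sources at different positions.
n = 400
target = [math.sin(2 * math.pi * 0.05 * t) for t in range(n)]
interf = [math.sin(2 * math.pi * 0.13 * t + 1.0) for t in range(n)]

# Assumed arrival delays (in samples) of each source at the 4 microphones.
target_delays = [0, 2, 4, 6]
interf_delays = [6, 4, 2, 0]
mics = [
    [a + b for a, b in zip(delayed(target, td), delayed(interf, idl))]
    for td, idl in zip(target_delays, interf_delays)
]

# Steer the array toward the target: its copies line up and add coherently,
# while the interferer's copies are misaligned and partly cancel.
steered = delay_and_sum(mics, target_delays)
residual = [s - t for s, t in zip(steered, target)]
print("interferer power before:", power(interf))
print("interferer power after: ", power(residual[10:380]))
```

Even in this toy case the suppression depends on having several microphones with known geometry, which hints at why array size and computation are the limiting factors.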

Future solutions

Clearly, neither of the two existing solutions is satisfactory. Recently, however, researchers at Duke University in the United States have brought new hope.

By combining acoustic materials with compressive sensing, their novel device can separate three mixed sound sources using only a single microphone, with an accuracy of 96.67%. Unlike traditional signal processing methods, the new device encodes sound sources from different directions through carefully designed acoustic materials, without any prior knowledge or assumptions about the sound source.

The new device is a plastic disc that looks like a pizza. A microphone is placed at the center of the disc, and the disc is divided into 36 sector-shaped channels, each of which is an acoustic waveguide made up of many honeycomb cells. Each channel modulates the sound waves passing through it, so the overall structure resembles an equalizer with adjustable parameters.

The working principle of the disc is much like talking into a bottle partly filled with water. The sound waves make the air inside the bottle resonate, so the energy at certain frequencies is attenuated, and which frequencies are attenuated is determined by the amount of water in the bottle. Each channel of the disc is like such a bottle. By carefully designing the height of the honeycomb cells in each channel, the energy at different frequencies can be attenuated, which encodes the sound wave.

However, because the new device is large, it is still difficult to apply in practice. But imagine that the device can be miniaturized: it could then replace today's common microphone array technology. Extracting a voice of interest in a noisy environment with a single microphone, and without complex calculations, would be a wonderful thing.
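The idea of tagging each direction with its own spectral "fingerprint" can be shown with a toy model. This is only a sketch under strong assumptions, not the researchers' method: the signatures here are hypothetical orthogonal cosine patterns (DCT-IV basis vectors), each source is reduced to a single amplitude, and decoding is plain correlation. The real device's channel responses are not orthogonal, which is why recovery there relies on compressive sensing.

```python
import math

N_BINS = 64        # frequency bins seen by the microphone (assumed value)
N_CHANNELS = 36    # one signature per sector-shaped channel

def signature(d):
    """Hypothetical spectral signature of channel d: a unit-norm DCT-IV
    basis vector, chosen only because distinct channels are orthogonal."""
    scale = math.sqrt(2.0 / N_BINS)
    return [scale * math.cos(math.pi / N_BINS * (f + 0.5) * (d + 0.5))
            for f in range(N_BINS)]

SIGNATURES = [signature(d) for d in range(N_CHANNELS)]

def encode(amplitudes):
    """Spectrum at the single microphone: the sum of every channel's
    signature, scaled by the amplitude of the source behind that channel."""
    return [sum(a * sig[f] for a, sig in zip(amplitudes, SIGNATURES))
            for f in range(N_BINS)]

def decode(spectrum):
    """Estimate each direction's amplitude by correlating the measured
    spectrum with that direction's known signature."""
    return [sum(s * y for s, y in zip(sig, spectrum)) for sig in SIGNATURES]

# Three sources arriving from three different directions (example values).
true_amplitudes = [0.0] * N_CHANNELS
true_amplitudes[3] = 1.0
true_amplitudes[17] = 0.6
true_amplitudes[30] = 0.8

estimated = decode(encode(true_amplitudes))
active = [d for d, a in enumerate(estimated) if abs(a) > 0.1]
print("directions with a detected source:", active)
```

The point of the sketch is that one microphone is enough once the structure in front of it makes every direction sound different in a known way.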
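To get a feel for how cavity depth sets the affected frequency, the sketch below uses the idealized quarter-wavelength formula for an air column closed at one end, f = c / (4 L). This is an assumption made for illustration: the article does not give the device's dimensions or its exact acoustic model, and the depths below are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees Celsius

def quarter_wave_frequency(depth_m):
    """Fundamental resonance (Hz) of an idealized air column of the given
    depth that is closed at one end and open at the other."""
    return SPEED_OF_SOUND / (4.0 * depth_m)

# Hypothetical cavity depths: a deeper cavity resonates at a lower frequency,
# just as a bottle with less water (more air) sounds lower.
for depth_cm in (2.0, 4.0, 8.0):
    freq = quarter_wave_frequency(depth_cm / 100.0)
    print(f"{depth_cm:4.1f} cm -> {freq:7.1f} Hz")
```

Under this model, halving the depth doubles the resonant frequency, so a set of cells with different heights can target different parts of the spectrum.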

Shenzhen Jianjiantong Technology Co., Ltd., https://www.tpuprotector.com