2017 Year-End Interpretation: Speech Recognition Technology Has Only Gone Halfway This Year

The speech recognition landscape saw significant developments this year. Baidu launched its voice platform DuerOS, and Alibaba invested heavily in smart speakers, both aiming to capture the voice portal. With the giants moving in, many have begun to question whether the technical barriers that iFlytek (Keda Xunfei), the pioneer in this field, has built over the past 20 years are being eroded. Some former iFlytek employees have started their own ventures, while the company itself has been busy answering for its long-term financial results. The AI wave this year has undoubtedly drawn more attention to iFlytek. Against this backdrop, how can a company so deeply committed to speech technology win more business and profit? How can it ride the rising tide of AI and live up to all the expectations placed on artificial intelligence? In fact, the technical progress this year looks much like that of previous years. (We interpret the 2017 advances from the perspective of speech recognition; some insights come from interviews with iFlytek.)

**Starting from the Data: Upgrades in 2017**

Last year, IBM, Microsoft, Google, and Baidu each released their own speech recognition updates. This year there were three notable improvements in word error rate (WER):

- In March 2017, IBM combined LSTM acoustic models with WaveNet-style language models, pushing the word error rate down to 5.5%, from 6.9% in May 2016.
- In August 2017, Microsoft improved its neural-network-based acoustic and language models, cutting the error rate by about 12% relative to the previous year to 5.1%, and claimed to surpass professional transcribers.
- In December 2017, Google introduced a new state-of-the-art speech recognition system built on sequence-to-sequence models, reducing the word error rate to 5.6%, a 16% improvement over its legacy system.

Everyone's goal is the same: to "exceed humans," a bar previously set at a 5.9% word error rate.

In short, thanks largely to the introduction of deep CNNs, speech recognition has made great strides; Google, for example, has improved performance by 20% since 2013. Domestic players such as Baidu, Sogou, and iFlytek report recognition rates of around 97%, and in Chinese speech recognition the human level was surpassed a year earlier than in English. iFlytek introduced its deep fully convolutional neural network (DFCNN) framework last year, improving recognition accuracy by more than 15% over the best bidirectional RNN systems. In July this year, the iFlytek input method finally broke through 97% and reached 98% accuracy.
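As a quick aside on how these percentages are measured: word error rate is the edit distance (substitutions, deletions, insertions) between the recognizer's output and a reference transcript, divided by the number of reference words. Below is a minimal sketch in Python; the example sentences are invented for illustration and are not taken from any of the benchmarks above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table of edit distances between the word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j          # inserting all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,   # match / substitution
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    # Toy example: one substitution and one deletion over six reference words.
    ref = "the quick brown fox jumps over"
    hyp = "the quick brown box jumps"
    print(f"WER = {word_error_rate(ref, hyp):.1%}")   # -> 33.3%
```

A 5.1% WER therefore corresponds to roughly one error for every twenty words of the reference transcript.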
Technology "availability" is only the first step; ultimately it has to land in real products and services. So what changed in the application scenarios this year?

Smart speakers are the first thing that comes to mind. According to 2016 statistics, China's smart speaker sales were about 60,000 units against a global total of roughly 17.1 million, or about 0.35%. After Alibaba's Double Eleven subsidies in 2017, it could finally be said that "China's smart speaker sales exceeded one million," marking an explosion in the market. From the demand side, however, the functions of smart speakers are still largely limited to music playback, alarm clocks, and smart home control, which may not fully match the habits of Chinese consumers. The BAT giants have pre-emptively seized smart speakers as voice portals, creating something of an illusion of growth. But this year the application scenarios have become far more diverse.

With development across various fields, intelligent voice technology has moved beyond quiet indoor spaces into service halls, stores, and even cars. Advances in machine translation, far-field recognition, noise reduction, multi-round interaction, and intelligent interruption have brought more changes to speech applications.

In the automotive field, the Feiyu (Flying Fish) System 2.0 released by iFlytek in 2017 integrates barge-in full-duplex voice interaction, narrow-beam directional recognition, natural semantic understanding, wake-word-free interaction, and multi-round dialogue. iFlytek's voice interaction products have already shipped in more than 200 vehicle models and over 10 million vehicles. In retail, the use of intelligent voice is also expanding: on December 18th, iFlytek and Red Star Macalline announced a strategic cooperation plan under which the smart shopping robot "Meimei," developed by iFlytek, will be rolled out in Red Star Macalline stores nationwide.

**Sixty Years of Speech Recognition: Breakthroughs Are Always Difficult and Slow**

Speech recognition dates back to the 1950s, when AT&T Bell Labs' Audrey system recognized the ten English digits. In the 1960s, CMU's Reddy began pioneering work on continuous speech recognition, but progress was slow, leading John Pierce to conclude that the task was nearly impossible. In the 1970s, improvements in computer performance and in pattern recognition research advanced the field, and IBM and Bell Labs launched real-time isolated-word recognition systems. The 1980s saw rapid development with the introduction of hidden Markov models (HMMs), moving speech recognition from isolated words to large-vocabulary continuous speech. By the 1990s the technology had matured, but practical use remained limited and research hit a bottleneck.

The key breakthrough came in 2006, when Hinton proposed the deep belief network (DBN), reviving interest in deep neural networks. In 2009, Hinton and Mohamed applied deep learning to acoustic modeling for speech and achieved success on the TIMIT database. In 2011, Microsoft Research's Yu Dong and Deng Li published breakthrough results on large-vocabulary speech recognition, and tech giants at home and abroad accelerated their research.

**The Journey of Intelligent Voice Exploration at iFlytek**

iFlytek began DNN speech recognition research in 2010 and launched the world's first Chinese DNN speech recognition system in 2011. In 2012 it pioneered RBM technology in speech synthesis, and in 2013 it introduced BN-ivector technology in language identification. In 2014 the company began to lay out its NLP research, in 2015 it upgraded to an RNN speech recognition system, and in 2016 it launched the DFCNN (deep fully convolutional neural network) speech recognition framework. Combined with other techniques, DFCNN achieved a 15% improvement on internal Chinese SMS dictation tasks, a 15% performance gain over the industry's best bidirectional RNN-CTC systems, and, supported by iFlytek's HPC platform and multi-GPU parallel acceleration, significantly faster training. The proposal of DFCNN opened new possibilities in speech recognition, and further research will continue on top of this framework.
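The core idea behind DFCNN, as publicly described, is to treat the speech spectrogram like an image and model it with very deep stacks of small convolutions and pooling, so that long acoustic context is captured without recurrence. The sketch below is not iFlytek's implementation, only a minimal PyTorch illustration of that general style of model; the layer sizes, the 40-dimensional filterbank input, and the 1,000-token output vocabulary are assumptions made for the example.

```python
import torch
import torch.nn as nn


class SpectrogramCNN(nn.Module):
    """Toy DFCNN-style acoustic model: the input is a batch of log-mel
    spectrograms shaped (batch, 1, time, n_mels), treated like an image."""

    def __init__(self, n_mels: int = 40, n_tokens: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            # Blocks of small 3x3 convolutions followed by pooling:
            # the "very deep CNN over the whole spectrogram" idea.
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2)),           # halve time and frequency
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2)),
        )
        # Project each remaining time step onto output tokens (e.g. for CTC training).
        self.classifier = nn.Linear(64 * (n_mels // 4), n_tokens)

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        x = self.features(spectrogram)                  # (batch, 64, time/4, n_mels/4)
        x = x.permute(0, 2, 1, 3).flatten(2)            # (batch, time/4, 64 * n_mels/4)
        return self.classifier(x)                       # per-frame token scores


if __name__ == "__main__":
    model = SpectrogramCNN()
    dummy = torch.randn(2, 1, 200, 40)                  # 2 utterances, 200 frames, 40 mel bins
    print(model(dummy).shape)                           # torch.Size([2, 50, 1000])
```

In practice, per-frame outputs like these are typically trained with a CTC-style loss against character or syllable targets, which is also how the bidirectional RNN-CTC baselines mentioned above are trained.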
**Unresolved Issues in Speech Recognition**

Despite the dramatic reduction in word error rate brought by deep learning, many challenges remain, and recognizing these issues and finding solutions is crucial for further progress.

**Accent and Noise.** One major weakness is handling accents and background noise. Most training data consists of clean, accent-free speech, so the problem is hard to solve purely by adding data. In China, dialect recognition has improved, with iFlytek having launched recognition systems for 22 dialects, but bringing down the cost for dialects or foreign languages with different phoneme systems remains difficult. (A small sketch of the noise augmentation commonly used to harden models against low-SNR conditions appears at the end of this article.)

**Multi-Person Conversations.** Current systems work well on single-speaker audio, but human-level understanding of overlapping, multi-speaker conversation remains out of reach.

**Cognitive Intelligence.** While speech recognition already performs well in areas such as quality inspection and public security, pushing accuracy toward 100% still requires more scientific work, and cognitive intelligence has yet to see a real breakthrough. The Ministry of Science and Technology recently established the first national laboratory for cognitive intelligence, underscoring its importance.

Over the next five years, the open challenges include extending capability to new regions, accents, and low-SNR environments, introducing more context, source separation, and semantic evaluation. Progress has been made, but the remaining challenges are as tough as ever.

**Beyond Technology: The Business of AI Companies**

Artificial intelligence has given rise to new technologies, new businesses, and new industries. As the leading AI stock on the A-share market, iFlytek saw its market value surge by more than 36 billion yuan in a single month, exceeding 100 billion yuan, a reflection of how powerful the public now perceives AI to be. On November 15, 2017, the Ministry of Science and Technology announced the first batch of national AI open innovation platforms, which include iFlytek's intelligent voice platform.

Liu Qingfeng emphasized that while the company has made progress, it is still only partway up the mountain, much like speech recognition technology itself. AI is a long-term trend that requires sustained, heavy investment. "Only with strong enough technology can we create must-have applications," Liu said. "We aim to do cutting-edge research five to ten years ahead."
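On the accent-and-noise point above: since most corpora are recorded clean, one common partial remedy is to synthesize noisy training data by mixing recordings with background noise at controlled signal-to-noise ratios. The snippet below is a generic NumPy sketch of that idea, not any particular vendor's pipeline; the synthetic signals and the 10 dB target SNR are arbitrary choices for the example.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then add it to the speech signal (both float arrays of equal length)."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12           # avoid division by zero
    # SNR(dB) = 10 * log10(P_speech / P_noise_scaled)  =>  solve for the scale factor.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scale = np.sqrt(target_noise_power / noise_power)
    return speech + scale * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)   # 1 s synthetic "speech"
    noise = rng.standard_normal(16000)                           # 1 s white noise
    noisy = mix_at_snr(clean, noise, snr_db=10.0)                # augmented sample at 10 dB SNR
    achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
    print(f"achieved SNR: {achieved:.1f} dB")
```

Training on such mixtures at a range of SNRs is one of the standard ways to narrow the gap between clean benchmark numbers and real far-field, noisy conditions.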
