2017 Year-End Interpretation: Speech Recognition Technology Has Only Gone Halfway This Year

This year, Baidu launched its voice platform DuerOS, and Alibaba invested 4 billion yuan in subsidies to push smart speaker sales past the million mark, all in a contest for the voice portal. Many began to worry that the technical barriers iFlytek (Keda Xunfei), the pioneer of speech recognition in China, had built over the past two decades might be eroded. Some people have left Xunfei to found their own ventures; others have spent years watching its financial reports. This year's AI wave undoubtedly drew more attention to iFlytek. Amid such a trend, how does a company deeply committed to speech recognition win more business and profit? How does it ride the rising tide of AI and live up to all the expectations placed on artificial intelligence? In fact, this year's technical progress followed much the same path as in previous years. (We interpret the developments of 2017 from the perspective of speech recognition; some insights come from interviews with Xunfei.)

**Starting from Data Upgrades in 2017**

Last year, IBM, Microsoft, Google, and Baidu all released their own speech recognition advances. This year brought three notable reductions in word error rate:

- In March 2017, IBM combined LSTM models with WaveNet language models to reach a word error rate of 5.5%, down from the 6.9% it reported in May of the previous year.
- In August 2017, Microsoft set a new milestone by improving the neural-network-based acoustic and language models in its system, cutting the error rate by about 12% year over year to 5.1% and claiming to surpass professional stenographers.
- In December 2017, Google introduced a new state-of-the-art system built on sequence-to-sequence models, reducing the word error rate to 5.6%, a 16% improvement over its legacy systems.

All of these companies were chasing the same "surpass humans" benchmark: a word error rate of 5.9%. In short, the introduction of deep CNNs brought major breakthroughs to speech recognition; Google, for example, has improved performance by 20% since 2013. Domestic companies such as Baidu, Sogou, and Keda Xunfei have reached recognition rates of around 97%, and Chinese speech recognition surpassed the human level a year earlier than English did. iFlytek introduced its deep fully convolutional neural network (DFCNN) framework last year, improving recognition rates by more than 15% over the best bidirectional RNN systems, and in July of this year the accuracy of Xunfei's input method finally broke through 97%, reaching 98%.
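For reference, the word error rate (WER) behind all of these figures is simply word-level edit distance normalized by the length of the reference transcript. A minimal sketch in plain Python (illustrative only, not any vendor's scoring tool):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via Levenshtein (edit) distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # match or substitution
    return dp[-1][-1] / len(ref)

# Two substitutions against six reference words -> WER = 2/6 ~ 0.33
print(word_error_rate("the cat sat on the mat", "the cat sat in the hat"))
```

A 5.1% WER thus means roughly one word in twenty is still wrong, which is why the remaining gap to flawless transcription is harder than the headline numbers suggest.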
Technology being "usable" is only the first step; ultimately it must land, and be monetized, through products and services. So what changed in the application scenarios this year? Smart speakers are the first thing that comes to mind. According to 2016 statistics, China's smart speaker sales totaled 60,000 units, just 0.35% of the global figure. After Alibaba's Double Eleven subsidies in 2017, it could at last be said that "China's smart speaker sales exceeded one million": a real explosion. From a demand perspective, however, smart speakers were limited to playing music, setting alarms, and controlling smart homes, functions that are not necessarily essential for the Chinese market. The BAT giants rushed to stake out smart speakers as voice portals, giving us an illusion of growth. But this year the application scenarios genuinely became more diverse.

As various fields developed, intelligent voice technology moved beyond quiet indoor environments and entered service halls, stores, and even cars. Advances in machine translation, far-field recognition, noise reduction, multi-turn dialogue, and intelligent interruption brought more changes to how intelligent speech is applied. In intelligent vehicles, the Flying Fish (Feiyu) system 2.0 released in 2017 integrated barge-in full-duplex voice interaction, narrow-beam direction recognition, natural-language understanding, wake-word-free operation, and multi-turn dialogue; iFlytek has already shipped voice interaction products to more than 200 vehicle models and over 10 million vehicles. Intelligent voice is also expanding in the new retail sector: on December 18, Keda Xunfei and Red Star Macalline announced a strategic cooperation plan under which the smart shopping robot "Meimei," developed by Keda Xunfei, will be rolled out to Red Star Macalline stores nationwide.

**Sixty Years of Speech Recognition: Breakthroughs Are Always Difficult and Slow**

Speech recognition dates back to the 1950s, when AT&T Bell Labs' Audrey system first recognized the ten English digits. In the 1960s, CMU's Reddy began pioneering work on continuous speech recognition, but progress was so slow that John Pierce of Bell Labs pronounced speech recognition all but impossible. In the 1970s, improved computer performance and pattern recognition research drove the field forward, and IBM and Bell Labs launched real-time isolated-word recognition systems. The 1980s marked rapid development: the introduction of the Hidden Markov Model (HMM) moved speech recognition from isolated-word systems to large-vocabulary continuous systems. In the 1990s the framework matured, but results were still far from practical and research hit a bottleneck. The key breakthrough came in 2006, when Hinton proposed the Deep Belief Network (DBN) and revived interest in deep neural networks (DNNs). In 2009, Hinton and his student Mohamed applied DNNs to acoustic modeling, succeeding on the TIMIT database, and in 2011 Microsoft's Yu Dong and Deng Li published papers applying DNNs to speech recognition that achieved breakthroughs on large-vocabulary tasks. Giants at home and abroad began investing heavily in speech recognition research.

**The Road of Intelligent Voice Exploration at Keda Xunfei**

Keda Xunfei began its first DNN speech recognition research in 2010 and in 2011 launched the world's first Chinese DNN speech recognition system. In 2012 it pioneered RBM technology in speech synthesis; in 2013 it introduced BN-ivector technology in language identification; in 2014 it began laying out its NLP research; and in 2015 it fully upgraded to an RNN speech recognition system. In 2016, Keda Xunfei launched its DFCNN (deep fully convolutional neural network) speech recognition system. Combined with other technologies, the DFCNN framework achieved a 15% improvement on internal Chinese SMS dictation tasks over the industry's best bidirectional RNN-CTC system, and with an HPC platform and multi-GPU parallel acceleration its training speed also beat that of traditional systems. DFCNN opened up a new world for speech recognition, and more research built on the framework can be expected.
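To make the DFCNN idea more concrete: instead of recurring over time the way an RNN does, it treats the spectrogram of the whole utterance as an image and stacks many small convolutions with pooling. The sketch below is a minimal PyTorch illustration of that pattern, with invented layer sizes and class counts; it is not iFlytek's actual architecture.

```python
import torch
import torch.nn as nn

class DFCNNSketch(nn.Module):
    """Toy fully convolutional acoustic model in the spirit of DFCNN:
    the (frequency x time) spectrogram is processed like an image."""
    def __init__(self, n_mels: int = 80, n_classes: int = 4000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 2)),   # halve frequency and time resolution
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),   # halve frequency only, keep time frames
        )
        self.classifier = nn.Linear(64 * (n_mels // 4), n_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, n_mels, frames)
        x = self.features(spec)              # (batch, 64, n_mels//4, frames//2)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)
        return self.classifier(x)            # per-frame scores: (batch, frames//2, n_classes)

model = DFCNNSketch()
scores = model(torch.randn(4, 1, 80, 200))   # -> torch.Size([4, 100, 4000])
```

In practice such a network would be trained end to end with a CTC-style loss (e.g. `nn.CTCLoss` over log-softmaxed frame scores), which is what makes the comparison with bidirectional RNN-CTC systems meaningful.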
**Unresolved Issues in Speech Recognition**

Deep learning has sharply reduced word error rates, but that does not mean every problem in speech recognition is solved. Recognizing the remaining issues, and finding ways to address them, is the key to further progress: the goal is for ASR to go from working "only for a few people" to being "usable by anyone, anytime."

**Accent and Noise**

One of the most obvious weaknesses of speech recognition is its handling of accents and background noise. Most training data consists of high-SNR, standard-accent speech; building a high-quality recognizer for American-accented English alone already requires over 5,000 hours of transcribed audio, so the problem is hard to fix with training data alone. In China, dialect coverage has fared better: Keda Xunfei has launched speech recognition support for 22 dialects. But bringing down the cost for dialects, or for foreign languages with different phoneme systems, remains a challenge.

**Multi-Person Conversations**

In today's benchmark setups, each speaker is recorded on a separate microphone, so overlapping voices rarely appear in the same stream, and the task is correspondingly easier. Humans, by contrast, can follow a conversation even when several people speak at once. A good conversational recognizer must divide the audio by speaker and make sense of overlapping speech. These challenges still stand in the way of using voice technology to change input and interaction models: even where recognition rates in multi-person dialogue are high, voiceprint recognition is still at the laboratory stage and far from real-world application.

**Cognitive Intelligence**

Speech recognition already has good applications in areas such as quality inspection and security, but pushing accuracy toward 100% still requires a great deal of work. Reducing semantic errors and understanding context (machine learning and reasoning), for example, has only scratched the surface; there has been no real breakthrough in cognitive intelligence. Yet it is precisely this, amid the AI boom and the industrialization boom that accompanies it, that holds the key to further progress and to forming a larger industry. At the end of 2017, the Ministry of Science and Technology officially announced that iFlytek would establish the first national key laboratory for cognitive intelligence.

Over the next five years, many open and challenging issues remain in speech recognition: expanding capability to new domains, accents, far-field conditions, and low-SNR speech; bringing more context into the recognition process; source separation; evaluating semantic error rates rather than word error rates alone; and developing genuinely new approaches to recognition. The achievements so far are impressive, but the remaining challenges are as daunting as those already overcome. Deep neural networks have greatly improved speech recognition, yet we should not be superstitious about existing technology; one day, new techniques will replace it.
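Since "low-SNR speech" recurs throughout this list, it is worth pinning the term down. Signal-to-noise ratio is the power ratio of speech to noise in decibels, and one common way to train for noisy conditions is to mix clean speech with noise at a chosen SNR. A small NumPy sketch of both (standard formulas; the function names are our own):

```python
import numpy as np

def snr_db(speech: np.ndarray, noise: np.ndarray) -> float:
    """SNR in decibels: 10 * log10(P_speech / P_noise)."""
    p_speech = np.mean(np.square(speech))
    p_noise = np.mean(np.square(noise))
    return 10.0 * np.log10(p_speech / p_noise)

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, target_snr_db: float) -> np.ndarray:
    """Scale `noise` (assumed the same length as `speech`) so the mixture
    has the requested SNR -- a common data-augmentation trick for making
    recognizers robust to far-field and noisy audio."""
    p_speech = np.mean(np.square(speech))
    p_noise = np.mean(np.square(noise))
    gain = np.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10.0)))
    return speech + gain * noise
```

Close-microphone training data might sit above 30 dB SNR, while far-field audio in a store or car can fall near or below 10 dB, which is one concrete reason models trained on clean data degrade in the scenarios described above.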
**Beyond Technology: The Things That Matter for an AI Company**

Artificial intelligence has spawned numerous new technologies, enterprises, and business forms. Riding the AI trend as the leading AI stock on the A-share market, iFlytek saw its market value surge by more than 36 billion yuan in a single month to over 100 billion yuan, seemingly matching a public perception of AI as something almost deified.

On November 15, 2017, the meeting on implementing China's new-generation AI development plan and its major science and technology projects was held in Beijing, and the Ministry of Science and Technology announced the first batch of national AI open innovation platforms:

1. a national AI open innovation platform for autonomous driving, built on Baidu;
2. a national AI open innovation platform for the city brain, built on Alibaba Cloud;
3. a national AI open innovation platform for medical imaging, built on Tencent;
4. a national AI open innovation platform for intelligent voice, built on Keda Xunfei.

Even as part of this first batch, Keda Xunfei, in Liu Qingfeng's words, is "not yet at the summit": it can only be said to have started climbing and just gotten past the initial difficulties, a description that fits both the nation's situation and the current state of speech recognition technology. AI is a big trend requiring long-term investment; it will have a lasting impact, so we cannot be short-sighted in expecting immediate returns. "We must have strong technology to create real needs," Liu Qingfeng said. "It's about understanding the technology deeply enough that everyone truly feels there is a need." He added, "We are aiming at cutting-edge technology research for the next five to ten years."
