Baidu Research

A Look Back on Baidu’s AI Innovations in 2019

2020-01-21


AI has become a key driver of innovation in the tech industry over the past decade, but 2019 marked a tipping point where AI expanded beyond narrow industry-specific applications to enter all walks of life at unprecedented scale and speed. AI is now stepping into the phase of industrial production. As Baidu CTO Haifeng Wang says, “Like the core technologies of the previous industrial revolutions, AI is highly generalized, showing the characteristics of industrial production, such as standardization, automation and modularity.”

In 2019, we continued our dedication to pushing the limits of AI while achieving a broader implementation of AI capabilities across Baidu’s products and services. With the mission of making our complicated world simpler through technology, we actively encouraged AI innovation for the benefit of society.

As we head into 2020, it’s a good time to share our perspective on how we view AI’s progress over the past year and to recap what we have achieved through our own efforts and innovations in AI.

Driving breakthroughs in fundamental AI technologies

Our R&D efforts are paying off, and 2019 marked several key milestones. From advancing machine learning algorithms and standardizing AI pipelines to open-sourcing powerful tools for plug-and-play AI deployment, we achieved significant progress in the development of innovative AI technologies.

We ranked No. 1 in the number of AI-related patent applications in China for the second consecutive year, with a total of 5,712 applications filed as of October 2019. Deep learning (1,429), natural language processing (938), and speech recognition (933) account for almost 60 percent of those patent applications.
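The "almost 60 percent" figure can be checked directly from the counts quoted above. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative and are not taken from any Baidu tool.

```python
# Patent application counts quoted in the paragraph above.
PATENTS_BY_FIELD = {
    "deep learning": 1429,
    "natural language processing": 938,
    "speech recognition": 933,
}
TOTAL_APPLICATIONS = 5712  # total AI-related applications as of October 2019


def combined_share(counts: dict, total: int) -> float:
    """Return the fraction of all applications covered by the listed fields."""
    return sum(counts.values()) / total


combined = sum(PATENTS_BY_FIELD.values())
share = combined_share(PATENTS_BY_FIELD, TOTAL_APPLICATIONS)
print(f"{combined} of {TOTAL_APPLICATIONS} applications = {share:.1%}")
```

The three fields together make up 3,300 of the 5,712 applications, or roughly 57.8 percent.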

Natural language processing (NLP) was the most talked-about technology last year and has achieved unprecedented progress thanks to deep learning. We introduced ERNIE, a continual pre-training framework that builds and learns tasks incrementally through sequential multi-task learning. The latest version of ERNIE topped the public GLUE leaderboard, followed by Microsoft's MT-DNN-SMART and Google’s T5. The paper has been accepted by AAAI 2020.

We have been applying ERNIE’s semantic representation to real-world application scenarios. For example, our third quarter 2019 financial results highlighted that the percentage of user queries satisfied by our top-1 search result improved by 16 percentage points compared with a year ago. This improvement was due to the adoption of ERNIE for question answering in our search engine.

In speech recognition, deep learning has dramatically reduced word error rates, but computers still fall short of human-level performance, as mixed-language speech recognition and dialect recognition remain unsolved. Last year we proposed SMLTA, the industry’s first streaming multi-layer truncated attention model for large-scale online speech recognition. In addition to a significant boost in the speed and accuracy of speech recognition, SMLTA enables Baidu’s products to recognize Chinese-English mixed speech and six major Chinese dialects.

We further enhanced the voice feature of Baidu Maps, allowing users to customize its voice navigation with their own voice by recording 20 sentences. At the core of this feature is Meitron, a new speech synthesis technology we introduced that can customize any speech style.

Pushing the limits of computing

Along with advances in algorithms, we are expanding our infrastructure to supply massive compute and storage. Last year, we announced a CN¥1.4 billion investment in a new cloud computing center in Yangquan, Shanxi Province. It is set to be the first fully distributed data center whose power supply and cooling system achieve a power usage effectiveness (PUE) below 1.15. By comparison, the average PUE in 2018 was 1.58, according to the Uptime Institute Global Data Center Survey.
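PUE is the ratio of a facility's total energy use to the energy consumed by its IT equipment, so a value of 1.0 would mean no overhead for cooling and power delivery. As a rough sketch of what the two figures above imply, the snippet below compares the overhead at a PUE of 1.15 with the 1.58 average. It treats 1.15 as an upper bound and assumes an identical IT load in both cases; the function names are illustrative.

```python
def overhead_per_it_unit(pue: float) -> float:
    """Energy spent on cooling, power delivery, etc. per unit of IT energy."""
    return pue - 1.0


def facility_energy_saving(pue_new: float, pue_baseline: float) -> float:
    """Fractional reduction in total facility energy for the same IT load."""
    return 1.0 - pue_new / pue_baseline


PUE_YANGQUAN_UPPER_BOUND = 1.15  # "less than 1.15", so this is a ceiling
PUE_2018_AVERAGE = 1.58          # Uptime Institute survey figure quoted above

saving = facility_energy_saving(PUE_YANGQUAN_UPPER_BOUND, PUE_2018_AVERAGE)
print(f"Overhead at 1.15: {overhead_per_it_unit(PUE_YANGQUAN_UPPER_BOUND):.2f} per IT unit")
print(f"Overhead at 1.58: {overhead_per_it_unit(PUE_2018_AVERAGE):.2f} per IT unit")
print(f"Total facility energy saved for the same IT load: at least {saving:.1%}")
```

Under these assumptions, the overhead falls from 0.58 to at most 0.15 units of energy per unit of IT energy, a reduction of about 27 percent in total facility energy.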

In December, our home-grown AI accelerator Baidu Kunlun, which offers 512 gigabytes per second (GBps) of memory bandwidth and supplies up to 260 tera operations per second (TOPS) at 150 watts, was made available on Baidu Cloud servers to accelerate a wide range of machine learning workloads, including both training and inference. ERNIE’s performance on Kunlun is three times that on a T4 GPU.
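Two derived figures help put the Kunlun specifications in context: peak operations per watt, and the number of operations the chip can perform per byte of memory traffic before memory bandwidth becomes the limit. The sketch below computes both from the peak numbers quoted above. It assumes decimal units (1 GB = 10^9 bytes, 1 tera = 10^12) and peak rather than sustained rates; the names are illustrative.

```python
PEAK_TOPS = 260            # tera operations per second, peak
POWER_WATTS = 150          # power at which the peak figure is quoted
MEMORY_BANDWIDTH_GBPS = 512  # gigabytes per second, assumed decimal


def tops_per_watt(peak_tops: float, watts: float) -> float:
    """Peak energy efficiency in tera operations per second per watt."""
    return peak_tops / watts


def ops_per_byte(peak_tops: float, bandwidth_gbps: float) -> float:
    """Operations available per byte of memory traffic at peak rates."""
    return (peak_tops * 1e12) / (bandwidth_gbps * 1e9)


efficiency = tops_per_watt(PEAK_TOPS, POWER_WATTS)
balance = ops_per_byte(PEAK_TOPS, MEMORY_BANDWIDTH_GBPS)
print(f"Peak efficiency: {efficiency:.2f} TOPS/W")
print(f"Compute-to-bandwidth ratio: {balance:.1f} operations per byte")
```

With these assumptions the chip peaks at about 1.73 TOPS per watt, and a workload needs roughly 508 operations per byte fetched from memory to be compute-bound rather than bandwidth-bound.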

To enable far-field voice interaction, we introduced Baidu Honghu (鸿鹄), an automotive-grade processor featuring two HiFi 4 DSPs and 100mW power consumption. This new chip supports high-accuracy voice wake-up with an ultra-low false alarm rate as well as offline voice recognition, making it suitable for many common scenarios, such as smart home and in-car interactions.

We have also established a presence in quantum computing, a disruptive technology that promises to perform highly complex computational tasks far beyond the ability of classical computers. Last year we developed the world’s first cloud-based quantum pulse system, Quanlse, which connects quantum software to a physical quantum computer by converting quantum computing software instructions (logic gates) into pulse sequences. Experimental results showed that Quanlse could achieve better performance in quantum calculation tasks than other comparable tools.

Implementing AI in innovative applications

Last year, we markedly improved the user experience across our products and services by integrating AI into many of our applications and technologies.

We added a full-duplex continued conversation feature to our self-developed voice assistant interface DuerOS and DuerOS-powered Xiaodu smart speakers. Users can now have seamless back-and-forth conversations with DuerOS without repeating the wake word “Xiaodu.”

With more users relying on Baidu translation to travel in foreign countries or attend an international conference, we realize the increasing importance of real-time translation. Last year, our research team presented the first context-aware translation model for simultaneous interpreting. The model, DuTongChuan, can achieve comparable performance to human interpreters in delivering high-quality simultaneous speech translation with low latency.

The visual perception module of autonomous vehicles (AVs) is a critical capability that will determine whether driverless cars can meet early promises. At CVPR 2019, we introduced Apollo Lite, China’s only vision-based system that can achieve level 4 autonomous driving with only cameras. Apollo Lite can detect objects up to 700 feet away while delivering real-time, 360-degree sensing of the environment from data collected by 10 cameras at 200 frames per second.

Simulations are also increasingly essential to training autonomous driving systems, but generating high-definition computer graphics within limited budgets remains a difficult challenge. In March, we presented our augmented autonomous driving simulation (AADS) in the journal Science Robotics. Our solution augmented real-world pictures with simulated traffic flow to create photorealistic simulation images and renderings. More specifically, we used LiDAR and cameras to scan street scenes and, from that acquired trajectory data, generated plausible traffic flows for cars and pedestrians and composed them into the background. These composite images can be resynthesized with different viewpoints and sensor models (camera or LiDAR) to simulate different use cases. The resulting images are photorealistic, fully annotated, and ready for training and testing of autonomous driving systems from perception to planning.

Open-sourcing platforms to democratize AI

At Baidu, we see tremendous potential for open-source technology to lower the barrier to AI deployment and expand the AI ecosystem. Baidu Brain is China’s leading AI open platform, offering more than 228 AI capabilities to over 1.5 million developers. Last year, we introduced Baidu Brain 5.0 with a comprehensive upgrade across its AI services, from image recognition to semantic understanding. Baidu Brain 5.0 also encompasses a deep learning framework, scenario-based AI capabilities, a customized training platform, and hardware-to-software integrated modules and solutions.

In particular, deep learning frameworks have become the operating system and fuel for this new intelligent era of AI-powered innovation and growth. In 2016, PaddlePaddle opened its source code to the public. PaddlePaddle is a comprehensive, robust, and easy-to-use deep learning platform that provides developers of all skill levels with the tools, services, and resources they need to rapidly adopt and implement deep learning, at scale, for continuous innovation. Today, Baidu’s self-developed PaddlePaddle is among the top three deep learning platforms in China and the top five globally, according to IDC.

The newest version of PaddlePaddle, released last November, added 21 significant new features, including the mobile inference engine Paddle Lite 2.0, four development kits, new toolkits for graph learning and federated learning, and an improved AI platform for developers without machine learning expertise to train and build custom models via a drag-and-drop interface.

Our open source autonomous driving platform, Apollo, was first introduced in November 2017. Over the past two years, Apollo has amassed 177 ecosystem partners, become home to 36,000 developers across 97 countries, and grown to include more than 560,000 lines of open source code.

Our voice assistant platform DuerOS now has 37,000 developers and provides more than 3,500 skills ranging from games to education to smart home devices. The number of DuerOS-controlled IoT devices has now surpassed 70 million.

Some of Baidu’s datasets are also open to the public research community. For example, the Baidu Research Open-Access Dataset (BROAD) is a set of industry-focused datasets ranging from text detection and video highlights to reading comprehension, all aimed at facilitating AI research.

The rise of IoT, 5G, and AI is driving high demand for edge computing. Last year at the Open Networking Summit Europe, we announced the donation of BAETYL, a general-purpose platform for edge computing previously known as OpenEdge, to LF Edge, an umbrella organization within the Linux Foundation that specializes in edge computing.

We also leaped forward in blockchain, announcing an open distributed ledger blockchain project, XuperChain, last year. At the heart of XuperChain is our self-developed technology, Xuper, which boasts low latency and excellent transaction capacity.

Building AI for social good

We are committed to reaching more people through inclusive and accessible technologies by leveraging the power of AI. One vision of AI innovation is to provide humans with equal access to technology and capabilities.

The standout program is Baidu Xunren, an AI-powered system for finding missing persons. Our self-developed facial recognition technology can identify a missing person's face from old pictures, regardless of age or significant changes in appearance, and trace a suitable match. Since 2016, we have worked with multiple partners, including China’s Ministry of Civil Affairs and the non-profit organization Baobei Huijia, and have helped over 10,000 missing people reunite with their families.

Last year, China stepped up its garbage-sorting efforts to improve recycling and reduce waste, led by major cities such as Shanghai and Beijing, which have enforced garbage classification programs. Within the Baidu App, we launched a smart mini program that helps users classify garbage more conveniently through voice or visual search.

The pressure that Asia’s vast and dispersed population places on healthcare providers has increased interest in developing strong AI solutions to help meet demand and address rising costs. According to MIT Technology Review, China has a shortage of health care professionals, averaging 17.9 doctors per 10,000 people. We developed the Clinical Decision Support System (CDSS), which functions as a real-time professional assistant to doctors, guiding them through standard diagnosis and treatment procedures, alerting them to potential errors, and recommending suitable therapeutic plans. CDSS currently serves 18 provinces, more than 1,000 hospitals, and more than 11,000 doctors in China, directly addressing the healthcare gap in the areas that need it most.

Each year about 115,000 children under the age of seven suffer severe-to-profound deafness, and 30,000 babies are born with hearing impairment, according to 2009 figures from China’s Ministry of Health. Last year we launched the world’s first smart app to translate picture books into sign language for hearing-impaired children.

Baidu continues to explore numerous other areas where it can apply AI for social good. For example, a recent upgrade to Baidu Maps, developed jointly with the government, shows the location of nursery rooms in a searched area. Other examples include applying AI to typhoon tracking, using AI to support blind-massage parlors, and leveraging AI to protect the Tujia language.

Summing up Baidu’s AI progress in 2019, CTO Haifeng Wang says, “In 2019, we have built a solid foundation, promoted real changes, and created a trustworthy future.” At Baidu we believe the world is now on the brink of the fourth industrial revolution because of AI, and while the journey is bound to be challenging, we’re proud to be a driving force leading the next wave of innovation.