
When GPT meets autonomous driving, the first DriveGPT arrives.

What does GPT mean for autonomous driving?


ChatGPT has set AI on fire. So what kind of chemical reaction will happen when GPT meets autonomous driving?

GPT stands for Generative Pre-trained Transformer. In short, it is a deep learning model for text generation, trained on data available on the Internet.

On April 11, at the 8th HAOMO AI DAY, Haomo CEO Gu Weihao officially released the GPT-based DriveGPT, whose Chinese name is Xuehu Hairuo (Snow Lake Hairuo).

What can DriveGPT do? How was it built? Gu gave a detailed interpretation at AI DAY. In addition, AI DAY also showcased the upgrade of MANA, Haomo's autonomous driving data system, mainly its progress in visual perception.

01.

What is DriveGPT? What can it do?

Gu first explained the principle of GPT. The essence of a generative pre-trained transformer is to solve for the probability of the next word. Each call samples from that probability distribution and generates one word; in this way, a sequence of characters can be generated for various downstream tasks.

Taking Chinese natural language as an example, a single character or word is a token, and there are about 50,000 Chinese tokens. When tokens are fed into the model, the output is the probability of the next word. This probability distribution reflects the knowledge and logic embedded in language, so when the large model outputs the next word, it is the result of linguistic knowledge and logical reasoning, much like deducing the murderer from the tangled clues of a detective novel.
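To make the sampling step concrete, here is a minimal, generic sketch of how a GPT-style model turns that probability distribution into the next token. This is illustrative only and not Haomo's code; the vocabulary size is the article's 50,000 figure, and the random logits stand in for a real model's output.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=np.random.default_rng()):
    """Turn the model's raw scores (logits) over the vocabulary into a
    probability distribution and sample one token id from it."""
    scaled = logits / temperature
    scaled -= scaled.max()                            # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

# Toy example: a 50,000-token vocabulary with random logits in place of a real model.
vocab_size = 50_000
logits = np.random.default_rng(0).normal(size=vocab_size)
print(sample_next_token(logits))
```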

As a large model built for autonomous driving training, DriveGPT Xuehu Hairuo has three capabilities:

1. It can generate many scene sequences by probability. Each scene is a global scene, and each scene sequence is a possible future situation.

2. As the scene sequences are generated, the behavior trajectory of the ego vehicle, the part we care about most, can be quantified; that is, when a scene is generated, the vehicle's future trajectory information is generated along with it.

3. With this trajectory, DriveGPT Xuehu Hairuo can output the whole decision logic chain while generating the scene sequences and trajectories.

In other words, with DriveGPT Xuehu Hairuo, planning, decision-making and reasoning can all be completed under a unified generative framework.

Specifically, the design of DriveGPT Xuehu Hairuo tokenizes driving scenes, which Haomo calls Drive Language.

Drive Language discretizes the driving space, with each token representing a small part of the scene. At present, Haomo's token vocabulary space holds about 500,000 tokens. Given a sequence of scene tokens that happened in the past, the model can generate all possible future scenarios based on that history.

In other words, Hairuo works like a reasoning machine: tell it what happened in the past, and it can infer many possible futures according to their probabilities.

Strung together, a series of tokens forms a complete time series of a driving scene, including the state of the whole traffic environment and the state of the ego vehicle at a given future moment.
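Haomo has not published the actual Drive Language codebook, but the idea of discretizing the driving space can be sketched as follows. The grid resolution, the simplified state (just position and heading) and the function names are assumptions for illustration; a token sequence produced this way is the kind of input a generative model would consume.

```python
import numpy as np

# Illustrative grid resolution; the real Drive Language vocabulary
# (about 500,000 tokens, per the article) is built differently.
X_BINS, Y_BINS, HEADING_BINS = 100, 100, 36

def scene_to_token(x_m, y_m, heading_deg, x_range=(-50, 50), y_range=(-50, 50)):
    """Quantize one simplified scene state into a single token id."""
    xi = int(np.clip((x_m - x_range[0]) / (x_range[1] - x_range[0]) * X_BINS, 0, X_BINS - 1))
    yi = int(np.clip((y_m - y_range[0]) / (y_range[1] - y_range[0]) * Y_BINS, 0, Y_BINS - 1))
    hi = int(heading_deg % 360) // (360 // HEADING_BINS)
    return (xi * Y_BINS + yi) * HEADING_BINS + hi

# A short driving clip becomes a token sequence.
trajectory = [(0.0, 0.0, 0), (1.5, 0.1, 2), (3.1, 0.3, 5)]
tokens = [scene_to_token(*state) for state in trajectory]
print(tokens)  # three token ids, one per timestep
```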

With Drive Language, you can train DriveGPT.

The training process of DriveGPT begins with large-scale pre-training based on driving data and the Drive Language defined above.

Then, using scenes where the driver did or did not take over during real use, the pre-training outputs are graded and ranked, and a feedback model is trained. In other words, the incorrect automated driving behavior is replaced with the correct human driving behavior.

The follow-up step is to continuously optimize and iterate the model with the idea of reinforcement learning.

The pre-training model adopts a decoder-style GPT structure, and each token describes the scene state at a certain moment, including obstacle states, the ego vehicle's state, lane lines and so on.

At present, Haomo's pre-training model has 120 billion parameters and uses 40 million kilometers of production-vehicle driving data, allowing it to generate a wide variety of scenes.

These generated results are then optimized according to human preferences, weighing safety, efficiency and comfort. At the same time, Haomo trains the feedback model with screened human driving data, about 50,000 clips, and uses it to continuously optimize the pre-trained model.
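The article does not give the feedback model's loss. A common formulation for this kind of preference model in RLHF-style pipelines is a pairwise ranking loss over preferred and rejected segments; the sketch below uses that standard approach with toy feature inputs, so the class name and dimensions are illustrative rather than Haomo's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryRewardModel(nn.Module):
    """Toy feedback model: scores an encoded driving segment with a scalar
    reflecting preferences such as safety, efficiency and comfort."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, features):          # features: (batch, feat_dim)
        return self.score(features).squeeze(-1)

def preference_loss(reward_model, preferred, rejected):
    """Pairwise (Bradley-Terry) ranking loss: the human-preferred segment,
    e.g. correct human driving rather than a taken-over automated segment,
    should receive the higher score."""
    return -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()

# Toy usage with random features standing in for encoded driving clips.
model = TrajectoryRewardModel()
loss = preference_loss(model, torch.randn(8, 128), torch.randn(8, 128))
loss.backward()
```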

When outputting the decision logic chain, DriveGPT Xuehu Hairuo uses prompting. The input side gives the model a hint, telling it "where to go, whether to slow down or speed up, and to reason step by step." Given such a prompt, the model produces results in the expected direction, and each result carries a decision logic chain as well as a probability of occurring in the future. We can therefore choose the driving strategy with the most probable and most logical chain.

A vivid example illustrates Hairuo's reasoning ability. Suppose the model is prompted to "reach a certain target point." DriveGPT Xuehu Hairuo will generate many possible driving styles: some are aggressive, constantly changing lanes and overtaking to reach the target quickly, while others are steady and simply follow the traffic to the finish. If the prompt contains no other instructions, DriveGPT Xuehu Hairuo optimizes the outcome according to the feedback training and finally gives a result that better matches most people's driving preferences.
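A minimal sketch of that selection step, assuming a hypothetical `generate_rollouts` interface and a stand-in preference scorer: sample several candidate futures for the prompted goal, then keep the candidate preferred by the feedback model and, among equally preferred ones, the most probable.

```python
import math
import random

def generate_rollouts(prompt, n=8):
    """Hypothetical stand-in for DriveGPT-style sampling: returns n candidate
    futures, each with a log-probability and a decision chain description."""
    styles = ["keep lane and follow", "change lane and overtake", "slow down and yield"]
    return [{"plan": random.choice(styles),
             "log_prob": -random.uniform(1.0, 5.0),
             "decision_chain": ["perceive scene", "weigh options", "act"]}
            for _ in range(n)]

def preference_score(rollout):
    """Stand-in for the feedback model: favors steadier plans, since the article
    says the final choice should match most drivers' preferences."""
    return 1.0 if "follow" in rollout["plan"] else 0.3

def choose_plan(prompt="reach the target point"):
    candidates = generate_rollouts(prompt)
    # Rank first by human-preference score, then by likelihood under the model.
    return max(candidates, key=lambda r: (preference_score(r), math.exp(r["log_prob"])))

print(choose_plan())
```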

02.

How was DriveGPT realized?

First of all, the training and deployment of DriveGPT Xuehu Hairuo cannot be separated from computing power.

In January this year, Haomo and Volcano Engine jointly released MANA OASIS (Xuehu Oasis), Haomo's self-built intelligent computing center. OASIS provides 670 PFLOPS of computing power (6.7×10^17 floating-point operations per second), 2 TB/s of storage bandwidth and 800 GB/s of communication bandwidth.

Of course, computing power alone is not enough; a training and inference framework is also needed. So Haomo made the following three upgrades.

The first is to ensure and improve the stability of training.

Training a large model is a very difficult task. As data scale, cluster scale and training time grow, small system-stability problems are magnified enormously. If they are not handled, training tasks frequently fail, causing abnormal interruptions and wasting the resources already invested.

On top of the large-model training framework, Haomo and Volcano Engine jointly built a complete training support framework. Through it, Haomo achieved minute-level capture and recovery of abnormal tasks, which keeps thousand-GPU training jobs running continuously for months without abnormal interruption, effectively ensuring the stability of DriveGPT Xuehu Hairuo's large-scale training.
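The internals of that training support framework are not described. A toy illustration of the general recover-and-resume idea is periodic checkpointing with automatic reload after a failure, as sketched below; the file name, interval and training step are placeholders, not Haomo's implementation.

```python
import os, pickle, time

CKPT = "latest.ckpt"

def save_checkpoint(state):
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:           # write-then-rename so a crash mid-save
        pickle.dump(state, f)            # never corrupts the latest checkpoint
    os.replace(tmp, CKPT)

def load_checkpoint():
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "weights": 0.0}

def train(total_steps=1000, ckpt_every_s=60):
    state, last_save = load_checkpoint(), time.time()
    while state["step"] < total_steps:
        try:
            state["weights"] -= 0.01             # toy training step
            state["step"] += 1
            if time.time() - last_save > ckpt_every_s:
                save_checkpoint(state)
                last_save = time.time()
        except Exception:
            state = load_checkpoint()            # recover from the last checkpoint
    save_checkpoint(state)

train()
```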

The second is the upgrade of elastic resource scheduling.

Haomo has a huge amount of real-world data sent back from production vehicles and can use it to automatically learn about the real world. Because the volume of data returned varies greatly from day to day, the training platform needs elastic scheduling to adapt to the data volume.

Ultimately, Haomo extended incremental-learning techniques to large-model training, built a continuous learning system for large models, and developed a task-level elastic scheduler that allocates resources at minute granularity, pushing cluster compute utilization to 95%.

The third is the upgrade of throughput efficiency.

On training efficiency, in the transformer's large matrix computations, the data of the inner and outer loops is split so that it stays in SRAM as much as possible, which improves computational efficiency. Under the traditional training framework, the operator pipeline is very long; with the Lego operator library provided by Volcano Engine, end-to-end throughput increases by 84%.
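That description of splitting inner and outer loops so data stays in SRAM matches the general idea behind tiled attention kernels such as FlashAttention. The NumPy sketch below shows only the underlying math of processing K/V block by block with an online softmax; a real kernel does this on-chip, and nothing here is Haomo's or Volcano Engine's code.

```python
import numpy as np

def tiled_attention(Q, K, V, block=64):
    """Compute softmax(Q K^T / sqrt(d)) V block by block over K/V, keeping
    running max/sum statistics so only one block is needed at a time."""
    n, d = Q.shape
    out = np.zeros((n, V.shape[1]))
    row_max = np.full(n, -np.inf)
    row_sum = np.zeros(n)
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        scores = Q @ Kb.T / np.sqrt(d)                  # (n, block)
        new_max = np.maximum(row_max, scores.max(axis=1))
        scale = np.exp(row_max - new_max)               # rescale old accumulators
        out *= scale[:, None]
        row_sum *= scale
        p = np.exp(scores - new_max[:, None])
        out += p @ Vb
        row_sum += p.sum(axis=1)
        row_max = new_max
    return out / row_sum[:, None]

# Quick check against the straightforward dense computation.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(128, 32)), rng.normal(size=(256, 32)), rng.normal(size=(256, 32))
S = Q @ K.T / np.sqrt(32)
P = np.exp(S - S.max(axis=1, keepdims=True))
assert np.allclose(tiled_attention(Q, K, V), (P / P.sum(axis=1, keepdims=True)) @ V)
```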

With the computing power upgrade and these three improvements, DriveGPT Xuehu Hairuo can be trained and iterated more effectively.

03.

MANA upgrade: cameras replace ultrasonic radar

MANA, Haomo's autonomous driving data intelligence system, was released at the 4th HAOMO AI DAY in December 2021. After more than a year of application and iteration, MANA has now received a comprehensive upgrade.

According to Gu, the upgrade mainly includes:

1. Perception- and cognition-related large-model capabilities are integrated into DriveGPT.

2. The computing infrastructure services are optimized for large-model training in terms of parameter scale, stability and efficiency, and are integrated into OASIS.

3. A data synthesis service based on NeRF technology is added, which reduces the cost of collecting corner-case data.

4. To address rapid delivery across multiple chips and multiple vehicle models, the heterogeneous deployment tools and vehicle adaptation tools are optimized.

We have already covered DriveGPT in detail, so let's look at MANA's progress in visual perception.

Gu said that the core purpose of visual perception is to recover the dynamic and static information and texture distribution of the real world. To this end, Haomo upgraded the framework of its self-supervised visual model, integrating prediction of the environment's 3D structure, velocity field and texture distribution into a single training target so that the model can handle a variety of specific tasks. At present, the dataset for Haomo's self-supervised visual model exceeds 4 million clips, and perception performance has improved by 20%.
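Haomo has not disclosed the architecture. One hedged way to read "integrating 3D structure, velocity field and texture distribution into a single training target" is a shared encoder with three prediction heads whose losses are summed; the toy sketch below assumes that reading, with made-up layer sizes and random tensors standing in for the targets a self-supervised pipeline would derive from video itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfSupervisedPerception(nn.Module):
    """Toy multi-task model: one shared image encoder, three heads predicting
    3D structure (depth proxy), a velocity field and texture (reconstruction)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 32, 3, padding=1)
        self.depth_head = nn.Conv2d(32, 1, 1)       # 3D structure proxy
        self.flow_head = nn.Conv2d(32, 2, 1)        # velocity field (2D flow)
        self.texture_head = nn.Conv2d(32, 3, 1)     # texture / reconstruction

    def forward(self, images):
        feats = torch.relu(self.encoder(images))
        return self.depth_head(feats), self.flow_head(feats), self.texture_head(feats)

def joint_loss(model, images, depth_t, flow_t, tex_t, w=(1.0, 1.0, 1.0)):
    """Single training target = weighted sum of the three per-task losses."""
    depth, flow, tex = model(images)
    return (w[0] * F.l1_loss(depth, depth_t)
            + w[1] * F.l1_loss(flow, flow_t)
            + w[2] * F.l1_loss(tex, tex_t))

# Toy usage with random tensors standing in for self-supervised targets.
m = SelfSupervisedPerception()
x = torch.randn(2, 3, 64, 64)
loss = joint_loss(m, x, torch.randn(2, 1, 64, 64), torch.randn(2, 2, 64, 64), torch.randn(2, 3, 64, 64))
loss.backward()
```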

In parking scenarios, Haomo meets the parking requirements with pure-vision ranging from fisheye cameras: measurement accuracy reaches 30 cm within a range of 15 m, and better than 10 cm within 2 m. Using pure vision instead of ultrasonic radar further reduces the cost of the whole solution.
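The article does not explain the ranging method. A textbook way to estimate distance from a single calibrated camera is the flat-ground assumption: a ground point seen at a known angle below the horizon lies at distance camera_height / tan(angle). The sketch below uses that simplified pinhole model purely as illustration; a fisheye camera would additionally require undistortion, and all numbers are made up.

```python
import math

def ground_distance_m(camera_height_m, pixel_row, principal_row, focal_px, pitch_rad=0.0):
    """Distance to a ground point under a flat-ground, pinhole-camera assumption.
    A pixel row below the principal point corresponds to a downward viewing angle."""
    angle_below_horizon = math.atan((pixel_row - principal_row) / focal_px) + pitch_rad
    if angle_below_horizon <= 0:
        raise ValueError("point is at or above the horizon; no ground intersection")
    return camera_height_m / math.tan(angle_below_horizon)

# Example: a camera 0.8 m above the ground, 600 px focal length,
# ground point imaged 40 px below the principal point.
print(round(ground_distance_m(0.8, 520, 480, 600), 2))  # roughly 12 m
```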

In addition, for pure-vision 3D reconstruction, the self-supervised visual large-model technology can turn large volumes of video returned from production vehicles into 3D-labeled real data usable for BEV model training, without relying on lidar.

With the NeRF upgrade, the reconstruction error can be kept below 10 cm.

This article comes from the author at Zhijia.com; the copyright belongs to the author. Please contact the author before reproducing it in any form. The content represents only the author's views.