The Principles Behind ChatGPT

The full name of ChatGPT

ChatGPT stands for Chat Generative Pre-trained Transformer.

1. Introduction: the ChatGPT chatbot

ChatGPT is an artificial-intelligence chatbot developed by OpenAI and launched in November 2022. The program is built on a large language model from the GPT-3.5 family and is fine-tuned with reinforcement learning.

At present ChatGPT interacts through text only, but beyond natural human-like dialogue it can handle relatively complex language tasks, including automatic text generation, automatic question answering, and automatic summarization.

For example, in automatic text generation ChatGPT can produce new text in the style of its input (scripts, songs, plans, and so on), while in automatic question answering it generates answers to the questions it is given. It can also write and debug computer programs.
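To make the question-answering use concrete, here is a minimal sketch that calls a GPT-3.5-family model through the OpenAI Python client (v1 interface); the model name and prompt are illustrative assumptions, since the article describes the chat interface rather than any particular API.

```python
# Minimal sketch, assuming the `openai` Python package (v1 client) is
# installed and OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

# Hypothetical example prompt; any question works the same way.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # a GPT-3.5-family chat model
    messages=[{"role": "user", "content": "What is a pre-trained transformer?"}],
)
print(response.choices[0].message.content)
```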

During the free preview period, anyone can register at no cost and, after logging in, chat with the AI for free.

ChatGPT can write articles approaching a real person's level, and it gained attention quickly because it gives detailed, articulate answers across many fields of knowledge. This demonstrated that it can handle knowledge work previously thought safe from replacement by AI, with considerable implications for finance and the white-collar labor market; however, its uneven factual accuracy is regarded as a major defect.

Outputs that reflect biases acquired during training are considered to need careful correction. After ChatGPT's release in November 2022, OpenAI's valuation rose to $29 billion [7]. Two months after launch, the number of users reached 100 million.

2. How is ChatGPT trained?

ChatGPT was fine-tuned from GPT-3.5 using supervised learning together with reinforcement learning from human feedback. Both methods rely on human trainers to improve the model's performance, strengthening the machine-learning process through human intervention so as to obtain more realistic results.

In the supervised-learning stage, the model is given example dialogues in which human trainers play both roles, the user and the AI assistant. In the reinforcement stage, human trainers first rank the responses the model produced in earlier dialogues.

These rankings are used to build a "reward model", against which the policy is further fine-tuned with proximal policy optimization (PPO).
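A common way to build such a reward model is to train a scalar scorer on trainer preferences with a pairwise ranking loss. The sketch below shows the idea in PyTorch on random embeddings; the architecture, dimensions, and data are assumptions for illustration, not details from this article.

```python
# Toy reward-model sketch (assumed setup): a scalar scorer trained so that
# the trainer-preferred response scores higher than the rejected one.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # stand-in for a full language-model head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)  # one scalar reward per response

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random vectors standing in for embeddings of (preferred, rejected) responses.
preferred, rejected = torch.randn(8, 16), torch.randn(8, 16)

# Pairwise ranking loss: push the preferred response's score above the rejected one's.
loss = -F.logsigmoid(model(preferred) - model(rejected)).mean()

opt.zero_grad()
loss.backward()
opt.step()
```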

This policy-optimization algorithm is more efficient than the trust region policy optimization (TRPO) algorithm. The models were trained, in cooperation with Microsoft, on Microsoft Azure supercomputing infrastructure.
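For reference, the core of PPO is a clipped surrogate objective that keeps the updated policy close to the old one without TRPO's second-order trust-region machinery. Below is a one-function sketch in PyTorch; the tensor shapes and the epsilon value are conventional assumptions rather than anything specified in this article.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps: float = 0.2):
    """Clipped surrogate objective from PPO; negated so optimizers can minimize it."""
    ratio = torch.exp(logp_new - logp_old)                    # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Dummy per-token log-probabilities and reward-model advantages.
loss = ppo_clip_loss(torch.randn(4), torch.randn(4), torch.randn(4))
```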

In addition, OpenAI continues to collect data from ChatGPT users that can be used to further train and fine-tune the model. Users can vote a reply from ChatGPT up or down; when voting, they can also fill in a text field with additional feedback.
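A feedback record of this kind could be modeled as a small data structure like the following; the field names are hypothetical, chosen only to mirror the vote-plus-optional-comment flow described above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FeedbackRecord:
    """Hypothetical schema for one user rating of a ChatGPT reply."""
    conversation_id: str
    response_text: str
    vote: int                      # +1 for thumbs-up, -1 for thumbs-down
    comment: Optional[str] = None  # optional free-text feedback

record = FeedbackRecord("conv-42", "Paris is the capital of France.", vote=1)
```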

ChatGPT's training data includes a wide range of documents as well as knowledge about the Internet and programming languages, such as bulletin board systems (BBS) and the Python programming language.

As for ChatGPT's ability to write and debug computer programs: like all other deep-learning-based language models, it only captures statistical correlations between code fragments.
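To illustrate what "statistical correlation" means at its simplest, the toy model below counts bigrams over whitespace-separated code tokens and predicts the most likely next token. Real language models use far richer contexts, but the underlying idea of learned co-occurrence statistics is the same; the two-line corpus here is made up for the example.

```python
from collections import Counter, defaultdict

# Made-up two-line "code corpus" of whitespace-separated tokens.
corpus = [
    "def add ( a , b ) : return a + b",
    "def sub ( a , b ) : return a - b",
]

# Count how often each token follows each other token (bigram statistics).
bigrams = defaultdict(Counter)
for line in corpus:
    tokens = line.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1

# Most likely token after "return" under this toy model.
print(bigrams["return"].most_common(1))  # -> [('a', 2)]
```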