The latest version of our powerful language model, YandexGPT 2, was presented at Practical ML Conf. We are delighted to announce that it is already live in Alice's "Let's make it up" skill. The new model helps users structure information, generate ideas, write texts, and much more.
It is important to note that this version of the model outperforms the previous one in 67% of cases and proves even more effective in a number of scenarios. All of this was made possible by improvements at every stage of model training, but the key factor is the new pretrain method. Let me briefly summarize what has changed in the training process, in which scenarios those changes have had the greatest effect, and what our future plans for the model are.
In what scenarios is the new model particularly useful? Let's start with how models are compared to each other. The same model may be strong in one scenario but weak in another.
So how can we determine whether a model is smarter overall? We decided to tackle this problem in the following way: we collected 500 examples of user tasks, as diverse as possible. We then gave these examples to both the old and the new model and counted how many times the new model's answer was better than the old model's. If the new model won on more tasks, we considered it smarter. YandexGPT 2 outperformed the previous version in 67% of cases.
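As a rough illustration (a minimal sketch, not Yandex's actual evaluation code; the judgment labels are made up), the win-rate computation boils down to counting side-by-side preferences:

```python
# Hypothetical side-by-side (SxS) judgments: for each task, an assessor
# marks whose answer was better ("new", "old", or "tie").
judgments = ["new", "old", "new", "tie", "new"]  # 500 labels in practice

def win_rate(labels, model="new"):
    """Fraction of all comparisons won by the given model."""
    return sum(1 for label in labels if label == model) / len(labels)

print(f"new model wins {win_rate(judgments):.0%} of cases")
```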
By this measure, it is safe to say that the new model is smarter and performs better overall.
The new model is also effective on specific slices of scenarios that are popular among users. To get a more accurate picture of its behavior, we divided the same 500 example tasks into groups corresponding to different scenarios and measured how the model's quality changed in each group. This lets us see where the new model performs best.
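Extending the sketch above (again with hypothetical data and scenario names), the per-scenario breakdown is the same win rate computed within each group:

```python
from collections import defaultdict

# Each judgment now carries the scenario its task belongs to;
# the scenario names and labels below are illustrative only.
judgments = [
    ("text generation", "new"),
    ("idea generation", "new"),
    ("information structuring", "old"),
    ("text generation", "tie"),
]

by_scenario = defaultdict(list)
for scenario, winner in judgments:
    by_scenario[scenario].append(winner)

for scenario, winners in by_scenario.items():
    rate = winners.count("new") / len(winners)
    print(f"{scenario}: new model wins {rate:.0%} of {len(winners)} tasks")
```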
The analysis showed that the new model delivers a significant quality improvement in many popular scenarios. For example, in text generation and idea generation it was markedly more accurate than the old model, and in information structuring it also improved substantially.
These results confirm that the new YandexGPT 2 model has the advantage in many popular usage scenarios: it handles user queries better, structures information more reliably, and generates higher-quality content. This makes the model even more useful in a variety of applications, from answering user questions to creating texts and ideas. We continue to improve the model and to explore its applicability in different scenarios, so as to provide users with an even higher level of quality and meet their needs for information and creativity.
[The original post illustrates these usage scenarios with example screenshots.]
Changes in the training process of the new model can be divided into two main stages: pretrain and finetune. In the first stage, the neural network broadens its erudition and improves its general knowledge of the world, language, and various tasks; in the second, it learns to fulfill specific queries and to follow the format and style of responses. These stages were already mentioned in a previous article about the launch of YaGPT in Alice. The main thing to remember is that improving one stage will not solve the problems associated with the other.
In the previous post about the launch of the first model, we focused on the data collection process for finetune. Now I will talk in more detail about the pretrain phase.
The challenge of pretrain is to incorporate all the useful knowledge available on the Internet into the model. The most difficult part of this process is selecting the most useful training data from the endless stream of information. How can we tell whether the dataset improves after each newly added chunk of data? Completely re-training a large model for every change in the dataset and measuring its quality is extremely costly and time-consuming; it would slow our progress to a snail's pace. We therefore chose a more realistic approach: we accumulate changes in the dataset and only then retrain the model. The risk is that the chosen direction of changes may turn out to be wrong and degrade model quality. Previously, we tracked changes manually and even built tools for manually searching through the pretrain data. Collecting the dataset was something of an art form, and as the dataset grew, finding problems by hand became increasingly difficult. So we took a different approach.
We vetted a lot of ideas and chose the ones that really benefit the development of our model.
1. We trained a classifier to identify low-quality text. Our model can now recognize texts with encoding errors, broken HTML markup, repeated sentences, and other such problems.
2. We also trained a classifier to identify useful text. Our model can now flag texts that may look good but carry no real usefulness for users. We judge a text's usefulness by whether it contains answers to real queries from Yandex Search users.
3. We have been actively working on increasing the share of highly cited texts. This helps us improve the quality of the information we provide to users.
4. We significantly improved the deduplication algorithm, reducing the share of repeated texts to less than 0.5% (a minimal sketch of this kind of near-duplicate detection follows this list). This improves the quality and variety of the content provided to users.
5. We developed a separate tool to evaluate "factual completeness". We took real factual queries from Yandex Search and measured the fraction of them that the model can answer after pretrain alone. We raised this share from 70% to 80%, which is a significant improvement.
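To illustrate point 4: one standard way to find near-duplicate texts at scale is MinHash over text shingles. The sketch below is a minimal, self-contained illustration of that technique, not the actual deduplication pipeline, and all parameters are arbitrary:

```python
import hashlib

def shingles(text, n=5):
    """Character n-gram shingles of a whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}

def minhash_signature(text, num_hashes=64):
    """Toy MinHash: for each salt, keep the minimum salted hash of any shingle."""
    grams = shingles(text)
    return [
        min(int(hashlib.md5(f"{seed}:{g}".encode()).hexdigest(), 16) for g in grams)
        for seed in range(num_hashes)
    ]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions approximates Jaccard similarity of shingle sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = minhash_signature("The quick brown fox jumps over the lazy dog.")
b = minhash_signature("The quick brown fox jumped over the lazy dog!")
print(estimated_jaccard(a, b))  # near-duplicates score close to 1.0
```

At web scale one would bucket signatures with locality-sensitive hashing instead of comparing all pairs, but the similarity estimate works the same way.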
All these changes contribute to improving the quality of the model and enriching the information provided to users. We continue to research and work on various areas to further improve the usefulness and accuracy of our model.
The new model is already available in the "Let's make it up" skill in the Alice voice assistant. You can use it on Yandex Station devices, on Alice-enabled TVs, in the Yandex app, in Yandex Browser, on the search results page, and on ya.ru. In addition, in Search you can now expand the chat window with the neural network to full screen for more convenient work.
As for future plans, we will continue to improve the quality of the datasets for pretrain and finetune, since we keep seeing the positive effect of high-quality training examples. We are also working on implementing RLHF (Reinforcement Learning from Human Feedback), but this phase is still ahead. Of course, we will continue to integrate the YaGPT model into various Yandex services, but only where it will be genuinely useful and beneficial for users. We continue to develop and strive to make our model as useful and convenient as possible for all users.