During the AI Journey conference, Andrey Belevtsev, Senior Vice President, CTO and Head of Technology at Sberbank, announced that Sber's developers are working on a new version of the GigaChat service. This version will be based on one of the most advanced models for the Russian language, containing 29 billion parameters.
According to him, thanks to the new LLM underlying the next version of the GigaChat artificial intelligence system, the service will offer the same capabilities as popular foreign solutions.
"Training the models on which GigaChat is based is a large-scale and complex computational project that we have not encountered before. The total number of computational operations is almost 6 times the number of operations performed in training the ruGPT-3 model with 13 billion parameters in 2021," Belevtsev said.
He pointed out that a unique, continually evolving dataset has been created for GigaChat, with a large number of Sber employees working on it to improve the quality of responses across different domains.
"Thanks to the efforts of these experts, with each new release of GigaChat, users are getting the most out of the service for their tasks," said a senior executive of the company.
Sber emphasized that thanks to the new LLM, GigaChat follows instructions better and is able to perform more complex tasks. The quality of summarizing, rewriting and editing texts, as well as answering a wide range of questions, has improved significantly. The team compared the responses of the new and previous models and recorded an overall quality gain of 23%; notably, the new model handles factual writing 25% better than the previous version.
To achieve these results, many experiments were conducted to improve the model and increase its training efficiency. In particular, a framework was used that trains large language models by distributing the neural network's weights across GPUs, which reduces memory usage per device.
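The article does not name the framework, but the general technique it describes, sharding model weights across GPUs so that no single device holds the full parameter set, can be illustrated with a minimal sketch using PyTorch's FullyShardedDataParallel. The toy model, launch command and hyperparameters below are placeholders for illustration only, not details of Sber's setup.

```python
# Minimal sketch of sharding model weights across GPUs with PyTorch FSDP.
# Launch with, e.g.:  torchrun --nproc_per_node=8 fsdp_sketch.py
import torch
import torch.distributed as dist
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    # torchrun sets the rank/world-size environment variables FSDP relies on.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # A stand-in MLP; a real LLM would go here.
    model = nn.Sequential(
        nn.Linear(4096, 16384),
        nn.GELU(),
        nn.Linear(16384, 4096),
    ).cuda()

    # Wrapping in FSDP shards parameters, gradients and optimizer state
    # across all participating GPUs, reducing per-device memory usage.
    model = FSDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # One illustrative training step on random data.
    x = torch.randn(8, 4096, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```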
According to an internal evaluation on the MMLU (Massive Multitask Language Understanding) benchmark, the new 29-billion-parameter GigaChat outperforms its popular open-source counterpart, LLaMA 2 34B.
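For context, MMLU is a multiple-choice benchmark scored as plain accuracy over questions with four options. The sketch below shows how such an evaluation is typically computed; the ask_model callable and the question format are hypothetical placeholders, not part of Sber's internal evaluation.

```python
# Generic MMLU-style scoring: accuracy over four-option multiple-choice items.
from typing import Callable

def mmlu_accuracy(questions: list[dict], ask_model: Callable[[str], str]) -> float:
    correct = 0
    for q in questions:
        # Each item has a question, four options, and a gold answer letter A-D.
        prompt = q["question"] + "\n" + "\n".join(
            f"{letter}. {option}" for letter, option in zip("ABCD", q["options"])
        ) + "\nAnswer with a single letter:"
        if ask_model(prompt).strip().upper().startswith(q["answer"]):
            correct += 1
    return correct / len(questions)

# Hypothetical usage with a stub model that always answers "A".
if __name__ == "__main__":
    sample = [{"question": "2 + 2 = ?", "options": ["4", "3", "5", "22"], "answer": "A"}]
    print(mmlu_accuracy(sample, lambda prompt: "A"))  # 1.0
```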
Sber's business clients will soon have access to the API of the new model to implement their own solutions, and members of the academic community will be able to use it to conduct research.
Sber's eighth international conference, AI Journey, began on November 22 and will continue until November 24.