Yandex rolled out a neurobrowser, text help, video translation from Japanese and Korean, and QR code recognition | Library of Neural Networks and Promtas in Russian language

Yandex rolled out a neurobrowser, text assistance, video translation from Japanese and Korean, and QR code recognition

13.02.24

Yandex has released a major update for its browser, which implements many changes based on neural networks and other machine learning methods. The browser is now able to automatically correct errors in text, improve its quality, translate video from Japanese and Korean, recognize QR codes during broadcasts, offer quick navigation to links with a single click, and protect against phishing pages.

In this article, we describe how we trained a neural network using Rosenthal's textbook, how the model responsible for subtitles detects speaker changes, why not every QR code is easy to recognize, and how we managed to identify new phishing sites just a few minutes after they appear.

One of the main new neural network features in the Browser is "Help with text", based on YandexGPT. It checks spelling and punctuation, shortens a text that exceeds the character limit of a given format, and can make a text clearer and more structured.

Different models handle different actions. In editing mode, the model corrects errors; in the text reduction and enhancement modes, it reformulates the text. Let's take a closer look at each of them.

Error correction: When language models are used to edit text, a common problem arises: given a text with a dozen errors, the model returns an almost completely rewritten text. It replaces words instead of fixing the errors. We started by testing the hypothesis that a model can correct a text without rewriting it. We fed the model texts stripped of punctuation and capitalization as input, with the original texts as the targets. After training the model, we checked it on a validation dataset of 100 texts. The model restored the punctuation without changing the words (the Levenshtein distance was zero), which we counted as a success.
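The zero-distance check described above can be illustrated with a minimal sketch. It assumes that the comparison ignores punctuation and letter case, so that only changes to the words themselves count; the function names are hypothetical, and `string.punctuation` covers ASCII punctuation only, so a real check for Russian text would need a wider character set.

```python
import string


def strip_punct_and_case(text: str) -> str:
    """Lowercase the text and drop ASCII punctuation (assumption: this
    mimics the degraded input the model was given)."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.translate(table).lower().split())


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def words_unchanged(model_input: str, model_output: str) -> bool:
    """True if the output differs from the input only in punctuation and case."""
    return levenshtein(strip_punct_and_case(model_input),
                       strip_punct_and_case(model_output)) == 0
```

A distance of zero after normalization means the model only added punctuation and capitalization; any rewritten word produces a positive distance.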

To train the neural network, a dataset of 5,000 texts was collected from publicly available sources on the Internet. It was important that the texts contain various kinds of "noise", such as informal language, slang, profanity and special symbols, to make the task more difficult.

The examples were then processed according to Rosenthal's textbook, a key reference on the Russian language. Spelling and punctuation in the texts were corrected, but stylistic elements such as anglicisms and pleonasms were left in place so that the model would learn to distinguish spelling errors from colloquial language. Instructions were then drawn up for the editors and assessors, who corrected the texts accordingly.

Once the dataset was ready, the model was further trained on it to improve quality. Running the validation dataset through the model gave good results: the model corrects 97% of errors and misses only one error per 5,000 characters of text. The plan is to raise this figure to 99% by analyzing the errors that are hardest for the model and supplementing the dataset with relevant data.

Text reduction and enhancement in Yandex Browser

The text reduction and enhancement models shorten and improve text. In text reduction, the model must shorten the text without losing important facts, while preserving the author's style and tone and without adding anything unnecessary. The model was trained on a labeled dataset carefully prepared by editors.

Several criteria are used to evaluate the performance of the text reduction model, including the level of text compression, preservation of important information, author's style and tone. In most cases, the model manages to significantly reduce the amount of text while preserving key information.

The text enhancement model brings the text in line with the norms of the Russian language, structures it and makes it easier to understand. It is based on the large YandexGPT 2 model and a specially selected prompt trained on a variety of texts from the Internet. Its quality is evaluated by how well it preserves information and the author's style and tone, and by how much it improves the comprehensibility of the text.

Currently, the text enhancement models work in various text fields on websites, such as messengers, social networks, comments and mail. There are plans to extend the models to English and to add translation and the ability to change the style of a text. If you have ideas or wishes for additional functions, we will be glad to hear them.

Video: new languages, speakers in subtitles, QR code recognition

Japanese and Korean translation in Yandex Browser

The browser continues to improve its neural network tools for video, adding new functionality and language options. Recently, Japanese and Korean video translation capabilities were introduced, making the service even more convenient and useful for users.

Japanese and Korean were chosen because of their popularity and demand, as well as numerous requests from users. Adding these languages involved building a framework that allows the translation process to scale. Thanks to the earlier experience with Chinese, integrating Japanese and Korean was faster and more efficient.

Japanese and Korean translations are currently available on YouTube, but the development team is willing to consider expanding the functionality to other platforms. If users have requests to add translation on other platforms, they can leave them in the comments for the developers to consider.

Speakers in subtitles in Yandex Browser

In addition to expanding language capabilities, the Browser team introduced a handy feature to neural network subtitles that improves the readability and accessibility of video content. Now, when several speakers are speaking, the subtitle text will be divided into lines and labeled with dashes, allowing viewers to easily identify who is speaking a given phrase. This makes the content much easier to understand and is especially important for audiences with hearing impairments, providing them with a comfortable viewing experience.

The Browser already had models such as Multi-Voice, which gives different speakers different voices in a voiceover. Adding automatic detection and display of speaker changes to neural network subtitles simplified the process and improved the usability of the service. The pipeline and its integration with the Browser make it possible to efficiently process information about different speakers and visually set off their utterances in the text.
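As an illustration of the subtitle layout described above, here is a minimal sketch. It assumes that an upstream step has already assigned a speaker id to each recognized phrase; the `Segment` type and `format_subtitles` function are hypothetical names, not part of the Browser.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: int  # hypothetical speaker id from an upstream detection step
    text: str


def format_subtitles(segments: list[Segment]) -> list[str]:
    """Start a new line on every speaker change; mark lines with a dash
    only when more than one speaker is present."""
    multiple_speakers = len({s.speaker for s in segments}) > 1
    lines: list[str] = []
    previous = None
    for seg in segments:
        if seg.speaker != previous:
            lines.append(("- " if multiple_speakers else "") + seg.text)
        else:
            lines[-1] += " " + seg.text  # same speaker: continue the line
        previous = seg.speaker
    return lines
```

Consecutive phrases from the same speaker stay on one line, so a dash always signals a change of speaker.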

Summarization from eight foreign languages in Yandex Browser

This update also affected the video summarization function. The Browser can now summarize videos in English, German, French, Italian, Spanish, Chinese, Japanese and Korean, producing the summary in Russian. This lets users get a summary or overview of a video in Russian and take in the information more quickly.

In addition, the video summarization feature is now available on new platforms. The Browser can summarize videos not only from YouTube, but also from popular platforms such as VKontakte, Dzen and Rutube. This lets users consume video content more efficiently, for example by getting through several scientific conferences in different languages in one evening.

Video summarization in the Browser consists of several steps that ensure a high-quality and accurate retelling. For Russian-language videos, the audio is first converted to text using automatic speech recognition (ASR), then biometric analysis is applied to identify speakers and speaker changes, and the text is split into chunks and punctuated.

For foreign videos, the process starts with similar processing of the original audio track in the foreign language. After that, the result is passed to a special model that translates the content into Russian. Thus, the Browser team has taken into account the diversity of languages and provided users with a convenient way to get compressed information from video content regardless of its language. The details of how this process works will be presented in a separate article.

Recognizing QR codes in Yandex Browser

QR code recognition in the Browser simplifies watching videos. QR codes are actively used in videos, and about 20% of video platform users encounter them daily. Previously, desktop users had to turn to mobile devices or third-party services to decode them.

QR codes are recognized locally in the Browser. During video playback, a screenshot is taken every second and processed. For high-quality videos, screenshots are downscaled to Full HD; for lower-quality videos, the image is enhanced and made more detailed. The screenshots are not saved but are passed immediately to a built-in library, where QR codes are recognized using the open-source ZXing library with some tweaks. Once the text and coordinates of a QR code are recognized, the Browser displays a button and a frame around the code. The next action depends on the recognized text: if it is a link, the user can click it; if it is plain text, the user can copy it.
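Two of the steps above, downscaling a frame to Full HD and choosing an action for the decoded text, can be sketched as follows. This is an assumption-laden illustration rather than the Browser's code: the function names are hypothetical, "link" is taken to mean an `http` or `https` URL, and the enhancement of low-quality frames is not covered.

```python
from urllib.parse import urlparse

FULL_HD = (1920, 1080)


def target_size(width: int, height: int,
                limit: tuple[int, int] = FULL_HD) -> tuple[int, int]:
    """Size to downscale a frame to so it fits within `limit`, keeping the
    aspect ratio. Frames that already fit are left unchanged."""
    scale = min(limit[0] / width, limit[1] / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def action_for(payload: str) -> str:
    """Decide what the button next to the QR code should do (assumption:
    only http/https URLs count as links; everything else is copied)."""
    parsed = urlparse(payload.strip())
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "open_link"
    return "copy_text"
```

For example, a 4K frame of 3840×2160 maps to 1920×1080, while a 1280×720 frame keeps its size.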

Improving the quality of QR code recognition in the Browser was difficult but yielded significant results. Initially, the ZXing library successfully recognized only 70% of "correct" QR codes, that is, codes created according to the standard. The improvements described below raised recognition accuracy significantly and made it possible to handle even complex and non-standard QR codes.

The first improvement was the application of image upscaling. The first attempt to recognize a QR code used the original image, and if it failed, the image was upscaled for a second attempt at recognition. This approach increased the recognition accuracy from 70% to 76%.

The second improvement fixed an error in the library. Previously, the library kept trying to recognize a QR code with different parameters even after a successful recognition. After adding a condition that stops further attempts once recognition succeeds, we managed to increase the accuracy to 90%.
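The first two improvements can be sketched together: try the original image, fall back to an upscaled copy, and stop at the first successful decode. The sketch below is hedged: `decode` and `upscale` are hypothetical callables standing in for the tweaked ZXing decoder and the upscaler, and `param_sets` stands in for the different decoder parameters the text mentions.

```python
from typing import Callable, Optional, Sequence, TypeVar

Image = TypeVar("Image")
Params = TypeVar("Params")


def recognize_qr(
    image: Image,
    decode: Callable[[Image, Params], Optional[str]],
    upscale: Callable[[Image], Image],
    param_sets: Sequence[Params],
) -> Optional[str]:
    """Return the decoded text, or None if every attempt fails."""
    # Attempt 1 uses the original frame; attempt 2 upscales it. The
    # upscaler is only called when the first attempt has failed.
    for prepare in (lambda img: img, upscale):
        candidate = prepare(image)
        for params in param_sets:
            text = decode(candidate, params)
            if text is not None:
                return text  # stop immediately after the first success
    return None
```

Returning on the first success means a good result cannot be lost to later attempts with other parameters.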

The third improvement concerned the algorithm for recognizing the anchors (finder patterns) of QR codes. The rigid binding to square anchors was replaced by a search for similar shapes in the corners, which made it possible to recognize QR codes with non-standard shapes. This improvement brought the recognition accuracy close to 100%. Thanks to these improvements, users can work with QR codes directly in the Browser, even non-standard ones.

The QR code recognition algorithm was tested on two offline datasets: one with "correct" QR codes and one with "bad" QR codes. On the "bad" dataset, recognition accuracy after the improvements rose from 30% to 60%, which was notable progress.

The team also considered performance and the use of neural networks. A complex ML model capable of recognizing QR codes can require significant resources to run, which can slow down users' devices. The team therefore chose to improve an existing QR code recognition tool rather than resort to neural networks.

The solution also shows that extremely complex technology is not always necessary and that it is important to choose the right tool for the task. Applying methods used in machine learning without overloading the device's resources is a good example of this.

Finally, the algorithm works on all desktop platforms and can be turned on or off in the video tool settings.

Phishing protection in Yandex Browser

Previously, a robot on the server regularly checked pages for phishing using ML models, which took considerable time and resources. Meanwhile, phishing sites are on average active for only a few hours or a day before they reach the database.

Now an ML model in the browser checks pages for phishing on the client, and an additional check then runs on the server. This is faster and more efficient. The ML model on the server analyzes not only the content of the page but also additional factors, such as site traffic statistics and how long ago the domain was created, which allows more informed decisions about whether a site is potentially phishing.
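The two-stage check can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the thresholds, the field names and the simple additive rule on the server side are hypothetical stand-ins for the actual ML models, which the article does not describe in detail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ServerFactors:
    content_score: float  # hypothetical server model score for the page, 0..1
    domain_age_days: int  # how long ago the domain was created
    monthly_visits: int   # site traffic statistics

CLIENT_THRESHOLD = 0.5    # hypothetical: when the client escalates to the server
SERVER_THRESHOLD = 0.8    # hypothetical: when the user is warned


def server_verdict(factors: ServerFactors) -> bool:
    """Toy rule standing in for the server model: young, low-traffic
    domains make a suspicious page more suspicious."""
    score = factors.content_score
    if factors.domain_age_days < 30:
        score += 0.2
    if factors.monthly_visits < 1000:
        score += 0.1
    return score >= SERVER_THRESHOLD


def should_warn(client_score: float,
                fetch_factors: Callable[[], ServerFactors]) -> bool:
    """Stage 1 runs on the client; the server is consulted only for
    pages the client model finds suspicious."""
    if client_score < CLIENT_THRESHOLD:
        return False
    return server_verdict(fetch_factors())
```

Because the server is queried only for pages the client model flags, most page loads involve no network round trip for the check.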

The new solution also addresses phishing sites that evaded verification by serving normal content to the robot. Users can now be alerted to a potentially dangerous site even if it was created recently and has not yet been entered into the database.

Applying ML models on both the client and the server significantly improves the effectiveness of the fight against phishing sites and protects users from potential cyber threats.

Obtaining a dataset for training models on phishing and regular sites was a challenge. Phishing sites are short-lived and are often accessible only to a limited audience matching specific parameters, which makes data collection difficult and requires agility.

The team trained a BERT model on a large dataset and used it to obtain labeled data for training a DSSM model, a lighter and faster model that can predict phishing on the fly. This made the phishing check almost instantaneous, providing security for users without sacrificing browser performance.

By accurately identifying phishing sites, the model provides protection even for less experienced users who may be more vulnerable to fraud. This is important for a wide audience, including users who are not highly technical.

The results of this work have been implemented in the Browser: about 1.8 million users of the desktop version see phishing alerts every month. This demonstrates the importance of and demand for security tools.

The updated Yandex Browser has other new features as well. Cloud-based tab groups are now synchronized between devices, which simplifies content management in the browser. Users can select specific tabs and groups for synchronization, which makes the feature more flexible.

Alice in the Browser has also learned to generate images using the YandexART neural network, so users can quickly create images right in the browser.

The neural network features are accessed through compact menus that appear next to the content they apply to, which makes them easier to find and use.

User experience and feedback will help improve the neural network models and develop new features.
