It analyzes not only the page title but also the entire content of the page before showing the user the results of the query.

Only a week has passed and it is too early to draw conclusions, but we nevertheless asked representatives of the SEO community about their expectations for the new algorithm and the changes it will bring to the work of SEO specialists.

Kirill Nikolaev, technical director of the WEBLAB studio:

Immediately after Palekh’s release, the direction in which the new algorithm would develop was clear. Just before the announcement there were heated debates about what would happen (the most popular guess: the first page would be all Yandex.Direct ads), but deep down we all knew what to expect. Yandex has increased the numbers, and this is good news. If earlier about 150 documents per popular query were held in RAM, their number now exceeds 200,000, assembled with the diligent help of assessors from Toloka. To get into the top of these 200,000 you need good behavioral factors (which is logical) and similar semantics, which suggests that the days of content theft are returning with renewed vigor. And the times of long text sheets in online store catalogs are also greeting us again.

However, as far as I know, the matrix for popular queries is stored in the database until the next update, while the matrix for low-frequency and micro-frequency queries is generated on the fly.

You shouldn’t expect changes in the search results as colossal as after the launch of Snezhinsk in 2009, so we’ll leave high- and mid-frequency queries alone and talk about more mundane things.

Personally, what interested me most was this statement:

"...the new algorithm not only compares the text of a web page with the search query, but also pays attention to other queries that bring people to that page." For the industry this could mean two things:

1. Good: an even more careful selection of semantics, even more diligent clustering will bear fruit. Text factors become leaders in importance and relevance; work becomes more difficult, but the result gets better.

2. Bad: the maximum amount of text indexed per document is 32 thousand characters. So I expect that now, under the catalog description in some store, you will be able to read short stories about water delivery, complete with an exposition, development, climax, denouement and epilogue, because that is the simplest way to extend the semantics. Of course, I’m exaggerating, since it’s clear that the TOP is formed somewhat differently, but I strongly suspect that our “content kings” will perceive it that way.

Well, in addition, I’ll just throw out a thought: what if you don’t rewrite texts and don’t waste time on complex things, but instead try to competently generate traffic from low-frequency and micro-frequency queries? An interesting field for experimentation.

Yandex is developing and we are growing with it.

This is cool. I would like to imagine what it will be like in 5-10-15 years.

If you want to be a good SEO, learn hardware.

This is wonderful. I look forward to new courses from BM on the topic “Semantic vectors for business.” But seriously, the profession is becoming more difficult, which is good news. I hope that very soon the individuals who do link runs on databases compiled according to “the latest research by Dmitry Shakhov (a well-known SEO practitioner)” will disappear.

Even deeper dive into text factors

More than the courses from BM, I am only waiting for the launch of neural network selection of semantics and clustering from Chekushin. And a course from Devaki, of course.

Alexander Alaev, director of the web studio "Alaich and Co":

Yandex rolled out a new algorithm with great noise and excitement. I thought that my life as an optimizer would change irrevocably, but... I hasten to reassure everyone - nothing has changed!

The new Yandex algorithm is aimed at improving results for “long” informational queries (queries typical of voice search). Yandex, with its neural network, began to understand and search by meaning, that is, not only by keywords but also by what they mean. A continuation of “Palekh,” which searched for meaning only in document titles, “Korolyov” searches for meaning throughout the entire document. But you already know this if you watched the presentation or read publications based on it.

Let's talk about how this will affect the lives of webmasters and SEOs. I repeat: no way. The new algorithm will not affect commercial queries at all. If a person wants to buy or order something, he certainly knows what it is. And even if he doesn’t, then, meaning a laptop, he still won’t ask for “a compact desktop computer consisting of two halves”; he will first find out what the thing is called.

Korolev should respond positively to quality informational pages, while generated, synonymized and similar texts should lose traffic, if they had any at all. I think that rewrites and copywriting produced without immersion in the topic may also suffer, giving way to higher-quality texts, even ones that lack the necessary keywords in the required quantities.

As you can see, I didn’t say anything new, but sites, as before, need to be made of high quality for people!

Alexander Ozhigbesov, project manager, ozhgibesov.net:

By introducing new algorithms into search, Yandex is taking small but confident steps towards understanding the meaning of a query and finding an equally meaningful answer; this is how the company presents Palekh and Korolev to us. In fact, this is a copy of Google’s Hummingbird algorithm, launched in 2013; still, it is worth realistically assessing the company’s available capacity. Yandex cannot rebuild its search results and provide answers to every unique query by tomorrow. The company’s mistake is presenting the algorithm as something new when Google did it earlier and without such “Russian” pathos. It is nevertheless a strong achievement, and I am sure that in the future neural networks will be able to show us an ideal search, provided that by then the domestic search engine has not made the entire first page of results paid. But even that is not particularly scary: we will redistribute priorities and scale the semantics for contextual advertising.

What changes await optimizers and will there be any at all?

There are currently no special changes in popular e-commerce niches after Palekh and Korolev. While Yandex is testing its algorithms on micro-low-frequency queries, one cannot expect drastic changes in companies’ processes and methods. SEOs don’t have much to worry about here; long, unique queries occur mainly in informational topics and complex commercial services. The task of Palekh and Korolev is not to replace the current ranking parameters; they try to give meaningful answers to complex queries, so optimizing for queries like “red dress with panties visible” is not at all necessary. My personal position is to keep striving for high-quality writing and structuring of content, followed by analytics and additional optimization, so the algorithm will not damage serious commercial projects, as happened, for example, with Minusinsk.

Hello, dear readers of the blog. I apologize that posts have been appearing at long intervals, but I have launched several more projects that unexpectedly rose to the TOP in 1.5 months thanks to my blogging experience (if anyone needs advice, write me a personal message). I have to split my time between those projects and building a house for my family.

Today we will touch on the new Korolev algorithm from Yandex and try to compare it with its predecessors. Personally, it didn’t have much impact on my blog, except that useful and voluminous articles became even higher in the TOP. Well, let's take a closer look at everything in the article and draw the necessary conclusions after observing this algorithm.

Korolev Yandex algorithm - what it is and how it works

At the end of August 2017, a new Yandex algorithm, Korolev, was released. News about the update immediately attracted the interest of SEO specialists and the media.

The main features of Korolev are increased information-processing speed and improved semantic analysis of text.

The speed of data processing has increased several thousand times. Palekh used 150 documents to form the TOP; now more than 200,000 documents are compared with each other. This result was achieved by optimizing the ranking pipeline.

To understand the new algorithm, we need to step back to Palekh, which was presented on November 2, 2016. Statistics showed that the largest share of search phrases were low-frequency ones, each aimed at a single correct answer. This share forms the “long tail” of the query distribution.

To give the desired answer, the system must have associative thinking and the ability to self-learn, like a person. Neural networks are best suited for such tasks, which is why they became the basis of the new algorithm.

The main goal of "Korolev"

When a person wants to find a specific object, he begins to describe its properties; this is a feature of associative thinking. If we have forgotten the name of a film, we start describing what was in it: “a film about girls during the war” or “a film about a creature with a tail and wings.” In the first case Yandex returns “The Dawns Here Are Quiet”; in the second we get “chimera.”

Yandex improves the quality of comparison of multi-word phrases. The program analyzes the connection between each word in a sentence and builds a unique association with multiple answer options. Just like the human brain does.

What's new?

Innovations:

  • semantic vector for all content, not just the title;
  • comparison of more than 200,000 articles when forming search results;
  • user behavior on the page is taken into account;
  • people help train the system.

Korolev analyzes not only the title, but the entire content (including photos, videos, tables, etc.) and composes a semantic vector based on it.

The main innovation was a manifold acceleration of the search. Previously, the semantic vector was built at the moment the phrase was entered into the search bar. This method heavily loaded the servers and slowed response times.

When you send a search phrase, its semantic vector is compared with the array already recorded in the database. Palekh compared about 150 options, but the new version analyzes more than 200,000 articles at a time. This increases the chance of finding the desired answer.
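To make this idea concrete, here is a minimal sketch in Python of how precomputed document vectors could be ranked against a query vector at search time. The 3-dimensional vectors, document names and top-k cutoff are all invented for the example (real vectors are 300-dimensional, and the real pipeline is far more elaborate):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical document vectors, precomputed offline at indexing time.
index = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.2, 0.8, 0.1],
    "doc_c": [0.1, 0.2, 0.9],
}

def top_k(query_vec, index, k=2):
    """Rank precomputed document vectors against the query vector."""
    scored = [(cosine(query_vec, v), doc) for doc, v in index.items()]
    return [doc for _, doc in sorted(scored, reverse=True)[:k]]

print(top_k([1.0, 0.1, 0.0], index))  # doc_a points in almost the same direction
```

The point of the sketch is the division of labor: the expensive vector construction happens once per document, offline, and query time is reduced to cheap comparisons against the stored array.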

Yandex neural network: operating principle of the Korolev neural network + examples

The main feature of a neural network is the ability to self-learn. Work is carried out not only according to deliberate formulas, but also on the basis of previous experience and mistakes.

The human brain is a huge neural network with associative thinking, and computers try to emulate human behavior by recreating the architecture of neural networks.

Features of the neural network structure

A neural network is a set of single neurons, each of which stores or processes information. Each of the neurons is capable of receiving, processing and transmitting signals. The input data stream is gradually processed from one neuron to another and in the end the desired result is obtained.

Artificial neural networks pass conditional weights, numbers from 0 to 1, between neurons to determine how well a particular version of the incoming information corresponds to the required information. After the analysis is completed, the option with the highest weight is considered the most suitable answer to the question.

The diagram depicts a neural network. The first two layers do the processing. Each of the neurons contains a specific function that receives input data and, after processing, produces the necessary response. This is how semantic vectors are compared.
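As a rough illustration of what one such neuron does, here is a toy sketch: a weighted sum of inputs passed through a sigmoid, producing a value between 0 and 1. The input values and weights here are invented; in a real network the weights are learned from training examples:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs passed through
    a sigmoid activation, yielding a value between 0 and 1."""
    s = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical inputs and weights, purely for illustration.
out = neuron([0.5, 0.3], [0.8, -0.4], bias=0.1)
print(round(out, 3))
```

Layer upon layer of such units, each feeding its output to the next, is what lets the network transform raw text into the semantic comparison described above.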

Semantic vectors

Computers cannot operate with words or pictures, so they use arrays of numbers to compare information. Search engines must independently determine the main topic and idea of the text in order to give the user what he needs.

The more similar the vectors of the query and the text, the higher the article ranks in the search results. Korolev analyzes all of the content:

  • tables;
  • text;
  • photo;
  • video;
  • headers;
  • quotes;
  • lists;
  • emphasis (italics, bold, etc.).

The quality of vector construction increases severalfold because more information is processed.

To create vectors, a neural network is used: the text is passed through a sequence of neurons, and the output is a 300-dimensional array of numbers. This array is then entered into a single database and used for comparison.

Education

The main feature of neural networks is their ability to learn. Unlike standard algorithms, they can remember previous experience and learn from it. The computer gets better and better at distinguishing information each time.

In the past, training was carried out by company employees, whose task was to work through millions of queries and adjust ranking priorities at their discretion. Then the developers created the Yandex.Toloka service, a list of simple tasks: you go through queries and evaluate the quality of the search results. Each task pays about $0.1–1.

What content does the new search algorithm think is good?

The most suitable article for the TOP search results will be the one containing the maximum useful information for the user and corresponding to the request. Therefore, it should cover all sorts of client questions section by section.

In Korolev, user behavior on the page is taken into account as a priority. Therefore, the task of administrators is to try to retain the user and interest him. To do this, use structured headings, tables, lists, highlights, photos and videos.

New search priorities

SEO specialists, after the release, conducted a study to evaluate changes in ranking priorities. No significant changes were observed; priorities remain:

  • text structure;
  • completeness of the topic;
  • readability of the content;
  • correspondence of headings to the semantic content of the text;
  • correct formation of the semantic core.

The main thing is to write for living people; this priority remains the most important.

Why Yandex launched a new search algorithm and how it threatens sites

Any company strives to make its products the best in the service market. In this case, Yandex's biggest rival is Google. Innovations were created for the following purposes:

  • improving the quality of search on non-standard issues;
  • attracting new investors;
  • increase in ranking productivity (more than 200,000 articles when generating results).

The main goal was to improve the quality of the results. In addition, it was necessary to show investors that work at the company was in full swing and that their money was being used for its intended purpose. The innovations were subsequently used to create the voice assistant Alice.

Line of previous algorithms

To better understand new technologies, we need to go back to the past. In this case, we will consider the line of previous algorithms that were used by the search engine for ranking.

At first, the Internet contained only a couple of thousand sites, and to find the desired article it was enough to compare the keywords of the search phrase. Subsequently the global network grew exponentially; now, on a single topic, you can find hundreds of thousands of similar sites with millions of articles.

Ranking systems therefore had to become more complex and began to take into account the following additional parameters:

  • number of referring materials;
  • uniqueness of content;
  • client behavior on the page.

Matrixnet

In 2009, Yandex faced the problem that top-ranked articles increasingly failed to answer user questions. To fix this, the system had to be taught to make decisions and learn on its own.

A complex mathematical formula with many parameters was invented to determine whether text matches a search phrase.

But the following problems remained:

  • search depends on words;
  • auxiliary materials (photos, videos, quotes, etc.) are not taken into account.

The main problem was that it was not always possible to fully describe the meaning of the article in one title. Quite often, the article does not contain specific keywords, but at the same time it fully reveals the topic and gives a detailed answer to the user’s question.

Palekh algorithm

In 2016, a neural network computer model was used in the ranking system. The main feature of this approach is that the computer is now able to remember its mistakes and learn from its own experience.

In the same year, semantic vectors were introduced. The title of an article was passed through a neural network and decomposed into a vector of many components. Computers now compared not the words of the query but multidimensional arrays of numbers. This made it possible to move away from direct dependence on the number of particular words in a phrase and to give priority to semantic content.

One shortcoming remained: low speed. Only the 150 most relevant articles were compared to build the search results, so it was difficult for the system to handle multi-word semantic phrases like “a film about a girl, a spy who runs away and goes to school.”

Yandex Korolev algorithm

The latest update primarily optimized the neural network and improved the throughput of text processing. Vectors are now compared in advance, offline, which has made it possible to increase the effectiveness of the search.

Yandex independently collects statistics on user interest and uses them to create pre-prepared search results.

Thanks to optimization, a semantic vector is compiled not only for headings, but for the entire content. It is possible to find a maximum of semantic connections between words.

Threats to websites

In general, no dangers have been created for sites, and conversion statistics are not changing much. The innovations will primarily affect informational blogs, forums and movie sites.

Websites that do not meet the user’s interests may fall from their leading positions. For example, the title says “homemade apple juice,” but the article discusses methods of growing trees, pancakes with jam, and other unrelated text.

Don't forget to repost and subscribe to the blog newsletter. All the best.

All the best, Galiuin Ruslan.

1. Users

From the point of view of users, it can be strange that queries that are identical in meaning but differ in spelling produce different results. Many users submit queries to a search engine as if they were asking a friend. The new algorithm will make it easier to answer these requests.

2. Webmasters

In an ideal world, webmasters would make good products, create quality content, and not think about specifically promoting their site in search engines. In reality, they often have to adjust texts and the site itself for search engines.

3. SEO specialists

Some of the methods that helped promote websites before (for example, SEO copywriting) will no longer give such an effect. Of course, there will be attempts to outwit the new algorithm, but part of the effort will be aimed at creating quality content.


It’s too early to judge the quality of the new algorithm, but the more answers it gives, the better it will become. Therefore, in the long term, users should feel the difference.

What kind of machine is this?

With the introduction and growth of the IQ level of neural networks in search engines, the quality and relevance of the content returned will increase exponentially. The machine can analyze visual content and understand the meaning of words and expressions.


An attempt to weave into pages any popular news items that do not have a direct semantic relationship to the topic of the resource’s niche will lead to exclusion from the search results.

Advantages

The key advantage of a neural network is not that it can analyze, but that it can learn and remember. That is, resources that users consistently pass over as not matching their expectations will also gradually drop out of the search results.

That is, the machine records that for query A a significant number of users always click resource B and never click resource D. Resource D will then be excluded from the results matching query A.
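This behavior can be caricatured with a toy click-through filter. The click log, the threshold and the site names below are all hypothetical; real systems measure user satisfaction with far more nuanced signals than raw clicks:

```python
# Hypothetical click log: (query, shown_page, was_clicked) events.
click_log = [
    ("query_a", "site_b", True),
    ("query_a", "site_d", False),
    ("query_a", "site_b", True),
    ("query_a", "site_d", False),
]

def surviving_pages(log, query, min_ctr=0.1):
    """Keep only pages whose click-through rate for the query
    stays above a (made-up) threshold."""
    shows, clicks = {}, {}
    for q, page, clicked in log:
        if q != query:
            continue
        shows[page] = shows.get(page, 0) + 1
        clicks[page] = clicks.get(page, 0) + int(clicked)
    return {p for p in shows if clicks[p] / shows[p] >= min_ctr}

print(surviving_pages(click_log, "query_a"))  # site_d drops out
```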


Let's wait a few weeks and we'll see

On the one hand, the name is not as odd as “Palekh.” And that’s already good. On the other hand, not everyone has had time to fully adapt to “Palekh,” and here comes a new, more sophisticated algorithm that focuses ever more on content.

Content is king: confirmed after every update

Among the advantages, it is obvious that this gives progressive, savvy and new sites a growing opportunity to compete in saturated niches with long-established leaders of the search results, while sending thoughtless SEO copywriters, who turned sites into garbage dumps by listing anchors in texts, even further into the astral plane.

The algorithm offers new professional growth for copywriters with a head on their shoulders; they can do something more useful than writing posts for social networks.

But, from a skeptical point of view, Yandex is unlikely to miss the moment to promote its commercial products and their necessity, in particular contextual advertising.


I always take such news very positively, because beyond pure SEO optimization you get a large field for strategic content work, and this takes SEO to a new level. People stop treating it as something strange and incomprehensible. In the form people are familiar with, SEO has a bad reputation; it once deserved it, but time passes, and the outdated perception remains.

The logic is this: previously there were many web studios on the market that did SEO, and some that simply pretended to do it while charging a budget for it. The latter predominated, which is why the opinion arose that SEO is a scam. Time passes, and each algorithm update displaces those who “pretended,” yet the outdated perception still remains.



The new Korolev algorithm logically continues the changes in Yandex search of recent years: greater emphasis on neural networks and analysis of the entire content of the page, not just the headings.


A very important point is the analysis of other search queries that bring users to the current page, which allows you to more accurately determine the relevance of the content and the relationship between search queries.

To summarize: search quality will improve. And that's great.

Goodbye SEO texts

It will be interesting to see how the new algorithm will perform in real life. It takes time to evaluate both the adequacy of semantic output and ranking priorities.


Definitely, search will now have to do a better job of handling non-standard and rare queries if the network really sees more meaning behind keywords. I really hope so, because this is another step towards “Goodbye, SEO texts.” However, the network will need to be trained. This doesn't seem to be a joke.

I just tried the search for "Movie Boy with a Scar on His Forehead" and got a lot of references to the movie "Scarface" in the search results. That is, keywords still triumph over meaning.

And only if I find the Harry Potter pages I need in the search results and spend a significant amount of time on them, the machine will understand what meaning I put into the request and will clarify the search results for the next time. At least that's how it should be. The learning process will not be quick, but in any case it is a good step into the future.

A little closer to business...

Today, in response to the request “Buy a cabinet with a sliding door,” I persistently receive ovens and a bunch of unnecessary things (louvered, hinged, and so on).



The essence of the algorithm is to determine additional properties of a document at the URL indexing stage, expressing in numerical form how well the page text corresponds to previously known and frequently used phrases. It is stated that the innovation will affect low-frequency queries, which make up about a third of the search stream.


Due to the lack of statistics on such “rare” queries, the quality of search for them suffers. In fact, this algorithm will pull out of oblivion documents that do not directly contain a long query, but are close in meaning to the user’s query.

It is important for marketers and SEO specialists that their optimized sites now compete not only with each other, but also with sites that no optimizer has touched at all.

Of course, this only applies to low-frequency requests, and estimating the share of requests as 1/3 of the flow is an upper estimate. But in the near future, some sites may experience an outflow of low-frequency traffic. At the same time, it is pointless to make any numerical forecasts.


In my opinion, the very idea of building various indices made up of labeled n-grams (and this is what Yandex claims) is fairly obvious. For example, one of the main features of the statoperator crawler is the construction of an n-gram index.


N-grams are more informative than individual words, are amenable to classification and allow you to significantly expand the number of factors for constructing a search by meaning. I am glad that Yandex is moving in the right direction and is implementing current methods at a high level to increase the speed and quality of search.
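As a rough sketch of what such an index might look like, here is a toy word-level n-gram inverted index in Python. The documents and the bigram size are invented for the example; a production index would also normalize, label and weight the grams:

```python
def word_ngrams(text, n=2):
    """Extract word-level n-grams from a text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def build_index(docs, n=2):
    """Inverted index: n-gram -> set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for gram in word_ngrams(text, n):
            index.setdefault(gram, set()).add(doc_id)
    return index

# Hypothetical two-document corpus.
docs = {
    1: "neural networks improve search quality",
    2: "search quality depends on ranking factors",
}
index = build_index(docs)
print(index["search quality"])  # both documents share this bigram
```

Because a bigram like "search quality" carries more meaning than either word alone, such an index captures shared topics that a plain keyword index would blur.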

Opinion of Dmitry Sevalnev, head of the SEO and advertising department at “

“Korolev” is the algorithm of the Yandex search engine, on which the new version of the search is based. Launched in August 2017. It is a logical continuation of the “Palekh” algorithm. A neural network, trained on search statistics and user behavior, compares the meaning and essence of the query and web pages, which allows it to better answer complex queries.

Operating principle

The Korolev search algorithm, unlike the previously created Palekh, analyzes not only the title, but the entire page. Determining the meaning takes place simultaneously with indexing, which significantly increases the speed and number of processed pages.

Several stages are used to produce an answer for the user. At each stage documents are sorted, and the best ones move on to the next one. As the stages progress, increasingly complex algorithms are used.

To speed up the final stage and increase the volume of analyzed documents, an additional index was introduced containing the approximate relevance calculated at the indexing stage for popular words and their pairs from user queries. This allowed us to significantly increase the depth - up to 200 thousand documents per request.
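The staged scheme can be caricatured as a two-stage cascade: a cheap precomputed score prunes the candidate set, and an expensive formula runs only on the survivors. Everything below (the scores, the "behavioral" signal, the documents) is invented purely to illustrate the shape of the pipeline:

```python
def cheap_score(doc, query_terms):
    """Stage 1: precomputed word-overlap score (fast, approximate)."""
    return len(query_terms & doc["terms"])

def expensive_score(doc, query_terms):
    """Stage 2: a stand-in for the heavy ranking formula, run only
    on documents that survive stage 1."""
    return len(query_terms & doc["terms"]) + doc["behavioral"]

def cascade_rank(docs, query_terms, stage1_keep=2):
    survivors = sorted(docs, key=lambda d: cheap_score(d, query_terms),
                       reverse=True)[:stage1_keep]
    return sorted(survivors, key=lambda d: expensive_score(d, query_terms),
                  reverse=True)

docs = [
    {"id": "a", "terms": {"red", "dress"}, "behavioral": 0.2},
    {"id": "b", "terms": {"red", "dress", "summer"}, "behavioral": 0.9},
    {"id": "c", "terms": {"blue", "car"}, "behavioral": 0.5},
]
ranked = cascade_rank(docs, {"red", "dress"})
print([d["id"] for d in ranked])  # doc "c" never reaches stage 2
```

Precomputing the stage-1 scores at indexing time is what lets the depth of the candidate set grow to hundreds of thousands of documents without slowing the final, expensive stage.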

In addition to comparing the question asked with the meaning of the page, the algorithm takes into account what other queries users used to view a particular document, which allows for the establishment of additional semantic connections.

The algorithm uses a neural network trained on anonymized statistics. Ordinary users are involved in training it: whereas previously only Yandex employees and assessors did this, now anyone can take part in training Matrixnet, the machine-learning method used to build the ranking formula, by performing tasks in Yandex.Toloka.

“Korolev” affects multi-word queries that require clarification of meaning; these are, as a rule, informational, low- and micro-frequency queries, often entered via voice search. The answer may be pages from which some of the words used in the query are completely absent.

Immediately after launch, for many clarifying queries users were shown a panel to the right of the search results asking them to evaluate the quality of the answer and indicate which site answered more successfully.

Impact on SEO

The Korolev search algorithm has the greatest impact on information requests with complex, verbose, and often unique wording. However, it has been noticed that sites with occurrences of some words from the query are often given higher positions.

So far, the Korolev algorithm has virtually no effect on search results for standard commercial queries. However, Yandex's increasing focus on understanding the meaning naturally suggests that this is a matter of time. That's why:

  • we need to pay more attention to the informational content of pages, their value and usefulness for the user, without padding;
  • the era of keyword-stuffed (“nauseous”) text and exact occurrences of key phrases is becoming a thing of the past;
  • using the principles of LSI copywriting with topic-setting words, synonyms, etc. is more promising than traditional keyword entry and can attract additional traffic somewhere;
  • you need to pay close attention to semantic markup to help Yandex correctly understand the content of the pages;
  • it is important to maintain strong behavioral factors (visit duration, viewing depth, etc.).

The “space” premiere of Yandex is not only a change in the structure of the index, but also a kind of another reminder that you need to create content for people, and not just for attempts to manipulate the search results.

On August 22, 2017, Yandex launched a new version of the search algorithm - “Korolev”. You can describe its essence as briefly and succinctly as possible with words from the Yandex press release:

The launch of the algorithm took place at the Moscow Planetarium and was accompanied by reports from the algorithm developers, a ceremonial pressing of the launch button, and even a call to the ISS and a live broadcast with the cosmonauts.

The full video of the presentation can be viewed right here, and below we will look at the main changes and responses to frequently asked questions. We will accompany the information with comments from Yandex employees on the company blog, as well as quotes from official sources.

What has changed in Yandex search?

“Korolev” is a continuation of the “Palekh” algorithm, introduced in November 2016. "Palekh" was the first step towards semantic search, the task of which is to better understand the meaning of pages.

“Korolyov” is now able to understand the meaning of the entire page, and not just the title, as was the case after the announcement of “Palekh”.


The algorithm should improve results for rare and complex queries.

Documents may not contain many of the query words, so traditional text relevance algorithms will not be up to the task.

It looks something like this:

Google uses a similar algorithm – RankBrain:

The Korolev algorithm applies to all queries, including commercial ones. However, the impact is most noticeable on verbose queries. Yandex has confirmed that the algorithm works for all searches.

The goal of the algorithm was, of course, to improve the quality of results for rare and complex queries. Let’s check it on rare and complex commercial queries related specifically to the name of an item. In this example, Yandex really does understand what we are talking about. True, the search results are mostly reviews and articles, not commercial sites.


And in this case, the search engine realized that I was most likely interested in a drone or quadcopter. Of course, search results start from Yandex.Market:


But in some cases Yandex is powerless...


How it works (+ 11 photos from the presentation)

Let's take a closer look at the presentation of the new algorithm. Below there will be only excerpts of the most interesting moments with our comments and slides from the presentation.

The new version of search is based on a neural network. It consists of a large number of neurons. A neuron has one output and several inputs; it can summarize the information received and, after transformation, transmit it further.


A neural network can perform much more complex tasks and can be trained to understand the meaning of a text. To do this, it must be given many training examples.

Yandex began work in this direction with the DSSM model, which consists of two parts corresponding to the query and the page. Its output is an estimate of how close the two are in meaning.


To train a neural network, you need many training examples.


    Negative – query–text pairs that are not related in meaning.

    Positive – query–text pairs that are related in meaning.

According to the presentation, Yandex used an array of user-behavior data from search results for training, treating a query and a page that users often click on as related in meaning. But as Mikhail Slivinsky later explained, user satisfaction with search results is measured by more than clicks:


As Alexander Sadovsky said earlier, in the Palekh presentation, the presence of a click does not mean that a document is relevant, and its absence does not mean that it is irrelevant. The Yandex model predicts whether a user will stay on the site and takes into account many other user-satisfaction metrics.
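To make the idea of click-based training data concrete, here is a minimal sketch of how positive and negative pairs might be mined from a click log. The `min_clicks` threshold, the log structure, and the negative-sampling trick (pairing a query with a page clicked for a different query) are all illustrative assumptions, not Yandex's actual pipeline, which, as noted above, uses far more than raw clicks:

```python
def build_training_pairs(click_log, min_clicks=3):
    """click_log: list of (query, page, clicks) tuples.
    Positive pair: a query and a page users clicked on often.
    Negative pair: the same query paired with a page from an unrelated query."""
    positives = [(q, p) for q, p, c in click_log if c >= min_clicks]
    negatives = []
    for i, (q, _) in enumerate(positives):
        # pair the query with a page clicked for a *different* query
        other_page = positives[(i + 1) % len(positives)][1]
        negatives.append((q, other_page))
    return positives, negatives

log = [("manul", "wiki/manul", 12),
       ("buy drone", "shop/quadcopter", 7),
       ("manul", "shop/quadcopter", 1)]   # too few clicks: ignored
pos, neg = build_training_pairs(log)
print(pos)  # [('manul', 'wiki/manul'), ('buy drone', 'shop/quadcopter')]
print(neg)  # [('manul', 'shop/quadcopter'), ('buy drone', 'wiki/manul')]
```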

After training, the model represents a text as a set of 300 numbers – a semantic vector. The closer two texts are in meaning, the more similar their vector numbers are.
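Comparing such semantic vectors usually comes down to a similarity measure like the cosine of the angle between them. A minimal sketch (the toy 3-number vectors stand in for the 300-number vectors a real model would produce from text):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two semantic vectors: values near 1.0 mean
    the vectors point in nearly the same direction (similar meaning)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional stand-ins for the 300-number vectors from the article.
query_vec = [0.9, 0.1, 0.3]
page_close = [0.85, 0.15, 0.35]   # page close in meaning to the query
page_far = [-0.2, 0.9, -0.4]      # page about something else

print(cosine_similarity(query_vec, page_close))  # close to 1
print(cosine_similarity(query_vec, page_far))    # much lower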


Neural models have been used in Yandex search for a long time, but in the Korolev algorithm the influence of neural networks on ranking has been increased.

Now, when assessing semantic proximity, the algorithm looks not only at the title, but also at the text of the page.

In parallel, Yandex was working on a neural-network algorithm for comparing the meanings of queries. For example, if the search engine knows the exact best answer for one query, and the user enters a query very close to it, the search results should be similar. To illustrate this approach, Yandex gives the example “lazy cat from Mongolia” – “manul”.


In Palekh, neural models were used only at the very last stages of ranking, on approximately the top 150 documents. Because of this, some documents were lost in the earlier stages even though they might have been good answers. This is especially important for complex and low-frequency queries.

Now, instead of calculating the semantic vector at query time, Yandex does the calculations in advance, during indexing. Korolev performs calculations on 200 thousand documents per query, instead of the 150 under Palekh. This method of preliminary calculation was first tested in Palekh; it made it possible to save computing power and to match the query not only against the title but also against the text.


At the indexing stage, the search engine takes the full text, performs the necessary computations, and stores the resulting values. As a result, an additional index is formed for all words and popular word pairs, listing pages and their preliminary relevance to the query.
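The architectural idea – do the heavy vectorization once at indexing time, so query-time work is only a lookup plus a comparison – can be sketched as follows. Everything here is hypothetical: the `embed` function is a character-based stand-in for a real neural model, and the scoring is deliberately simplified:

```python
# Hypothetical sketch: precompute page vectors at indexing time,
# so at query time only the query itself needs to be vectorized.

def embed(text):
    """Stand-in for a neural model producing a semantic vector."""
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch) / 1000.0
    return vec

index = {}  # page url -> precomputed semantic vector

def index_page(url, text):
    index[url] = embed(text)  # heavy work done once, at indexing time

def search(query, top_n=2):
    q = embed(query)
    def score(vec):
        # simplified dot-product score; a real system would normalize, etc.
        return sum(a * b for a, b in zip(q, vec))
    return sorted(index, key=lambda u: score(index[u]), reverse=True)[:top_n]

index_page("site.ru/cats", "manul the lazy wild cat of mongolia")
index_page("site.ru/drones", "quadcopter drone with camera")
print(search("lazy cat from mongolia"))
```

The design point is the split: `index_page` runs offline over the whole collection, while `search` stays cheap enough to run on many more candidate documents per query.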

The Yandex team that designed and implemented the new search launches it.



Running the algorithm:


Artificial Intelligence Training

For many years at Yandex, the task of collecting data for machine learning was handled by assessors, who evaluate the relevance of documents to queries. From 2009 to 2013, the search engine received more than 30 million such ratings.


During this time, image and video search, internal classifiers and algorithms appeared: the number of Yandex projects increased.


Since they all ran on machine-learning technologies, more ratings and more assessors were required. When the number of assessors exceeded 1,500, Yandex launched the Toloka crowdsourcing platform, where anyone can register and complete tasks.

For example, these are the tasks found in Toloka:


Or these:


If you want to understand how users evaluate the relevance of answers and which parameters of the search results are assessed, we recommend reading the task instructions or even trying the training yourself.

Over the course of several years, the service attracted more than 1 million people who made more than 2 billion ratings. This allowed Yandex to make a huge leap in the scale and volume of training data. In 2017 alone, more than 500,000 people performed tasks.


Among the tasks are:

  • Assessing the relevance of documents;
  • Tasks for developing maps – this is how the relevance of data about organizations is checked for the Directory database;
  • Tasks for tuning speech technologies for voice search.

The rules that Yandex wants to teach the algorithm are open to all registered users in the form of instructions for Toloka employees. For some tasks, people's subjective opinions are simply collected.

Here is an excerpt from the instructions on how Yandex determines the relevance of a document:


The quality of ratings is very important to Yandex. Ratings can be subjective, so each task is given to several people at once, and a mathematical model then evaluates the distribution of votes, taking into account the degree of trust in each worker and the expertise of each participant. For every “toloker”, accuracy data is stored per project and compiled into a single rating.

That is why you cannot complain that assessor bias ruined your site.
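The vote-aggregation idea described above can be illustrated with a tiny sketch. The trust weights and the weighted-majority rule are illustrative assumptions; Yandex's actual model is more elaborate:

```python
def aggregate(votes, trust):
    """Weighted majority vote: each assessor's answer counts
    in proportion to the trust accumulated in their rating."""
    totals = {}
    for worker, answer in votes.items():
        totals[answer] = totals.get(answer, 0.0) + trust[worker]
    return max(totals, key=totals.get)

votes = {"anna": "relevant", "boris": "relevant", "clara": "irrelevant"}
trust = {"anna": 0.9, "boris": 0.4, "clara": 0.8}

print(aggregate(votes, trust))  # "relevant": 0.9 + 0.4 = 1.3 vs 0.8
```

Note that a raw head count and the weighted result can disagree: a single highly trusted worker can outweigh two workers with low ratings.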

Thus, an additional group of factors has appeared in Yandex:

  • The meaning of the page and its relevance to the request;
  • Whether the document is a good answer to similar user queries.

What has changed in the Yandex top?

The algorithm was apparently launched somewhat earlier than the presentation: according to third-party services (for example, https://tools.pixelplus.ru/updates/yandex), changes in the search results began in early August, though it is unknown whether this is related to the “Korolev” algorithm.




Based on this data, we can hypothesize that the decreasing share of main pages in the top 100 and the decreasing age of documents within the top 100 are associated with the new algorithm, which helps retrieve more relevant answers.

True, there are no noticeable changes in the top 10, top 20 or top 50. Perhaps they are not there or they are insignificant. We also did not notice any significant changes in search results for promoted queries.

Textual relevance in the standard sense has not gone away. Collections and broader responses to multi-word queries contain a large number of pages with occurrences of query words in the title and text:


The freshness of the search results also matters. An example from a Yandex presentation contains a number of recent results with the entire search phrase.



Although, given that the algorithm performs its calculations during indexing, Korolev could in theory influence the fresh results mixed in by the fast-indexing bot.

Is it necessary to somehow optimize texts for “Korolev”?

Quite the contrary: the more a search engine learns to determine the meaning of the text, the fewer occurrences of keywords are required and the more meaning is required. But the principles of optimization do not change.


For example, back in 2015 Google talked about its RankBrain algorithm, which helps search respond better to multi-word queries asked in natural language. It works well, as users noted in numerous publications comparing Yandex and Google search after the announcement of the new version of the algorithm.


That announcement was not accompanied by a large-scale presentation and did not greatly change how specialists work. No one purposefully engages in “optimizing for RankBrain”, so in Yandex's case this does not globally change a specialist's work either. Yes, there is a trend of finding and including so-called LSI keywords in texts, but these are clearly not just words frequently repeated on competitors' pages. We expect SEO services to develop in this direction.

It is also stated that the algorithm analyzes the meaning of other queries that bring users to the page. In the future, this should give the same or similar results for synonymous queries; at the moment, analysis sometimes shows no overlap at all between the results for synonymous queries. Let's hope the algorithm helps eliminate such inconsistencies.

But Yandex cannot yet find (or finds only with difficulty) documents that are close in meaning to the query but do not contain the query words.


Advice:

    Make sure the page responds to the queries it is optimized for and that users click on.

    Make sure that the page still contains the words from the search queries. We are not talking about exact-match occurrences; just check whether the words from the queries appear on the page in any form.

    Topical words can add extra relevance to a page, but they're clearly not just words that are repeated frequently on competitors' pages. We expect the development of SEO services in this direction.

    For key phrases for which a page of the site ranks well, check whether the bounce rate is below the site average. If the site holds a high position for a query and users find what they need, the site may also be shown for key phrases with similar meaning (if there are any).

    Search clicks indicate user satisfaction with the result. This is not new, but it is worth re-checking the snippets for key queries. Perhaps the click-through rate can be increased somewhere.

How to check the impact of an algorithm on your website?

For sites that do not have a pronounced seasonality, you can compare the number of low-frequency key phrases that led to the site before and after the algorithm was launched. For example, take a week in July and a week in August.


Select “Reports – Standard reports – Sources – Search queries”.

Selecting visits from Yandex:

Then we filter only those queries that brought exactly 1 click. Additionally, it is worth excluding phrases containing the brand name.



You can also look for search phrases whose words are not present in your text at all. Such phrases appeared among low-frequency queries before, but now there may be noticeably more of them.
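If you export the Metrica search-queries report to CSV, the filtering described above can be sketched like this. The column names (`query`, `clicks`) and the brand word list are hypothetical; adjust them to match your actual export:

```python
import csv

BRAND_WORDS = {"mybrand"}  # hypothetical brand tokens to exclude

def low_frequency_queries(csv_path):
    """Return queries with exactly one click, brand phrases excluded."""
    result = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            query = row["query"].lower()
            is_branded = bool(BRAND_WORDS & set(query.split()))
            if int(row["clicks"]) == 1 and not is_branded:
                result.append(query)
    return result
```

Running this over the July export and the August export and comparing the two lists gives the before/after picture described above.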

Prospects and forecast

    The search engine will be able to even better find documents that are close in meaning to the query. The presence of occurrences will become even less important.

    Personalization will be added to the current algorithm.

    In the future, good materials that answer the user’s question may receive even more traffic for micro-frequency, rare or semantically similar queries.

    For low-frequency keywords, competition may increase due to the greater relevance of non-optimized documents.

    Hypothesis: with the help of such algorithms, Yandex can better assess how semantically related a linking page is to the pages it links to, and take this into account when evaluating external links. Whether this can become a significant factor is questionable, given the weak influence of links in Yandex.

    We should expect further changes related to neural networks in other Yandex services.

Question and answer

Question: since Yandex evaluates clicks, does this mean that cheating on behavioral factors will gain momentum?


Question: Is Korolev connected with Baden-Baden?


Question: how do I enable the new Yandex search?

Answer: on the Yandex blog and in search queries, there were frequent questions about how to enable or install the new search. There is no way to do this: the new algorithm is already working, and no additional settings are needed.

