As a rule, a search engine is a site that specializes in finding information that matches the user's query. The main task of such sites is to organize and structure the information available on the network.

Most people who use a search engine never ask themselves how exactly the machine retrieves the information they need from the depths of the Internet.

For the average network user, understanding how search engines work is not critical: the algorithms behind the system can satisfy even someone who does not know how to build an optimized query. But for web developers and website optimization specialists, at least a basic understanding of the structure and operating principles of search engines is essential.

Each search engine works according to precise algorithms that are kept strictly confidential and known only to a small circle of employees. But when designing or optimizing a website, it is essential to take into account the general rules by which search engines operate, and those rules are the subject of this article.

Although each search engine has its own structure, on careful study they can all be reduced to the same basic, generalized components:

Indexing module

The indexing module includes three component robot programs:

1. Spider (spider robot) - downloads pages and filters the text stream, extracting all internal hyperlinks from it. In addition, the spider stores the download date, the server response header, and the URL of the page.

2. Crawler (crawling spider robot) - analyzes all the links on a page and, based on this analysis, decides which pages to visit and which to skip. In the same way, the crawler discovers new resources that the system must process.

3. Indexer (indexing robot) - analyzes the pages downloaded by the spider. The page is divided into blocks and analyzed by the indexer using morphological and lexical algorithms. The indexer parses the various parts of a web page: headings, text, and other service information.
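As a purely illustrative sketch (the HTML sample and class name are invented for the example), the spider's link-extraction step might look like this in Python, using only the standard library:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, as a spider would before queuing pages."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = """
<html><head><title>Demo</title></head>
<body>
  <a href="/about.html">About</a>
  <a href="https://example.com/news">News</a>
</body></html>
"""

parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # the hyperlinks the crawler would consider visiting next
```

A real spider would also record the download date, the server response headers, and the page URL, as described above.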

All documents processed by this module are stored in the search engine's database, called the system index. Besides the documents themselves, the database contains the necessary service data - the result of careful processing of those documents - which the search engine relies on to fulfill user requests.

Search server

The next, very important component of the system is the search server, whose task is to process the user’s request and generate a search results page.

When processing a user's request, the search server calculates how relevant each selected document is to that request. This relevance score determines the position a web page takes in the search results. Each document that satisfies the search criteria is displayed on the results page as a snippet.

A snippet is a brief description of a page, including its title, link, keywords, and summary text. Using the snippet, the user can judge how relevant the pages selected by the search engine are to their query.
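The relevance scoring and snippet generation just described can be illustrated with a deliberately naive Python sketch. Real search servers use far more sophisticated ranking; the scoring function and documents below are assumptions made purely for demonstration:

```python
def relevance(query, document):
    """Naive relevance: the fraction of query words that occur in the document."""
    q = query.lower().split()
    words = set(document.lower().split())
    return sum(w in words for w in q) / len(q)

def snippet(document, length=60):
    """Brief description: the first `length` characters of the text."""
    return document[:length].rstrip() + "..."

docs = [
    "Search engines rank pages by relevance to the query",
    "Cooking recipes for every day",
]
query = "search engines relevance"

# Sort documents so the most relevant one comes first, as a results page does.
ranked = sorted(docs, key=lambda d: relevance(query, d), reverse=True)
print(ranked[0])
print(snippet(ranked[0], 30))
```

The point of the sketch is only the pipeline: score each candidate document against the query, order by score, and show each result as a short snippet.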

The most important criterion that the search server uses when ranking query results is the already familiar TCI (thematic citation index).

All of the search engine components described here are costly and very resource-intensive. A search engine's performance depends directly on how effectively these components interact.

Did you like the article? Subscribe to the blog news or share it on social networks, and I will answer your questions


6 comments to the post “Search engines, their robots and spiders”

    I've been looking for this information for a long time, thank you.

    I'm glad that your blog is constantly evolving. Posts like this only increase its popularity.

    I understood some of it. A question: does PR depend on TCI in any way?

    Hello, friends! Today you will learn how the Yandex and Google search robots work and what role they play in website promotion. So let's go!

    Search engines do this work in order to find, out of a million sites, the ten WEB projects that have a high-quality, relevant answer to the user's query. Why only ten? Because the results page consists of only ten positions.

    Search robots are friends to both webmasters and users

    It is already clear why it is important for search robots to visit a site, but why does the user need this? That's right: so that the user sees only those sites that answer his query in full.

    A search robot is a very flexible tool: it can find even a site that has just been created and whose owner has not yet worked on it. That is why this bot was called a spider: it can stretch its legs and reach anywhere on the virtual web.

    Is it possible to control a search robot to your advantage?

    There are cases when some pages do not appear in the search. This is mainly because the page has not yet been indexed by a search robot. Sooner or later the robot will notice the page, but that takes time, sometimes quite a lot of it. Here, though, you can help the search robot reach the page faster.

    To do this, you can list your website in special directories and catalogs and on social networks - in general, on any site where the search robot practically lives. Social networks, for example, update every second. Try advertising your site there, and the search robot will reach your site much faster.

    One main rule follows from this: if you want search engine bots to visit your site, you need to feed them new content regularly. If they notice that the content is updated and the site is developing, they will start visiting your Internet project much more often.

    Every search robot remembers how often your content changes. It evaluates not only quality but also update intervals. If the material on a site is updated once a month, the robot will come to the site once a month.

    Thus, if the site is updated once a week, the search robot will come once a week. If you update the site every day, the robot will visit it every day or every other day. Some sites are indexed within minutes of an update: social media, news aggregators, and sites that publish several articles a day.

    How do you give a robot a task, or forbid it from doing something?

    Earlier, we learned that search engines have multiple robots that perform different tasks: some look for pictures, some for links, and so on.

    You can control any robot using a special file, robots.txt. It is with this file that the robot begins its acquaintance with the site. The file specifies whether the robot may index the site and, if so, which sections. These instructions can be written for one robot or for all of them.
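    As an illustration, here is how such instructions might look, and how they can be checked with Python's standard urllib.robotparser module. The paths and the rules themselves are made up for the example:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: forbid everyone from /admin/,
# but give Googlebot its own record that allows everything.
rules = """
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow:
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "/admin/settings"))          # blocked for all robots
print(rp.can_fetch("Googlebot", "/admin/settings"))  # Googlebot has its own record
```

    In a real deployment the file simply sits at the site root as /robots.txt; the robot fetches it first and applies the record that matches its user agent.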

    Website promotion training

    I share the finer points of SEO promotion in the Google and Yandex search engines in person, over Skype. I have raised the traffic of all my own WEB projects and get excellent results from this. I can teach this to you too, if you are interested!

    A search robot is a special program of a search engine, designed to enter into the engine's database (to index) the sites and pages it finds on the Internet. Other names are also used: crawler, spider, bot, automatic indexer, ant, webcrawler, webscutter, webrobots, webspider.

    Operating principle

    A search robot is a browser-type program. It constantly scans the network: it visits indexed (already known to it) sites, follows links from them, and finds new resources. When a new resource is discovered, the robot adds it to the search engine's index. The search robot also indexes updates on sites, at a frequency that it records. For example, a site that is updated once a week will be visited by the spider at that same frequency, while content on news sites can be indexed within minutes of publication. If no links from other resources lead to a site, then to attract search robots the resource must be submitted through a special form (the Google Webmaster Center, the Yandex webmaster panel, etc.).

    Types of search robots

    Yandex spiders:

    • Yandex/1.01.001 I - the main bot involved in indexing,
    • Yandex/1.01.001 (P) - indexes pictures,
    • Yandex/1.01.001 (H) - finds mirror sites,
    • Yandex/1.03.003 (D) - determines whether the page added from the webmaster panel meets the indexing parameters,
    • YaDirectBot/1.0 (I) - indexes resources from the Yandex advertising network,
    • Yandex/1.02.000 (F) - indexes site favicons.

    Google Spiders:

    • Googlebot - the main robot,
    • Googlebot News - scans and indexes news,
    • Google Mobile - indexes sites for mobile devices,
    • Googlebot Images - searches and indexes images,
    • Googlebot Video - indexes videos,
    • Google AdsBot - checks the quality of the landing page,
    • Google Mobile AdSense and Google AdSense - indexes sites of the Google advertising network.

    Other search engines also use several types of robots, functionally similar to those listed.

    The search robot's job is to carefully analyze the content of the pages of sites on the Internet and send the results of this analysis to the search engine.

    For a while, the search robot merely crawls new pages, but later they are indexed and, in the absence of any sanctions from the search engines, can appear in search results.

    Operating principle

    Search robots operate on the same principle as an ordinary browser. When visiting a site, they look through some or all of its pages and send the collected information to the search index. This information then appears in the search results that match a particular query.

    Because search robots may visit only part of a site's pages, problems can arise with indexing large sites. Similar problems can arise from poor-quality hosting: interruptions in its operation leave some pages unavailable for analysis. A properly compiled and correctly configured robots.txt file plays an important role in how search robots evaluate a site.

    The depth of scanning and the frequency with which search robots crawl a site depend on:

    • The search engine's algorithms.
    • The site's update frequency.
    • The site's structure.

    Search index

    The database of information that search robots collect is called the search index. Search engines use this database to generate search results for specific queries.

    The index contains more than just information about sites: search robots can also recognize images, multimedia files, and documents in various electronic formats (.docx, .pdf, etc.).
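    The idea of a search index can be illustrated with a minimal inverted index in Python. The document texts and IDs below are invented for the example; a real index stores far richer service data:

```python
from collections import defaultdict

def build_index(documents):
    """Inverted index: maps each word to the IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

documents = {
    1: "search robots collect information",
    2: "robots follow links between pages",
    3: "a search index answers queries fast",
}
index = build_index(documents)

# A query is answered by intersecting the posting lists of its words.
result = index["search"] & index["robots"]
print(sorted(result))  # IDs of documents containing both words
```

    This is why an indexed page can be found instantly: the engine looks words up in a prepared structure instead of re-reading every page at query time.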

    One of the most active search robots in the Yandex system is Bystrobot (the “quick robot”). It constantly scans news resources and other frequently updated sites. A news item that Bystrobot has not noticed is, in effect, worthless.

    You can attract it using special tools, which are effective for sites of the most varied purposes. There are also separate robots that check sites for accessibility, analyze their individual characteristics, and index pictures and documents for the search engines.

    Friends, I welcome you again! Now we will look at what search robots are, talk in detail about the Google search robots, and discuss how to be friends with them.

    First you need to understand what search robots actually are; they are also called spiders. What work do search engine spiders do?

    These are programs that check sites. They look through all the posts and pages on your blog, collect information, and then pass it on to the database of the search engine they work for.

    You don't need to know the entire list of search robots; the most important thing is to know that Google now has two main spiders, called “Panda” and “Penguin”. They fight low-quality content and junk links, and you need to know how to repel their attacks.

    The Google Panda search robot was created to promote only high-quality material in searches. All sites with low-quality content are lowered in search results.

    This spider first appeared in 2011. Before it appeared, any website could be promoted by publishing large amounts of text in articles stuffed with a huge number of keywords. Together, these two techniques pushed low-quality content to the top of the search results, while good sites were demoted.

    “Panda” immediately put things in order by checking all the sites and putting everyone in their rightful place. Although it fights low-quality content, it is now possible to promote even small sites with high-quality articles. Previously, promoting such sites was pointless: they could not compete with giants that hold a large amount of content.

    Now let's figure out how to avoid “Panda's” sanctions. First, you must understand what it does not like. I already wrote above that it fights bad content, but what kind of text does it consider bad? Let's figure that out, so we don't publish it on our sites.

    The Google search robot strives to ensure that the search engine serves only high-quality material to its users. If you have articles that contain little information and look unattractive, urgently rewrite these texts so that “Panda” does not get to you.

    High-quality content can be long or short, but if the spider sees a long article with a lot of information, it considers it more useful to the reader.

    Next, take note of duplication, in other words, plagiarism. If you think you can republish other people's articles on your blog, you can immediately write off your site. Copying is punished severely, with a filter, and plagiarism is very easy to detect: I have written an article on how to check texts for uniqueness.

    The next thing to watch is oversaturating the text with keywords. Anyone who thinks they can write an article made up of nothing but keywords and take first place in the results is sorely mistaken. I have an article on how to check pages for relevance; be sure to read it.

    Another thing that can draw a “Panda” to you is old articles that have become outdated and no longer bring traffic to the site. They definitely need to be refreshed.

    There is also the Google search robot “Penguin”. This spider fights spam and junk links on your site. It also detects links purchased from other resources. So, to have nothing to fear from this robot, do not buy links; publish high-quality content so that people link to you on their own.

    Now let’s formulate what needs to be done to make the site look perfect through the eyes of a search robot:

    • To produce quality content, research the topic well before writing the article. Make sure people are genuinely interested in this topic.
    • Use concrete examples and pictures; this makes an article lively and interesting. Break the text into small paragraphs to make it easy to read. For example, when you open a page of jokes in a newspaper, which ones do you read first? Naturally, everyone reads the short texts first, then the longer ones, and only then the long walls of text.
    • “Panda's” favorite quibble is an article that has lost relevance because it contains outdated information. Follow the updates and refresh your texts.
    • Keep track of keyword density; I wrote above how to determine it, and the service I described will give you the exact number of keywords you need.
    • Don't plagiarize. Everyone knows you may not steal other people's things; stealing text is the same. Theft is punished by falling under a filter.
    • Write texts of at least two thousand words; then the article will look informative through the eyes of the search robots.
    • Stay on topic with your blog. If you run a blog about making money on the Internet, you do not need to publish articles about air guns. This can lower the rating of your resource.
    • Design your articles attractively, divide them into paragraphs, and add pictures, so that the article is a pleasure to read and visitors do not want to leave the site quickly.
    • When buying links, point them at your most interesting and useful articles, the ones people will actually read.
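    The keyword density mentioned in the list above can be estimated with a tiny Python function. The formula here (keyword occurrences as a share of all words, in percent) is one common, simplified definition, and the sample text is invented:

```python
def keyword_density(text, keyword):
    """Share of the words in the text that are the keyword, in percent."""
    words = text.lower().split()
    if not words:
        return 0.0
    return 100 * words.count(keyword.lower()) / len(words)

article = "seo tips for seo beginners who study seo every day"
print(round(keyword_density(article, "seo"), 1))  # 30.0 - clearly oversaturated
```

    A density this high is exactly the kind of keyword stuffing the text above warns against; normal prose keeps any single keyword to a few percent at most.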

    Well, now you know what kind of work search engine robots do and how to be friends with them. And most importantly, you have now studied the Google search robots “Panda” and “Penguin” in detail.
