Search engine architecture typically includes a search robot (crawler), an indexer, and a search engine.

History

Chronology
Year System Event
1993 W3Catalog Launch
Aliweb Launch
JumpStation Launch
1994 WebCrawler Launch
Infoseek Launch
Lycos Launch
1995 AltaVista Launch
Daum Founded
Open Text Web Index Launch
Magellan Launch
Excite Launch
SAPO Launch
Yahoo! Launch
1996 Dogpile Launch
Inktomi Founded
Rambler Founded
HotBot Founded
Ask Jeeves Founded
1997 Northern Light Launch
Yandex Launch
1998 Google Launch
1999 AlltheWeb Launch
GenieKnows Founded
Naver Launch
Teoma Founded
Vivisimo Founded
2000 Baidu Founded
Exalead Founded
2003 Info.com Launch
2004 Yahoo! Search Full launch
A9.com Launch
Sogou Launch
2005 MSN Search Full launch
Ask.com Launch
Nigma Launch
GoodSearch Launch
SearchMe Founded
2006 wikiseek Founded
Quaero Founded
Live Search Launch
ChaCha Launch (beta)
Guruji.com Launch (beta)
2007 wikiseek Launch
Sproose Launch
Wikia Search Launch
Blackle.com Launch
2008 DuckDuckGo Launch
Tooby Launch
Picollator Launch
Viewzi Launch
Cuil Launch
Boogami Launch
LeapFish Launch (beta)
Forestle Launch
VADLO Launch
Powerset Launch
2009 Bing Launch
KAZ.KZ Launch
Yebol Launch (beta)
Mugurdy Closed
Scout Launch
2010 Cuil Closed
Blekko Launch (beta)
Viewzi Closed
2012 WAZZUB Launch
2014 Sputnik Launch (beta)

Early in the development of the Internet, Tim Berners-Lee maintained a list of web servers hosted on the CERN website. As more and more sites appeared, maintaining such a list by hand became increasingly difficult. The NCSA website had a special "What's New!" section, where links to new sites were published.

The first computer program for searching the Internet was Archie (the word "archive" without the "v"). It was created in 1990 by Alan Emtage, Bill Heelan, and J. Peter Deutsch, computer science students at McGill University in Montreal. The program downloaded lists of all files from all available anonymous FTP servers and built a database that could be searched by file name. However, Archie did not index the contents of these files, since the amount of data was so small that everything could easily be found by hand.

The development and spread of the Gopher network protocol, invented in 1991 by Mark McCahill at the University of Minnesota, led to the creation of two new search programs, Veronica and Jughead. Like Archie, they searched file names and headers stored in Gopher index systems. Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) allowed keyword search of most Gopher menu titles across all Gopher listings. Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display) retrieved menu information from specific Gopher servers. Although the name of the Archie search engine was unrelated to the "Archie" comic book series, Veronica and Jughead are characters from those comics.

By the summer of 1993, there was not yet a single system for searching the Internet, although numerous specialized directories were maintained by hand. Oscar Nierstrasz at the University of Geneva wrote a series of Perl scripts that periodically copied these pages and rewrote them into a standard format. This became the basis for W3Catalog, the web's first primitive search engine, launched on September 2, 1993.

Probably the first web crawler, written in Perl, was Matthew Gray's "World Wide Web Wanderer" bot, which appeared in June 1993. This robot created the search index "Wandex". The Wanderer's goal was to measure the size of the World Wide Web and to find all web pages containing the words from a query. In 1993, the second search engine, Aliweb, appeared. Aliweb did not use a crawler; instead, it expected notifications from website administrators about the presence of an index file in a certain format on their sites.

JumpStation, created in December 1993 by Jonathan Fletcher, searched for web pages and built indexes of them using a crawler, and used a web form as the interface for formulating search queries. It was the first Internet search tool to combine the three most important functions of a search engine (crawling, indexing, and searching itself). Due to the limited computing resources of the time, indexing, and therefore searching, was limited to the titles and headings of the web pages found by the crawler.

Search engines took part in the "dot-com bubble" of the late 1990s. Several companies hit the market in spectacular fashion, generating record profits during their initial public offerings. Some abandoned the public search engine market and began working only with the corporate sector, for example Northern Light.

The idea of selling keywords was adopted in 1998 by Goto.com, then a small company providing a search engine at goto.com. The move marked a shift for search engines from competing with each other to becoming one of the most profitable businesses on the Internet. Search engines began selling the top places in search results to individual companies.

The Google search engine has been prominent since the early 2000s. The company achieved its high position through the quality of search results produced by the PageRank algorithm. The algorithm was presented to the public in the article "The Anatomy of a Large-Scale Hypertextual Web Search Engine" by Sergey Brin and Larry Page, the founders of Google. This iterative algorithm ranks web pages based on an estimate of the number of hyperlinks pointing to a page, under the assumption that "good" and "important" pages receive more links than others. Google's interface is designed in a spartan style, with nothing superfluous, unlike many of its competitors, who built their search engines into web portals. The Google search engine became so popular that imitators appeared, for example Mystery Seeker (the secret search engine).
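The iterative ranking idea described above can be sketched in a few lines. The four-page link graph below is invented for illustration, and the damping factor of 0.85 follows the value commonly cited for PageRank; this is a toy sketch, not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with a uniform rank
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:                     # dangling page: spread evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:                                # share rank among link targets
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: three pages link to C, so C ends up ranked highest.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))   # prints "C"
```

The key property, visible even in this sketch, is that a page's rank depends not just on how many pages link to it but on the rank of those pages, which is why the computation is iterative.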

Searching for information in Russian

In 1996, search with support for Russian morphology was implemented on the AltaVista search engine, and the original Russian search engines Rambler and Aport were launched. On September 23, 1997, the Yandex search engine opened. On May 22, 2014, Rostelecom opened the national search engine Sputnik, which as of 2015 was in beta testing. On April 22, 2015, a new service, Sputnik.Children, was launched, designed specifically for children with increased safety.

Methods of cluster analysis and metasearch have become very popular. Of the international engines of this type, the best known is Vivisimo's Clusty. In 2005, the Nigma search engine, supporting automatic clustering, was launched in Russia with the support of Moscow State University. In 2006, the Russian metasearch engine Quintura opened, offering visual clustering in the form of a tag cloud. Nigma also experimented with visual clustering.

How does a search engine work?

The main components of a search system are the search robot (crawler), the indexer, and the search engine itself.

Typically, systems operate in stages. First, the crawler retrieves the content, then the indexer generates a searchable index, and finally, the search engine provides the functionality to search the indexed data. To update the search engine, this indexing cycle is repeated.
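The three stages above can be sketched as a toy pipeline. The pages and their text below are hard-coded stand-ins for what a real crawler would download over the network; this is an illustration of the staged design, not a real engine.

```python
def crawl():
    # Stage 1: a real crawler would follow links and download pages;
    # here the "fetched" documents are hard-coded for illustration.
    return {
        "page1.html": "search engines index web pages",
        "page2.html": "a crawler downloads web pages",
    }

def build_index(documents):
    # Stage 2: the indexer maps each word to the set of documents containing it.
    index = {}
    for url, text in documents.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, word):
    # Stage 3: the search engine looks the query word up in the index.
    return sorted(index.get(word, set()))

index = build_index(crawl())
print(search(index, "pages"))   # prints ['page1.html', 'page2.html']
```

Repeating the cycle (crawl again, rebuild the index) is what keeps the searchable data up to date, exactly as described above.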

Search engines work by storing information about many web pages, which they retrieve from HTML pages. A search robot, or "crawler", is a program that automatically follows all the links found on a page and extracts them. Based on those links, or on a predefined list of addresses, the crawler searches for new documents not yet known to the search engine. The site owner can exclude certain pages using robots.txt, which can be used to prevent the indexing of files, pages, or directories on the site.
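A well-behaved crawler checks robots.txt before fetching a page. Below is a minimal sketch using Python's standard urllib.robotparser; the rules and URLs are made up for illustration, and a real crawler would download /robots.txt from the target site instead of parsing a hard-coded list.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: everything is allowed except /private/.
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# The crawler consults the parsed rules before fetching each URL.
print(parser.can_fetch("MyCrawler", "https://example.com/index.html"))     # prints True
print(parser.can_fetch("MyCrawler", "https://example.com/private/a.html")) # prints False
```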

The search engine analyzes the content of each page for further indexing. Words can be extracted from titles, page text, or special fields such as meta tags. An indexer is a module that analyzes a page, having first broken it into parts using its own lexical and morphological algorithms. All elements of a web page are isolated and analyzed separately. Web page data is stored in an index database for use in subsequent queries; the index allows information to be found quickly based on a user's request.

A number of search engines, like Google, store the entire original page or part of it (the so-called cache), as well as various information about the web page. Other systems, like AltaVista's, store every word of every page found. Using a cache helps speed up the retrieval of information from already-visited pages. Cached pages always contain the text that the user specified in the search query. This can be useful when a web page has been updated and no longer contains the text of the user's request, while the page in the cache is still the old one. This situation is related to link rot, and to Google's usability-oriented approach, which involves returning short text fragments from the cache containing the query text. The principle of least surprise applies: the user usually expects to see the searched words in the texts of the pages returned. Besides speeding up searches, cached pages may contain information that is no longer available anywhere else.
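The indexer's first step, breaking a page into parts such as the title, meta tags, and body text, can be illustrated with Python's standard html.parser. The page below is invented, and real indexers add lexical and morphological processing (stemming, stop-word handling, field weighting) on top of this kind of extraction.

```python
from html.parser import HTMLParser

class PageAnalyzer(HTMLParser):
    """Collects title words, meta "keywords", and body words separately."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = []
        self.meta_keywords = []
        self.body_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and attrs.get("name") == "keywords":
            self.meta_keywords += attrs.get("content", "").split(",")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        words = data.split()
        if self.in_title:
            self.title += words
        else:
            self.body_text += words

# A made-up page; each field would be weighted differently by a real indexer.
page = """<html><head><title>Search engines</title>
<meta name="keywords" content="index,crawler"></head>
<body>How an indexer analyzes a page.</body></html>"""

analyzer = PageAnalyzer()
analyzer.feed(page)
print(analyzer.title)          # prints ['Search', 'engines']
print(analyzer.meta_keywords)  # prints ['index', 'crawler']
```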

The search engine works with the output files received from the indexer: it accepts user queries, processes them using the index, and returns search results.

When a user enters a query into a search engine (usually using keywords), the system checks its index and returns a list of the most relevant web pages (sorted by some criterion), usually with a short summary containing the document's title and sometimes parts of the text. The search index is built using a special methodology based on information extracted from web pages. Since 2007, the Google search engine has allowed searching by the creation time of the documents sought (via the "Search Tools" menu, specifying a time range). Most search engines support the Boolean operators AND, OR, and NOT in queries, which allow the list of searched keywords to be refined or expanded; the system then searches for words or phrases exactly as entered. Some search engines offer proximity search, in which users broaden the search by specifying an allowed distance between keywords. There is also concept search, which uses statistical analysis of the use of the searched words and phrases in the texts of web pages. Such systems allow queries to be written in natural language. An example of such a search engine is Ask.com.
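The Boolean operators mentioned above reduce to set operations over an inverted index: AND is intersection, OR is union, NOT is complement against the set of all documents. A toy illustration, with a hand-built index where document IDs stand in for URLs:

```python
# Hypothetical inverted index: word -> set of document IDs containing it.
index = {
    "search":  {1, 2, 3},
    "engine":  {1, 3},
    "crawler": {2},
}
all_docs = {1, 2, 3}

def AND(a, b): return index.get(a, set()) & index.get(b, set())   # intersection
def OR(a, b):  return index.get(a, set()) | index.get(b, set())   # union
def NOT(a):    return all_docs - index.get(a, set())              # complement

print(sorted(AND("search", "engine")))   # prints [1, 3]
print(sorted(OR("engine", "crawler")))   # prints [1, 2, 3]
print(sorted(NOT("crawler")))            # prints [1, 3]
```

Real engines combine these set operations with ranking, but the refinement and expansion of the result list works exactly as these operators suggest.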

The usefulness of a search engine depends on the relevance of the pages it finds. While millions of web pages may include a given word or phrase, some may be more relevant, popular, or authoritative than others. Most search engines use ranking methods to bring the “best” results to the top of the list. Search engines decide which pages are more relevant and in what order results should be shown in different ways. Search methods, like the Internet itself, change over time. This is how two main types of search engines emerged: systems of predefined and hierarchically ordered keywords and systems in which an inverted index is generated based on text analysis.

Most search engines are commercial enterprises that make a profit through advertising; in some search engines you can buy first places in search results for given keywords for a fee. Those search engines that do not charge money for the order in which results are returned earn money from contextual advertising, while advertising messages correspond to the user’s request. Such advertising is displayed on a page with a list of search results, and search engines earn money every time a user clicks on advertising messages.

Types of Search Engines

There are four types of search engines: robotic, human-powered, hybrid, and meta.

  • systems using search robots
They consist of three parts: a crawler ("bot", "robot", or "spider"), an index, and the search engine software. The crawler is needed to crawl the web and create lists of web pages. The index is a large archive of copies of web pages. The purpose of the software is to evaluate search results. Because the search robot in this scheme constantly explores the network, the information is more up to date. Most modern search engines are systems of this type.
  • human-managed systems (resource directories)
These search engines retrieve lists of web pages. A directory entry contains the address, title, and a brief description of the site. A resource directory searches only among the page descriptions submitted to it by webmasters. The advantage of directories is that all resources are checked manually, so the quality of the content is better than in results obtained automatically by systems of the first type. But there is also a drawback: directory data is updated manually and can lag significantly behind the real state of affairs, and page rankings cannot change instantly. Examples of such systems include the Yahoo! directory, dmoz, and Galaxy.
  • hybrid systems
Search engines such as Yahoo!, Google, and MSN combine the functions of systems using search robots with those of human-managed systems.
  • meta-systems
Metasearch engines combine and rank the results of several search engines at once. They were useful when each search engine had a unique index and search engines were less "smart"; now that search has improved so much, the need for them has decreased. Examples: MetaCrawler and MSN Search.

Search Engine Market

Google is the most popular search engine in the world with a market share of 68.69%. Bing ranks second with a 12.26% share.

The most popular search engines in the world:

Search engine | July 2014 | October 2014 | September 2015
Google        | 68.69%    | 58.01%       | 69.24%
Baidu         | 17.17%    | 29.06%       | 6.48%
Bing          | 6.22%     | 8.01%        | 12.26%
Yahoo!        | 6.74%     | 4.01%        | 9.19%
AOL           | 0.13%     | 0.21%        | 1.11%
Excite        | 0.22%     | 0.00%        | 0.00%
Ask           | 0.13%     | 0.10%        | 0.24%

Asia

In East Asian countries and in Russia, Google is not the most popular search engine. In China, for example, the Soso search engine is more popular.

In South Korea, the local search portal Naver is used by about 70% of users; Yahoo! Japan and Yahoo! Taiwan are the most popular search engines in Japan and Taiwan, respectively.

Russia and Russian-language search engines

According to LiveInternet data in June 2015 on the coverage of Russian-language search queries:

  • All-language:
    • Yahoo! (0.1%) and search engines owned by this company: Inktomi, AltaVista, AlltheWeb
  • English-speaking and international:
    • AskJeeves (Teoma engine)
  • Russian-language: most "Russian-language" search engines index and search texts in many languages (Ukrainian, Belarusian, English, Tatar, and others). They differ from "all-language" systems, which index all documents indiscriminately, in that they mainly index resources located in domain zones where the Russian language dominates, or otherwise limit their robots to Russian-language sites.

Some search engines use external search algorithms.

Quantitative data on the Google search engine

The number of Internet users and search engines, and user demands on these systems, are constantly growing. To speed up the search for the necessary information, major search engines use a large number of servers. Servers are usually grouped into server centers (data centers). Popular search engines have server centers scattered around the world.

In October 2012, Google launched the "Where the Internet Lives" project, where users are given the opportunity to explore the company's data centers.

The following is known about the operation of Google's data centers:

  • The total capacity of all Google data centers, as of 2011, was estimated at 220 MW.
  • When Google planned to open a new complex in Oregon in 2008, consisting of three buildings with a total area of 6.5 million square meters, Harper's Magazine estimated that such a large complex would consume more than 100 megawatts of electricity, comparable to the energy consumption of a city with a population of 300,000 people.
  • The approximate number of Google servers in 2012 was 1,000,000.
  • Google's expenses on data centers amounted to $1.9 billion in 2006, and $2.4 billion in 2007.

The size of the World Wide Web as indexed by Google as of December 2014 is approximately 4.36 billion pages.

Search engines that take into account religious prohibitions

The global spread of the Internet and the rising popularity of electronic devices in the Arab and Muslim world, in particular in the countries of the Middle East and the Indian subcontinent, contributed to the development of local search engines that take Islamic traditions into account. Such search engines contain special filters that help users avoid visiting prohibited sites, such as sites with pornography, and allow them to use only those sites whose content does not contradict the Islamic faith. In July 2013, just before the Muslim month of Ramadan, Halalgoogling was launched: a system that provides users with only halal "correct" links by filtering the search results received from other search engines such as Google and Bing. Two years earlier, in September 2011, the I'mHalal search engine had been launched to serve users in the Middle East. However, this search service soon had to close, according to its owner, due to lack of funding.

Lack of investment and the slow pace of technology adoption in the Muslim world have hampered progress and the success of a serious Islamic search engine. Huge investments in Muslim-lifestyle web projects have failed; one such project was Muxlim, which raised millions of dollars from investors such as Rite Internet Ventures and, according to I'mHalal's last post before it shut down, promoted the dubious idea that "the next Facebook or Google might only come from the Middle East if you support our brilliant youth." Nevertheless, Islamic Internet experts have for many years been in the business of determining what does or does not comply with Sharia, and of classifying websites as "halal" or "haram". All past and present Islamic search engines are either simply a specially indexed set of data, or major search engines such as Google, Yahoo!, and Bing with a filtering system used to prevent users from accessing haram sites, such as sites about nudity, LGBT topics, gambling, and any other topics considered anti-Islamic.

Other religiously oriented search engines include Jewogle, a Jewish version of Google, and SeekFind.org, a Christian site with filters that protect users from content that may undermine or weaken their faith.

Personal results and filter bubbles

Many search engines, such as Google and Bing, use algorithms to selectively guess what information a user would like to see, based on the user's past browsing activity. As a result, websites show only information that agrees with the user's past interests. This effect is called the "filter bubble".

All this leads to the fact that users receive much less information that contradicts their point of view and become intellectually isolated in their own “information bubble”. Thus, the "bubble effect" can have negative consequences for the formation of civic opinion.

Search Engine Bias

Although search engines are programmed to rank websites based on some combination of popularity and relevance, in reality experimental research indicates that various political, economic and social factors influence search results.

Such bias may be a direct result of economic and commercial processes: companies that advertise on a search engine may become more popular in organic search results on the engine. Removing search results that do not comply with local laws is an example of the influence of political processes. For example, Google will not display some neo-Nazi websites in France and Germany, where Holocaust denial is illegal.

Bias can also be a consequence of social processes, as search engine algorithms are often designed to exclude non-normative viewpoints in favor of more "popular" results. The indexing algorithms of the major search engines give priority to American sites.

Search bombing is one example of an attempt to manipulate search results for political, social or commercial reasons.

See also

  • Qwika
  • Electronic library#Lists of libraries and search engines
  • Web Developer Toolbar


It is generally accepted that the history of the first search engines in the Russian segment of the Internet begins in 1995. That year, a morphological extension to the AltaVista search engine became available to Runet users. Almost immediately afterwards, the original search engines Aport and Rambler appeared, which are considered the first Russian search engines.

AltaVista was released in December 1995 and ran on the most powerful computing server available at the time, the DEC Alpha. It was the fastest search engine, able to process millions of search queries per day.

Aport

The Aport search engine was demonstrated to the general public in February 1996, several months before Rambler. At the time of its launch, the machine searched only the site russia.agama.com. Subsequently, Aport's developers were extremely slow in developing their project, taking a very long time to set up search first on 4 servers, then on 6. Aport learned to index the entire Runet only in November 1997, at which point it was officially launched. By that time, another search engine, Rambler, was already operating successfully in the Russian-language segment.

Despite these circumstances, until the early 2000s Aport managed to compete successfully with the main market players, Rambler and Yandex, and to remain among the search leaders in Runet. Subsequently, the company that created the engine was bought by a telecommunications holding, all development stopped, and Aport quickly lost ground to its main competitors.

At the moment, Aport is an electronic trading platform with a large database of firms and companies offering more than 8 million products in 1,400 categories.

Rambler

The team at the telecommunications company Stack decided to create an original Russian search engine back in 1994. By that time, Stack already had some experience working with the Internet, servers, and websites. Working with the Russian segment of the Internet, the company's specialists determined that foreign search engines barely handled the Cyrillic alphabet or pages with multiple encodings, and indexed Runet sites very poorly.

"Rambler" translated from English means "wanderer", "roamer", "vagabond".

The core of the new search engine was written by programmer Dmitry Kryukov in just a few months. Work on the new machine was financed by the Stack company, whose founder Sergei Lysakov actively helped Kryukov in his very difficult work. Kryukov also came up with the name Rambler and the logo of the future search engine. The domain rambler.ru was registered on September 26, 1996, and on October 8 the search engine was put online by its creator. By that time, the new search engine had indexed 100 thousand documents, a thoughtful and strategically important step that allowed Rambler to become the undisputed search leader in RuNet for several years.

1. Google

As expected, Google took first place in the world ranking. Its share is more than 70% of search queries from users around the world. Moreover, a third of all google.com traffic comes from US citizens. In addition, Google is the most visited website in the world; the average daily use of the Google search engine is 9 minutes.

The advantage of the Google search engine is the absence of unnecessary elements on the page: just a search bar and the company logo. A signature feature is the animated pictures and browser games dedicated to popular and local holidays.

2. Bing

Bing is a search engine from Microsoft, dating back to 2009. From that moment on, it became a mandatory attribute of smartphones running Windows OS. Bing is also distinguished by minimalism: in addition to the header with a list of all Microsoft products, the page contains only a search bar and the name of the system. Bing is most popular in the USA (31%), China (18%), and Germany (6%).

3. Yahoo!

Third place went to one of the oldest search engines, Yahoo!. The bulk of its users also live in the USA (24%); it seems the rest of the world deliberately avoids the help of its search robots... The engine is also popular in India, Indonesia, Taiwan, and the UK. In addition to the search bar, the Yahoo! page offers a weather forecast for your region, as well as global trends in the form of a news feed.

4. Baidu

A Chinese search engine that has gained notoriety in Russia. Due to its aggressive distribution policies and the lack of translation into Russian or English, the extensions of this search engine are perceived as viruses; it is very difficult to remove them completely and get rid of the pop-up windows full of Chinese characters. Nevertheless, this site is fourth in the world by traffic. 92% of its audience are Chinese citizens.

5. AOL

AOL is an American search engine whose name stands for America Online. Its popularity is significantly lower than that of the previous systems; its heyday came in the 1990s and 2000s. Almost 70% of AOL's audience are residents of the United States.

6. Ask.com

This search engine, dating back to 1995, has a quite unusual interface. It treats all queries as questions and offers answer options in accordance with the search results. This is somewhat reminiscent of the Answers.Mail service, except that it is not amateur answers that appear in the results but full-fledged articles. Over the past year, the site has lost about 50 positions in the world ranking of the most popular Internet resources and today ranks only 104th.

7. Excite

This search engine is unremarkable and similar to many other sites. It offers users a lot of services (News, Mail, Weather, Travel, etc.). The site's interface also evokes memories of the web of the 90s and, one might assume, has changed little since then.

8. DuckDuckGo

The developers state up front that this search engine does not track your actions online; nowadays, this is a significant argument when choosing a search engine. The site design is modern, with bright colors and funny pictures. Unlike many other search engines, the "duck engine" has been translated into Russian. Over the past year, the site gained about 400 positions and, as of March 2017, ranks 504th in the Alexa popularity ranking.

9. Wolfram Alpha

A distinctive feature of this engine is its variety of auxiliary services designed for knowledge-related queries. In the search results you will not see links to social network posts or articles from the yellow press; instead, you will be offered specific figures and verified facts in the form of a single document. This search engine is ideal for schoolchildren and students.

10. Yandex

The most popular search engine in Russia and the CIS countries. In addition, about 3% of the site's audience are residents of Germany. The site is notable for its large number of services for all occasions (music, radio, public transport schedules, real estate, a translator, etc.). The resource also offers a large choice of individual page designs, as well as customizable widgets. Yandex ranks 31st in the world in popularity, having lost 11 positions over the past year.


In the early 90s, Internet users were not yet in the habit of asking search engines questions. Links to useful sites, learned about mainly from friends, were collected in separate text files. Later, directory sites appeared, with categories replenished manually. Such, for example, were the Yahoo! directory and the Virtual Library (VLib), which was maintained and stored on the CERN server by Tim Berners-Lee, the inventor of the modern Web.

The first search engine in history is considered to be Archie, which appeared in 1990: a file archive with downloadable site directories and the ability to search through them, created by students at McGill University in Montreal. Archie did not index the content of sites; the search engines launched in 1993, including World Wide Web Wanderer, ALIWEB, and JumpStation, learned to do this. The latter became the first full-fledged search engine in the modern sense: with the help of robots, it collected links and ranked them in search results by similarity to the user's query.

Introduced in 1995, AltaVista became the first search engine to work with natural-language queries, and the first truly powerful search engine was WebCrawler, which indexed the entire content of pages. Finally, in 1997-1998, Google and Yandex, today the most popular search engines in the world and in Russia respectively, were launched. Thanks to better algorithms, they became the international and the regional leader, but it took them time to take share from the other market participants. The creators of the first search engines, for the most part, either quit this occupation or went to work for the large Internet companies that bought their systems outright.

In the early days of the Internet, users were a privileged minority, and the amount of information available was relatively small. At that time, access was mainly available to employees of large educational institutions and laboratories, and the data obtained was used for scientific purposes. Using the Internet was not then as much a part of everyday life as it is now.

In 1990, the British scientist Tim Berners-Lee (also the inventor of the URI, URL, HTTP, and the World Wide Web) created the site info.cern.ch, the world's first publicly accessible directory of Internet sites. From that moment on, the Internet began to gain popularity not only in scientific circles but also among ordinary owners of personal computers.

Thus, the first way to facilitate access to information resources on the Internet was the creation of website directories. Links to resources in them were grouped by topic.

The first project of this kind is considered to be Yahoo!, opened in April 1994. Due to the rapid growth in the number of sites in it, it soon became possible to search the directory by query. Of course, it was not yet a full-fledged search engine: the search was limited to the data held in the catalog.

In the early stages of the development of the Internet, link directories were used very actively but gradually lost their popularity. The reason is simple: even the largest modern directories cover only a small part of the information available on the Internet. For example, the largest directory on the network, DMOZ (the Open Directory Project), contains information on just over five million resources, which is dwarfed by Google's search database of over eight billion documents.

The largest Russian-language catalog is the Yandex catalog. It contains information about just over one hundred and four thousand resources.

Timeline of search engine development

1945 – the American engineer Vannevar Bush published an article describing an idea that later led to the invention of hypertext, and discussed the need for a system that could quickly retrieve data from information stored in this way (the equivalent of today's search engines). The memory-extending device he described contained original ideas that eventually came to fruition on the Internet.

1960s – Gerard Salton and his team at Cornell University developed the SMART information retrieval system (an abbreviation of "Salton's Magic Automatic Retriever of Text"). Gerard Salton is considered the father of modern search technology.

1987–1989 – Archie, a search engine for indexing FTP archives, was developed. Archie was a script that automatically retrieved file listings from FTP servers and stored them in local files, in which a quick search for the needed information was then carried out. The search was based on the standard Unix grep command, and users accessed the data via telnet.
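The Archie-style lookup described above — pre-fetched file listings searched locally by name — can be sketched roughly as follows. This is a minimal illustration, not Archie's actual code; the hosts, paths, and `search` helper are all invented:

```python
# Sketch of an Archie-style search: file listings are collected from FTP
# servers in advance and stored locally, then searched by substring match
# on file names, much like running grep over a flat text file.
# All host names and paths below are illustrative.

listings = {
    "ftp.example.edu": ["pub/tools/archie.tar", "pub/docs/readme.txt"],
    "ftp.other.org":   ["mirror/gnu/grep-1.0.tar", "incoming/notes.txt"],
}

def search(keyword):
    """Return (host, path) pairs whose file name contains the keyword."""
    hits = []
    for host, paths in listings.items():
        for path in paths:
            if keyword in path:  # simple substring match, as grep would do
                hits.append((host, path))
    return hits

print(search("grep"))  # → [('ftp.other.org', 'mirror/gnu/grep-1.0.tar')]
```

Note that only file *names* are searched, which is exactly the limitation the article mentions: Archie did not index the contents of the files themselves.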

In the next version, the data was divided into separate databases: one contained only the file names, another held records linking to the hierarchical directories of thousands of hosts, and a third connected the first two. This version of Archie was more efficient than the previous one, since the search was carried out only on file names, eliminating many pre-existing duplicates.

The search engine became more and more popular, and the developers began thinking about how to speed it up. The database mentioned above was replaced by one based on a compressed tree structure. The new version essentially stored a full-text database instead of a list of file names and was significantly faster than before. In addition, minor changes allowed the Archie system to index web pages. Unfortunately, for various reasons, work on Archie soon ceased.

In 1993 the world's first web search engine, Wandex, was created. It was based on the World Wide Web Wanderer bot developed by Matthew Gray at the Massachusetts Institute of Technology.

1993 – Martijn Koster creates Aliweb, one of the first search engines on the World Wide Web. Site owners had to add their sites to the Aliweb index themselves for them to appear in searches. Because too few webmasters did so, Aliweb never became popular.

April 20, 1994 – Brian Pinkerton of the University of Washington released WebCrawler, the first bot to index pages in full. Its main difference from its predecessors was that it let users search for any keyword on any web page; today this technology is the standard for every search engine. WebCrawler became the first system known to a wide range of users. Unfortunately, its throughput was low, and the system was often unavailable during the daytime.
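The full-page keyword search that WebCrawler pioneered rests on the idea of an inverted index: mapping every word on every page back to the pages containing it. The sketch below illustrates that idea in miniature; the page URLs and texts are invented, and real engines add tokenization, ranking, and compression on top:

```python
# Minimal sketch of full-text indexing: build an inverted index that maps
# each word to the set of pages it appears on, then answer keyword queries
# by looking the word up. Page contents here are illustrative.

pages = {
    "a.html": "web crawler indexes every word",
    "b.html": "directories list sites by topic",
}

# Inverted index: word -> set of URLs containing that word.
index = {}
for url, text in pages.items():
    for word in text.lower().split():
        index.setdefault(word, set()).add(url)

def find(word):
    """Return a sorted list of pages containing the given word."""
    return sorted(index.get(word.lower(), set()))

print(find("word"))  # → ['a.html']
```

Because lookup is by word rather than by file name, any term on any page is searchable — the contrast with Archie's name-only search that the article draws.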

July 20, 1994 – Lycos opened, a major development in search technology created at Carnegie Mellon University. Michael Mauldin was responsible for this search engine and remains a leading specialist at Lycos Inc. Lycos opened with a catalog of 54,000 documents. In addition, the results it provided were ranked, and it took prefixes and approximate matches into account. But Lycos's main distinction was its constantly expanding catalog: by November 1996 it had indexed 60 million documents, more than any other search engine at the time.

January 1994 – Infoseek was founded. It was not truly innovative, but it had a number of useful additions; one popular addition was the ability to add your page to the index in real time.

1995 – AltaVista started. Upon its appearance, AltaVista quickly gained recognition from users and became a leader among its peers. The system had virtually unlimited bandwidth for its time, and it was the first search engine in which queries could be formulated in natural language, as well as complex queries. Users were allowed to add or remove their own URLs within 24 hours, and AltaVista offered many search tips and tricks. AltaVista's main achievement is considered to be its support for many languages, including Chinese, Japanese, and Korean; indeed, in 1997 no other search engine on the Internet worked with several languages, let alone rare ones.

1996 – the AltaVista search engine launched a morphological extension for the Russian language. In the same year the first domestic search engines, Rambler.ru and Aport.ru, were launched. The emergence of the first domestic search engines marked a new stage in the development of the Runet, allowing Russian-speaking users to query in their native language and to respond quickly to changes occurring within the Network.

May 20, 1996 – the Inktomi corporation appeared, along with its search engine HotBot, created by two teams from the University of California. When the site appeared, it quickly became popular. In October 2001, Danny Sullivan wrote an article entitled "Inktomi's Spam Sites Database Is Open to the Public," describing how Inktomi accidentally made its database of spam sites, which by then contained about one million URLs, available to the general public.

1997 – a turning point in the development of search engines in Western countries: S. Brin and L. Page of Stanford University founded Google (the project was originally called BackRub). They developed their own search engine, which gave users high-quality search that took into account morphology and spelling errors and improved the relevance of query results.

September 23, 1997 – Yandex was announced; it quickly became the most popular search engine among Russian-speaking Internet users. With the launch of Yandex, domestic search engines began to compete with each other, improving site search and indexing, the delivery of results, and the range of services offered.

Thus, the development of search engines and their formation can be characterized by the stages listed above.

Today three leaders have settled in the global market: Google, Yahoo, and Bing. They have their own databases and their own search algorithms, and many other search engines use their results. For example, AOL uses Google's database, while AltaVista, Lycos, and AllTheWeb use Yahoo's. All other search engines use the results of these systems in various combinations.

A similar analysis of the search engines popular in the CIS countries shows that mail.ru serves Google's results (while adding its own developments), and Rambler in turn serves Yandex's. The entire RuNet market can therefore be divided between these two giants.

That is why website promotion in the CIS countries is, as a rule, carried out only in these two search engines.
