How Search Engines Actually Work

Like high school, search is primarily a popularity contest.

Google’s inventors, Larry Page and Sergey Brin, built Google around a single insight: the search engine didn’t have to determine what a website was about or whether it was any good. People would do that for it. People would describe the content and vote for it. All the Google bot had to do was add up the votes. It is basically crowdsourcing with a referee.

It works like this:

New York University School of Medicine is a great medical school.

The underlined, linked words “great medical school” (called “anchor text”) tell Google what pages to display when someone searches on that phrase. Note that this says “pages,” not sites. When mapping searches to results, Google has long paid more attention to the web page than the site it lives on. In the last few years, the credibility of the site has grown in importance, but for now, the page still rules.
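To make the voting idea concrete, here is a minimal sketch in Python. It is only an illustration of counting links by anchor text, not Google's actual method. The page addresses, the `LINKS` data and the function names are all made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical link data: (page containing the link, anchor text, page linked to).
LINKS = [
    ("blog-a.example/post", "great medical school", "school-a.example/about"),
    ("blog-b.example/list", "Great Medical School", "school-a.example/about"),
    ("forum.example/thread", "great medical school", "school-b.example/home"),
    ("blog-c.example/review", "cheap textbooks", "shop.example/books"),
]


def tally_votes(links):
    """For each anchor phrase, count how many links point at each target page."""
    votes = defaultdict(Counter)
    for _source, anchor, target in links:
        # Lowercase the phrase so "Great" and "great" count as the same vote.
        votes[anchor.lower()][target] += 1
    return votes


def rank_pages(votes, query):
    """Return the pages linked with this phrase, most-linked first."""
    counts = votes.get(query.lower(), Counter())
    return [page for page, _count in counts.most_common()]


if __name__ == "__main__":
    print(rank_pages(tally_votes(LINKS), "great medical school"))
```

Note that the tally is kept per page, not per site, which matches the point above: the vote goes to the specific page the link points at.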

People figured Google out pretty quickly and started playing pranks. If there were relatively few links using a phrase, a small group of websites could dramatically impact search results. For years, a search for “Miserable Failure” served up the George W. Bush biography page on whitehouse.gov (last I checked, it still does on Bing).

Google and the other search engines evolved in response to manipulations like this. Now search engines scan and store web pages to analyze for keywords, check Twitter and Facebook as religiously as any teenager, and take action against companies that try to scam the system. The engines also try to figure out how credible a site is overall, by counting the total number of links to a site, and give outgoing links from highly credible sites more weight.
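The credibility idea can be sketched the same way. This toy Python example assumes a site's credibility is simply the number of links pointing at it, and that a link's vote is worth one plus the credibility of the site it comes from. Both rules are assumptions chosen for illustration; the real formulas are far more involved and are not described here. All addresses and function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical links: (page containing the link, page linked to).
LINKS = [
    ("https://news.example/story", "https://school.example/about"),
    ("https://blog.example/post1", "https://news.example/story"),
    ("https://forum.example/t/1", "https://news.example/story"),
    ("https://blog.example/post2", "https://rival.example/about"),
]


def site_of(url):
    """Return the host part of a URL, used here as 'the site'."""
    return urlsplit(url).netloc


def site_credibility(links):
    """Assumed rule: a site's credibility is the number of links pointing at it."""
    return Counter(site_of(target) for _source, target in links)


def weighted_votes(links):
    """Score each page, giving links from credible sites more weight."""
    credibility = site_credibility(links)
    scores = Counter()
    for source, target in links:
        # Every link is worth 1; a credible source site adds its credibility on top.
        scores[target] += 1 + credibility[site_of(source)]
    return scores


if __name__ == "__main__":
    for page, score in weighted_votes(LINKS).most_common():
        print(score, page)
```

In this made-up data, the school page and the rival page each receive exactly one link, but the school's link comes from a site that is itself linked to twice, so it scores higher.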

Because search engines aren’t people, they can’t actually read the text and links; they can only track, store and quantify. And in fact, because search engines care only about text and links, the bots are blind to graphics, photos, video and text hidden in scripts.

A page that looks like this to a person:

Looks like this to a search engine:

[Screenshot: the nyu.med.edu site with all styles, JavaScript and images turned off.]
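For a rough feel of that stripped-down view, the sketch below uses Python's standard `html.parser` module to keep only the visible text and the links, skipping styles, scripts and images. It is a toy, not how any real crawler is built, and the sample page, the `BotView` class and the `see_as_bot` helper are invented for the example.

```python
from html.parser import HTMLParser

# A made-up page with a style block, an image, a link and a script.
PAGE = """
<html><head><style>body { color: red; }</style></head>
<body>
<img src="campus.jpg">
<p>Example University is a <a href="https://school.example/">great medical school</a>.</p>
<script>document.write("Apply now!");</script>
</body></html>
"""


class BotView(HTMLParser):
    """Collects plain text and (href, anchor text) pairs; ignores everything else."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text = []         # visible text fragments, in page order
        self.links = []        # (href, anchor text) pairs
        self._skip_depth = 0   # greater than 0 while inside <script> or <style>
        self._href = None      # href of the link currently open, if any
        self._anchor = []      # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor = []
        # Images and other tags are simply ignored.

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join(self._anchor)))
            self._href = None

    def handle_data(self, data):
        words = data.split()
        if self._skip_depth or not words:
            return  # script or style content, or whitespace only
        fragment = " ".join(words)
        self.text.append(fragment)
        if self._href is not None:
            self._anchor.append(fragment)


def see_as_bot(html):
    """Parse the page and return the populated BotView."""
    view = BotView()
    view.feed(html)
    view.close()
    return view


if __name__ == "__main__":
    view = see_as_bot(PAGE)
    print(view.text)
    print(view.links)
```

The image and the "Apply now!" text written by the script never show up in the result; only the sentence and its link survive.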

So, what actually affects search results?