Google


Google: The hidden ideology of search

Session Overview:
Search, like any technology, is not neutral. An understanding of how search results reach our desktops (and how a search engine may have interfered with, adjusted, or filtered those results) is vital. As search tools make decisions for us based on our search history, location, and previous online activities, the type of information we encounter is driven by assumptions made on our behalf - assumptions we are often unaware of.
This session will explore key considerations in how search engines rank, filter, rate, adjust, and display information. The challenges and risks of being unaware of the "decisions made on our behalf" will be explored. Strategies and skills will be presented to minimize search engines' "pollution of intention."


Introduction

Google is one of the most recognized brands in the world. From humble origins as a research project at Stanford, the company now exists as a model of the disruptive nature of innovation. Every day, millions of queries are handled by Google. Educators, journalists, students, government officials, parents, and children turn to Google for their information needs. And yet, surprisingly few are well versed in how to use Google beyond typing a question into the search box.

Of greater concern, however, is the impact of how Google works - how and why search results appear as they do. Why does one resource appear on page one and another on page ten? Did you know that search results are different when you search in Canada vs. searching in the US? Did you know that Google can capture your entire search history and make that searchable as well? Did you know that Google "reads" your email and inserts text ads based on the content it finds?

Google's stated goal is to "organize the world's information and make it universally accessible and useful". In pursuit of this goal, it is entering new markets almost monthly: book scanning, Google Earth, Maps, mobile computing, and the gathering of personal data. As the CEO of Google states: "The goal is to enable Google users to be able to ask the question such as ‘What shall I do tomorrow?’ and ‘What job shall I take?’"

Given the prominence Google has in how much of society interacts with information, it's important for educators in particular to gain a better understanding of how Google works.

This resource will explore the history of search, how Google calculates its search results, how to use Google and minimize assumptions made on your behalf, and potential future directions of online search.

Background and History of Search

Relative to its short life span, web search has an enormous presence in the lives of people who spend time online. The initial web search engines - Alta Vista, Excite, Inktomi, LookSmart, Lycos, Alltheweb - are now largely unknown. A few engines were consolidated and then eventually acquired by larger companies (Yahoo, for example, purchased Overture, which had earlier purchased Alltheweb). Search Engine History provides a detailed overview of early search companies, including those predating the Web itself.

A key downfall for many search engines arose as they sought to move beyond search into more general information provision such as shopping services. The result: portals. Portals resulted in a dilution of the quality of search and often provided confusing and complex interfaces. The intent of a service like Yahoo, for example, was to keep individuals on site as long as possible. In contrast, Google kept the search interface clean and emphasized "pushing people" to resources they might find to be of value.

The Rise of Google

Google, as of November 2006, controls over 65% of the search market (dominance of this nature is reminiscent of Microsoft's desktop control). Its market capitalization now exceeds $200 billion, placing it in the top five of US companies. The growth of Google exceeds that of even the most successful "fast risers" of the modern economic era. Where and how did it all begin?

In 1995, Sergey Brin and Larry Page met as graduate students at Stanford. Their initial foray into the world of search was certainly not the stuff of grandiose plans of wealth and fame. Instead, their work arose from solving a complex, universal problem: how to find relevant data in the deluge of the internet. (See Google History for a more detailed exploration).

By early 1998, the effectiveness of their approach to search had produced steady growth without any advertising. While many of the web search engines of the day provided an index or directory of information, search results were often not relevant. The search return time was often quite long as the engine searched huge databases. Google changed the experience of searchers by relying on its own algorithm (rooted in the same citation-counting idea as Garfield's Journal Impact Factor). Search results were relevant, fast, and "clean" (many search companies had abandoned core search in favor of a portal approach).

While Google was growing in popularity, Page and Brin were still focused on their studies. After several unsuccessful attempts to sell their technology and have other companies license their search tool, they were faced with a choice: get serious about Google or stick with their studies. They opted for the former, with the help of an initial $100,000 in angel funding. The rest is now a living history familiar to most.

Google's infrastructure rests on cheap commodity computers tied together in a distributed manner by creative software (see this video on Google infrastructure). This distributed architecture has more recently been described as "cloud computing". The approach tolerates the failure of individual pieces of software and hardware without reducing overall capacity (redundancy permits replication and offloads search and activity to a network instead of relying heavily on a single machine). Of perhaps greatest value is the ability to use this computing cloud to navigate Google's huge index (in excess of 4 billion pages and hundreds of millions of images) and provide timely search results, even as the index grows and the number of searchers increases.

Google's Services

Google has moved well beyond search over the last decade. While the bulk of its revenue comes from search advertising, Google now offers a wide range of services: email, blogs, video sharing, Scholar (vertical search in academic documents), docs & spreadsheets, traditional media labs, news, Google Earth, and so on. Recent forays into Android and OpenSocial indicate Google's ongoing emphasis on its role as a mediator of data and information access.

Google Labs lists numerous ongoing projects and innovations.

Acquisitions lists companies and technologies Google has acquired over the last several years.

Google, Yahoo, and Microsoft Acquisitions lists acquisitions by "the big three".

How Does Google Search Work?

The heart of Google's initial success was the concept of PageRank. PageRank formed the basis of "BackRub", the original name of Google. At the time, most web search engines largely ignored links. While searching Alta Vista, Larry Page became intrigued with the insight that could be gleaned from connections and links between websites. Up to this point, search engines were largely concerned with indexing websites, comparing user queries with extensive databases, and delivering "best match" results. Page, to the amusement of his PhD advisors, sought to download the entire web and analyze its links.

PageRank (a play on Larry Page's name) is based on a principle familiar to most academics: the value (or impact) of a paper is determined by subsequent citations. The more frequently a paper is cited, the greater its impact on the field. But Page took the concept one step further when he recognized that not all links are created equal. A link from a site like Yahoo carried more "weight" than a link from a personal website. The value of a web page is determined by a combination of the quantity of incoming links and the value of each of those incoming links.

For Google, search was not confined to matching queries against an existing index; it also meant estimating the potential value of a resource by analyzing links. This was an important advancement in the effectiveness and relevance of search: anyone who was using Excite, Yahoo, or Alta Vista immediately found better results with Google. However, in order for PageRank to work, Google made (and continues to make) many value judgments.
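The link-weighting idea described above can be illustrated with a small sketch. This is not Google's implementation: it follows the general form of the published PageRank idea, and the damping factor of 0.85, the iteration count, the handling of pages with no outgoing links, and the example site names are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page values from a dict mapping page -> list of pages it links to.

    damping and iterations are illustrative assumptions, not values
    taken from this resource.
    """
    # Collect every page that appears as a source or a target of a link.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page holding an equal share of the total value.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own value among the pages it links to,
                # so a link from a highly ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # value evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: three sites link to "portal", which links to "news";
# a single "blog" links to "personal".
EXAMPLE_LINKS = {
    "site_a": ["portal"],
    "site_b": ["portal"],
    "site_c": ["portal"],
    "portal": ["news"],
    "blog": ["personal"],
    "news": [],
    "personal": [],
}

ranks = pagerank(EXAMPLE_LINKS)
for page, value in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page:10s} {value:.4f}")
```

In this example "news" and "personal" each have exactly one incoming link, yet "news" ends up with the higher value because its single link comes from the heavily linked "portal". That is the "not all links are created equal" principle in miniature.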

As mentioned above, Google uses a cloud computing model of many "commodity machines" tied together with software. When a searcher enters a query, the Google magic begins: a 0.25-second search can involve more than 1000 machines. In addition to providing search results, additional servers provide ads and perform a spell check. The Google index is split into pieces called shards, and multiple replicas are made of each shard. A query is delivered to multiple shards, and each shard evaluates it and returns its best-scoring documents. The server merges these results, sorts them, and provides the best returns across multiple shards (see How Google Works video).
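The shard-and-merge flow described above might be sketched as follows. This is a toy model under stated assumptions: each shard is a plain in-memory dictionary, scoring is a simple count of matching words (not Google's ranking), shards are queried one after another rather than in parallel, and replicas are not modelled. The function names and documents are hypothetical.

```python
import heapq


def search_shard(shard, query_terms, k):
    """Score every document in one shard and return its k best (score, doc_id) pairs."""
    scored = []
    for doc_id, text in shard.items():
        words = text.lower().split()
        # Toy scoring: how many times the query terms occur in the document.
        score = sum(words.count(term) for term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)


def search(shards, query, k=3):
    """Send the query to every shard, then merge and sort the partial results."""
    terms = query.lower().split()
    partial_results = [search_shard(shard, terms, k) for shard in shards]
    # Merge step: keep only the k best hits across all shards.
    merged = heapq.nlargest(k, (hit for hits in partial_results for hit in hits))
    return [doc_id for _score, doc_id in merged]


# Hypothetical index split into two shards.
SHARDS = [
    {"doc1": "google search ranking", "doc2": "cooking recipes"},
    {"doc3": "search search engine history", "doc4": "search engine tips"},
]

print(search(SHARDS, "search engine", k=2))  # prints ['doc3', 'doc4']
```

Because each shard returns only its own top results, the merging server handles a handful of candidates per shard instead of the whole index, which is what lets the work spread across many machines.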

Choices Google Makes For Users

Google has been making decisions on behalf of web searchers since it first started providing results. The initial focus was on making decisions based on relevance and value as indicated by links. Google is highly secretive about its algorithms, and changes to results often go unnoticed by users. Search engine optimizers (SEOs) can find, for example, that a resource once returned in the top 10 suddenly fades from the index. Organizations that rely heavily on Google search to sell products and services can find that the value of Google has a corresponding dark side when their business disappears overnight as Google tweaks its algorithms.

The secrecy of how Google calculates results does not leave searchers entirely at its mercy. Yet few web searchers are aware of how to reduce the assumptions Google makes on their behalf. For example, Google returns search results based on country, language, location, etc. As Google continues to increase its focus on personalizing search, it will continue to adjust results based on what it thinks searchers are seeking. The "personal search history" is one way results can be tailored to a searcher based on previous interests and search habits.

Using Google Efficiently

To combat search interference, Google does make available advanced search options. This link - found on the Google home page - allows a searcher to change the country, words, language, and other features of how results are displayed. Google Guide offers tutorials on improving search results.
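Settings like these can also be stated explicitly in the search address itself. The sketch below builds such an address with Python's standard library. The parameter names used here (q for the query, hl for interface language, gl for country, lr for result language) are assumptions based on publicly observed Google URLs; they are not described in this resource and may change or be ignored. The helper name build_search_url is hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(query, interface_lang=None, country=None, result_lang=None):
    """Build a search URL that states country and language choices explicitly.

    The parameter names (q, hl, gl, lr) are assumptions drawn from
    publicly observed Google URLs and may not remain valid.
    """
    params = {"q": query}
    if interface_lang:
        params["hl"] = interface_lang          # language of the interface
    if country:
        params["gl"] = country                 # country the results are aimed at
    if result_lang:
        params["lr"] = "lang_" + result_lang   # restrict results to one language
    return "https://www.google.com/search?" + urlencode(params)


print(build_search_url("open education", country="ca", result_lang="en"))
# prints https://www.google.com/search?q=open+education&gl=ca&lr=lang_en
```

Running the same query with different country values is a simple way to see how much location alone changes what appears on page one.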

Other resources for minimizing pollution of intent include:

The Basics of Search

Advanced Search

Preference and Advanced search settings

The Future of Google

Android

OpenSocial

Street View

Data visualization - Gapminder

Knol

Mobile AdSense

Google's non-GPS Map Location (see video)

Ethics and Privacy

Politics and Search: Censorship in China

Spam: Online searches in the mid-1990s could often yield surprising results. A search for "elearning", for example, could display pages and images of an entirely different nature. Marketers and web site owners sought to influence search results through various means, including the use of misleading terms in the meta-information of a website (mislabelling the content of a site in order to bring in searchers). This approach was fairly common and exploited the inherent weakness of engines that matched search queries against existing databases without applying some metric of value. As Google grew in popularity, individuals eager to influence search results started manipulating the linked nature of Google's search by posting spam links in blogs, wikis, and other online sites. Google currently faces related challenges, including link farms and click spam in its ad model.

Personal Data: Google's focus on delivering personally relevant data presents enormous challenges for security and privacy. While its motto is "don't be evil", concern is growing about the impact of Google blending public information with our personal information needs. How much do we want a corporation to know about our personal interests and habits?

Additional Resources

Google Guide: tutorial and references on Google

Flash tutorial on how Google works

10 Commandments of Google

The Church of Google

Information behaviour of the researcher of the future

The Googlization of Everything - a blog and information site

History of Search Engines

Who's Afraid of Google? - short article on increasing concerns of privacy in Google's use of personal information.

How Google PageRank works

Wikipedia PageRank article
