"As Obi-Wan sez: 'Use the web, Luke!'"
--Larry Masinter, Xerox Corporation, on the Net
Search engines are huge databases of web page
files that have been assembled automatically by machine.
There are two types of search engines:
- Individual. Individual search engines compile their own searchable databases
of web pages.
- Meta. Metasearchers do not compile databases. Instead, they search
the databases of several individual engines simultaneously (see Lesson 2).
Search engines compile their databases by employing "spiders" or
"robots" ("bots") to crawl through web space from link to
link, identifying and perusing pages. Because spiders find pages by
following links, a site that no other page links to may be missed altogether.
Once the spiders get to a web site, they typically index
most of the words on the
publicly available pages at the site.
Web page owners may also
submit their URLs to search engines for "crawling" and
eventual inclusion in the engines' databases.
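
To make the crawling idea concrete, here is a minimal sketch in Python. It is not how any particular engine works: the tiny "web" below (TOY_WEB), its page names, and the crawl function are all invented for illustration. The spider starts at one page and follows links breadth-first, so a page that nothing links to is never reached.

```python
from collections import deque

# Hypothetical miniature web: each page maps to the pages it links to.
TOY_WEB = {
    "home.html": ["about.html", "products.html"],
    "about.html": ["home.html"],
    "products.html": ["widget.html"],
    "widget.html": [],
    "orphan.html": ["home.html"],  # links out, but nothing links to it
}


def crawl(web, seed):
    """Follow links breadth-first from seed; return pages in the order found."""
    found = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                found.append(link)
                queue.append(link)
    return found


crawled = crawl(TOY_WEB, "home.html")
missed = sorted(set(TOY_WEB) - set(crawled))
print("crawled:", crawled)
print("missed: ", missed)
```

In this toy example the only way for orphan.html to be found is for its owner to submit the URL, which here would amount to starting a second crawl from that page.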
Whenever you search the web using a search engine, you're asking
the engine to scan its index of sites and match your
keywords
and phrases with those in the texts of documents within the
engine's database.
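
A rough sketch of that matching step, in Python, might look like the following. The three sample pages, the function names, and the rule that a page must contain every keyword are assumptions made for this example; real engines differ in how they split words and combine keywords.

```python
import re
from collections import defaultdict

# Hypothetical documents standing in for pages a spider has already fetched.
PAGES = {
    "page1": "The Connecticut Compromise shaped the United States Congress.",
    "page2": "Connecticut is a state in New England.",
    "page3": "A compromise is an agreement reached by mutual concession.",
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "Connecticut")))
print(sorted(search(INDEX, "Connecticut compromise")))
```

Note that the search never touches the pages themselves, only the index built from them earlier.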
It is important to remember that when you are using
a search engine, you are
NOT searching the entire web as it exists at this moment.
You are actually searching
a portion of the web, captured in a fixed index created
at an earlier date.
How much earlier? It's hard to say. Spiders regularly return
to the web pages they
index to look for changes. When changes occur, the index is updated
to reflect the new
information. However, the process of updating can take a while,
depending upon how often
the spiders make their rounds and how promptly the
information they gather
is added to the index. Until a page has been both "spidered"
AND "indexed," you won't
be able to find the new information through the search engine.
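
The two-step delay can be illustrated with a toy model in Python. Everything here (the one-page "web", the fetched and index dictionaries, the function names) is hypothetical and only meant to show why a change stays invisible until both steps have run.

```python
# Hypothetical one-page web; the names below are illustrative only.
live_web = {"news.html": "old headline"}
fetched = {}   # what the spider has collected so far
index = {}     # what searchers actually query


def spider():
    """Step 1: the spider revisits pages and copies their current text."""
    fetched.update(live_web)


def update_index():
    """Step 2: the engine folds the fetched copies into its index."""
    index.update(fetched)


# Initial crawl and indexing.
spider()
update_index()

# The page changes on the live web.
live_web["news.html"] = "new headline"

after_change = index["news.html"]   # index has not been touched yet
spider()
after_spider = index["news.html"]   # spidered, but not yet indexed
update_index()
after_index = index["news.html"]    # spidered AND indexed

print(after_change, "|", after_spider, "|", after_index)
```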
NOTE: While most search engine indexes are not "up to the minute" current, many engines have partnered with specialized news databases
that are. For late-breaking news, look for a "news" tab somewhere on the search
engine or directory page.
PROS:
Search engines provide access to a fairly large portion
of the publicly
available pages on the Web, which itself is growing exponentially
(see "How Big Is the Internet?").
Search engines are the best means devised yet for searching
the web. Stranded in the middle of this global electronic
library of information without either a card catalog or
any recognizable structure, how else are you going to find
what you're looking for?
CONS:
On the down side, the sheer number of words indexed by search engines
increases the likelihood that they will return hundreds of
thousands of
responses to simple search requests. Remember,
they will return lengthy documents in which your keyword
appears only once.
Additionally, many of these responses will be irrelevant to your search.
Search engines use selected software programs to search their
indexes for matching keywords
and phrases, presenting their findings to you in some
kind of relevance ranking.
Although software programs may be similar, no two search engines
are exactly the same in terms of size, speed and content; no two
search engines use exactly
the same ranking schemes, and not every search engine offers you
exactly the same search options.
Therefore, your search is going to be different on every engine
you use. The difference may not be large, but it could be
significant. Recent estimates
put search engine overlap at approximately 60 percent
and unique content at around 40 percent.
In ranking web pages, search engines
follow a certain set of rules. These may vary from one engine to another. Their
goal, of course, is to return the most relevant pages at the top of their lists.
To do this, they look for the location and frequency of keywords and phrases in
the web page document and, sometimes, in the HTML META tags. They check out the
title field and scan the headers and text near the top of the document. Some of
them assess popularity by the number of links that point to a page; the
more links, the greater the popularity and, by implication, the value of the page.
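
As a rough illustration of how such rules might combine, here is a toy scoring function in Python. It is a sketch under stated assumptions, not any engine's actual scheme: the weights, the "first ten words" cutoff for text near the top, the sample pages, and the inbound link counts are all invented for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page, keyword, inbound_links):
    """Combine keyword frequency, keyword location, and link popularity.

    The weights (5, 2, 1) are arbitrary illustration values.
    """
    keyword = keyword.lower()
    body = tokenize(page["body"])
    frequency = body.count(keyword)                 # how often the word appears
    in_title = keyword in tokenize(page["title"])   # location: title field
    near_top = keyword in body[:10]                 # location: first ten words
    return frequency + 5 * in_title + 2 * near_top + 1 * inbound_links


# Hypothetical pages and hypothetical counts of links pointing to them.
PAGES = {
    "a": {
        "title": "Spiders and robots",
        "body": "Spiders crawl the web. Spiders follow links.",
    },
    "b": {
        "title": "Garden notes",
        "body": "Many plants grow here and, now and then, one of the spiders appears.",
    },
}
INBOUND = {"a": 3, "b": 0}

ranked = sorted(
    PAGES,
    key=lambda name: score(PAGES[name], "spiders", INBOUND[name]),
    reverse=True,
)
print(ranked)
```

Page "a" ranks first here because the keyword appears in its title, near the top of its text, and more than once, and because other pages link to it.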
Search engines are best at finding unique keywords, phrases,
quotes, and
information buried in the full text of web pages. Because they
index word
by word, search engines are also useful in retrieving tons of
documents.
If you want a wide range of responses to specific queries, use
a search engine.
NOTE: Today, the line between search engines and subject directories
(see Lesson 3) is blurring. Search engines
no longer limit
themselves to a search mechanism alone. Across the Web, they are
partnering with subject directories, or creating their own
directories, and returning results
gathered from a variety of other guides and services as well.
ASSIGNMENT:
Select one of the search engines listed above and
search for:
Connecticut Compromise
Now try searching for the same subject as a phrase, enclosed in quotes:
"Connecticut Compromise"
[Note: The second search should retrieve far fewer documents than the
first search. More on this in Lesson 7.]
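
Why the quoted search returns fewer documents can be sketched in a few lines of Python. The four sample documents and both functions are made up for this illustration, and the keyword search is assumed to require every word somewhere in the document; an actual engine may treat unquoted keywords differently.

```python
import re

# Hypothetical documents for illustration.
DOCS = {
    "doc1": "The Connecticut Compromise created a two-house legislature.",
    "doc2": "Delegates from Connecticut refused any compromise on tariffs.",
    "doc3": "A compromise was reached in Hartford, Connecticut.",
    "doc4": "Virginia proposed a different plan.",
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(docs, query):
    """Match documents that contain every query word, in any position."""
    words = set(tokenize(query))
    return sorted(name for name, text in docs.items() if words <= set(tokenize(text)))


def phrase_search(docs, phrase):
    """Match documents where the query words appear side by side, in order."""
    target = tokenize(phrase)
    n = len(target)
    hits = []
    for name, text in docs.items():
        tokens = tokenize(text)
        if any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1)):
            hits.append(name)
    return sorted(hits)


print(keyword_search(DOCS, "Connecticut Compromise"))
print(phrase_search(DOCS, "Connecticut Compromise"))
```

Three of the sample documents contain both words, but only one contains them together as a phrase.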
Last updated by E. Chamberlain, Thursday September 07, 2006