A Class on the Net for Librarians with Little or No Net Experience
LESSON 27: WWW, PART 2: THE URL AND SEARCH ENGINES
"The Web is like the game of Othello. Remember?
'a moment to learn, a lifetime to master.'"
-- Lou Rosenfeld, University of Michigan, SLIS
HOME PAGES AND HTML
"Home page" is the friendly, non-threatening term for the
special document used to serve information systems on the Web.
The home page for a WWW server, or for a special collection on a
server, is roughly analogous to a gopher root menu or a magazine's
table of contents. Usually featuring a flashy identifying graphic
or logo, the home page provides links to the collection of online
materials the site distributes. The home page for each site is
both the welcome mat and the front door to its collections; a
well-designed home page will include a mailing address for your
comments on the site, information on the person who maintains the
page, and perhaps a search engine against the materials stored
on the site. Every major university, commercial enterprise,
corporation, and government agency on the Web provides a home
page to make your visit to their site worthwhile and informative.
Increasingly, individuals are designing their own home pages and
serving them on the Web. In fact, the personal home page has
become such an electronic status symbol that many commercial
Internet providers now offer to serve the personal home pages
of their subscribers as an additional service (usually for an
additional fee, of course!). Folks with a direct Internet
connection, and the know-how to run a WWW server package, can
serve their own home page directly from their own computer.
Additionally, anyone creating WWW pages needs to have a basic
knowledge of HTML (Hypertext Markup Language). This markup
language, which is really just a set of command tags that
indicate how to format a page (first level heading, hot link to
here, insert graphic, begin a bulleted list, etc.), is to the Web
what PageMaker is to desktop publishing. In "Understanding and
Exploring the Internet," _Popular Mechanics_, April 1995, Senior
Correspondent Abe Dane describes how pages composed in HTML work:
A Web Browser runs on your computer and acts as a graphical
interface between you and the Web. When you click on a link,
it issues the necessary commands to request data from other
computers, then interprets whatever comes back. Documents
written in the standard Hypertext Markup Language (HTML)
contain codes that tell Web browsers what typeface to use and
how to format it. Most browsers can also display digitized
pictures, so that a well-written HTML document appears as an
illustrated page.
HTML supports the creation of many features, including headings,
highlighting, links to other pages, etc. The HTML behind Web
pages allows them to support such features as in-line graphics
(pictures embedded in the text as they might be in a desktop
publishing article), in-line fill-out forms (for database
searching, surveys, user-feedback, ordering information, etc.),
"hot" maps (where you can click on highlighted portions of a map
or graphic to go to that spot -- as you might do in a computer
game), links to other documents and to other headings within the
current document, etc.
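To make the idea of "command tags" a little more concrete, here is a
minimal Python sketch that reads a tiny HTML page and lists the tags and
links it finds. The sample page, the class name TagLister, and the
function name describe_page are all made up for illustration; they do not
come from any real site or from the article quoted above. The sketch only
shows that an HTML document is ordinary text with tags mixed in; it is
not how a real browser is built.

```python
from html.parser import HTMLParser

# A made-up page for illustration; it is not taken from any real site.
SAMPLE_PAGE = """<html>
<head><title>Sample Library Home Page</title></head>
<body>
<h1>Welcome to the Sample Library</h1>
<img src="logo.gif" alt="Library logo">
<ul>
<li><a href="hours.html">Library hours</a></li>
<li><a href="catalog.html">Online catalog</a></li>
</ul>
</body>
</html>"""


class TagLister(HTMLParser):
    """Collects opening tags and link targets while reading a page."""

    def __init__(self):
        super().__init__()
        self.tags = []   # every opening tag, in the order encountered
        self.links = []  # the href value of every <a> tag

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def describe_page(html_text):
    """Return (tags, links) found in the given HTML text."""
    lister = TagLister()
    lister.feed(html_text)
    lister.close()
    return lister.tags, lister.links


if __name__ == "__main__":
    tags, links = describe_page(SAMPLE_PAGE)
    print("Tags found:", tags)
    print("Links found:", links)
```

In the sample page, h1 marks a first level heading, img inserts a graphic,
ul and li begin a bulleted list, and each a tag with an href is a hot link.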
URLs
URL stands for Uniform or Universal Resource Locator and it
acts as a unique global Internet identifier. Since 1991, URLs
have been used as the standard way to locate and to cite
multimedia resources on the Web and virtually everything else
on the Internet -- links to virtually every Internet protocol
are also supported (e.g., gopher, wais, telnet, ftp, http,
usenet news, mail, etc.).
URLs are directions to information accessible via the Web. At
first, the format of URLs will look different to you from the
access addresses we've covered so far for telnet, ftp, and gopher
connections; for instance, they can be a *lot* longer. But, if
you look at them closely, you'll see familiar Internet "connection
goings-on" at work. One thing to keep in mind at all times is
that URLs can be case-sensitive (the part after the hostname
often is), and, as with other Internet addresses, every
character counts: miss one capital letter, a dot, or a slash,
and you won't connect.
Each URL describes a particular path to a specific Internet
resource. For example, here's a typical URL:
http://www.sc.edu/beaufort/library/index.html
Let's take a look at the basic parts of this Web address:
- The part before the colon:
In the example given above, http (hypertext transfer protocol)
tells you that the file is being served from an http, or web, server.
Since other protocols are also supported, you will also
see URLs that begin with "gopher", "telnet", "ftp", etc. which
provide links to resources on these types of servers.
- The part that follows the two forward slashes (//):
This gives you the hostname, or Internet address, where the
resource is located. In this example, the information is on
the server "www.sc.edu".
- The remainder of the URL (following the first "/"):
The remainder describes the directory path to the particular
resource (sometimes you must navigate through multiple directories
to reach a resource -- remember our FTP travels?), and ultimately the
name of the file. Both the directory names and the file name
may be *long* and complicated, although not in this case. In our
example, the directory path is simply "beaufort/library/", these
being the directories created for my campus and library on the
www.sc.edu server. The filename for my library home page is
"index.html"; the "html" extension is used for HTML files (you may
see the three-character extension ".htm" used for files being
served from a Windows web server). You may see other filetypes, as
well: "au" is a sound format, "gif" is a graphic, etc.
The URL in our example just happens to be the home page for my
library at the University of South Carolina Beaufort. I have been
allotted space on the "www.sc.edu" server under "beaufort/library" and that
is where I stash my files. When you visit my home page, you will
find many links to other pages and sources on the Web. If you wish
to visit any of them, you have the option of accessing them via my
library home page or noting the URLs and later, jumping directly to
them on your own. URLs are useful, when travelling on the Web, to
get you quickly to where you want to go. Just like in real travel
adventures, you won't be at the mercy of road signs if you have
specific directions in hand.
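If you'd like to see the three parts of a URL pulled apart by a program,
here is a minimal Python sketch that uses the standard urllib.parse
module on the example URL from this lesson. The function name url_parts
and the second, oddly capitalized URL are made up for illustration. Treat
it as a rough sketch of the breakdown described above, not as the way any
particular browser reads a URL.

```python
import posixpath
from urllib.parse import urlsplit


def url_parts(url):
    """Break a URL into the pieces described in this lesson."""
    parts = urlsplit(url)
    # Everything up to the last slash is the directory path;
    # whatever follows it is the file name.
    directory, _, filename = parts.path.rpartition("/")
    extension = posixpath.splitext(filename)[1].lstrip(".")
    return {
        "protocol": parts.scheme,      # the part before the colon
        "host": parts.hostname,        # the part after the two slashes
        "directory": directory + "/",  # the directory path
        "filename": filename,          # the file being requested
        "extension": extension,        # hints at the file type
    }


if __name__ == "__main__":
    example = "http://www.sc.edu/beaufort/library/index.html"
    for name, value in url_parts(example).items():
        print(f"{name:10} {value}")

    # Hypothetical variation typed with different capitalization.
    shouted = url_parts("HTTP://WWW.SC.EDU/Beaufort/Library/index.html")
    print(shouted["protocol"], shouted["host"], shouted["directory"])
```

Note that the library lower-cases the protocol and the hostname but
leaves the capitalization of the directory path exactly as typed. Whether
a server treats "/Beaufort/" and "/beaufort/" as the same place depends
on the server, which is why it is safest to copy a URL exactly.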
And, don't forget to save those directions. Most Web browsers,
as noted in our last lesson, let you create a list of hotlinks
or bookmarks to make your return trips to sites a "no-brainer".
However, what about the times when you don't have a URL in hand
and you don't have any idea how to get to where you want to go
on the Web? What do you do then?
WEB SEARCHERS AND INDEXES
One of the best ways to explore the Web is to use a Web search
tool, or what's called an "intelligent Web agent." These are
software programs that roam the Web creating databases for you
to search. Each works differently and you may want to try more
than one. Please note that most of these search tools search
against the *FULL TEXT* of indexed documents, and not just
against document titles as did the gopher search tools in
earlier lessons. You may elect to try a subject search using
one or more of the strong WWW subject tree collections or indexes
that are organized into subject categories, such as the following:
Web Subject Directories
- Argus Clearinghouse: Internet Research Library
Arranged by academic subject with links to Virtual Libraries, directories,
and search tools, at:
URL: http://www.clearinghouse.net/
- The EINet Galaxy
Exhaustive directory of Internet resources, organized under large,
general subject areas, that's searchable, at:
URL: http://www.einet.net/galaxy.html
- Magellan
Web navigational and informational directory, containing listings for
approximately 4 million sites, many of which are reviewed and rated, at:
URL: http://www.mckinley.com
- The WWW Virtual Library
Comprehensive subject catalogue of Web sources put together by the
creators of the World Wide Web, at:
URL: http://www.w3.org/pub/DataSources/bySubject/Overview.html
- Starting Point
Web sites organized in the following categories: business, computing,
education, entertainment, investing, magazines, news, reference, shopping,
sports, travel, and weather.
URL: http://www.stpt.com/
- Yahoo
Massive index organized into subject categories, and including a search
tool. You'll find hours of entertaining web browsing via Yahoo at:
URL: http://www.yahoo.com/
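A quick aside on the full-text point made at the start of this section:
the minimal Python sketch below builds two tiny indexes over three
invented documents, one from titles only and one from titles plus text,
and runs the same search against each. The documents and the function
names (words, build_index, search) are assumptions made up for this
illustration; real search tools are far more elaborate.

```python
import re
from collections import defaultdict

# Invented sample documents, for illustration only.
DOCUMENTS = {
    "doc1": {"title": "Library Hours",
             "text": "The reference desk offers help with gopher and telnet."},
    "doc2": {"title": "Gopher Guide",
             "text": "An introduction to menus and servers."},
    "doc3": {"title": "Campus News",
             "text": "New telnet access to the library catalog."},
}


def words(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents, fields):
    """Map each word found in the chosen fields to the documents using it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for field in fields:
            for word in words(doc[field]):
                index[word].add(doc_id)
    return index


def search(index, term):
    """Return the ids of documents containing the term, in sorted order."""
    return sorted(index.get(term.lower(), set()))


if __name__ == "__main__":
    title_index = build_index(DOCUMENTS, ["title"])
    full_index = build_index(DOCUMENTS, ["title", "text"])
    for term in ("gopher", "telnet"):
        print(term, "| titles only:", search(title_index, term),
              "| full text:", search(full_index, term))
```

The title-only index misses every document that mentions a word only in
its body, which is the gap full-text searching is meant to close.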
Web Search Tools
Or, you may prefer to go directly to one of the powerful search
engines operating on the World Wide Web:
- Alta Vista -- a Web index, which also contains full-text access to over
13,000 news groups, at:
URL: http://www.altavista.digital.com/
search tips: link related words into a phrase by enclosing them in
quotes, e.g., "higher education"; search in lower-case to
match capitalized words as well; search in capital letters to
force an exact match; precede required search terms with a +,
prohibited ones with a -, e.g., +noir +film -"pinot noir";
use * as a wildcard to catch a variety of word endings, e.g.,
librar* for library, librarian, libraries, etc.; place important
subject words first in the string when performing simple searches.
- Hotbot -- Wired's search engine searches the full text of over
50 million documents in its database, finds all documents that fit your
criteria, sorts them according to their fit vs. your search terms, and
returns documents as a list of abstracts and hyperlinks:
URL: http://www.hotbot.com
search tips: allows menu-driven Boolean searches (select "all the
words" or "any of the words," etc.); a "modify" panel is also featured to
let you restrict your searches by date, location, and media type (documents
containing audio files or images, for example). A full-blown advanced
search interface is also included, and you can save your modified settings
to create a personalized Hotbot tool.
- Excite -- a database of over 50,000 web site reviews, providing both
keyword and concept searching capabilities and supporting full
Boolean operators and syntax, at:
URL: http://www.excite.com/
search tips: to give extra weight to certain words, repeat them,
e.g., "President Clinton's policy on Bosnia Bosnia Bosnia"; if
unsure of proper spelling, enter multiple versions, e.g.,
Khaddafi Quadafy Kaddafi Quadaffi; to fine-tune your searches,
use + in front of a search word to require it, e.g., hockey +NHL;
use - to exclude words, e.g., Jaguar -car -automobiles.
- InfoSeek -- a powerful search engine that can sweep the entire Web
or limit searches to various categories, e.g., Usenet newsgroups,
Web FAQs, Select Sites, etc., at:
URL: http://www2.infoseek.com/
search tips: capitalize proper names, e.g., Babe Ruth; to search
for separate names, use commas to separate them, e.g., Babe Ruth,
Boston Red Sox; use double quote marks around words that must
appear next to each other, e.g., "stupid pet tricks"; use hyphens
between words that must appear within one word of each other,
e.g., cable-networks; use brackets to find words that appear
within 100 words of each other, or that you'd expect to see in
the same sentence or paragraph, e.g., {elevator safety}; use +
in front of words that must appear in document, - in front of
words that must not, e.g., city guide +San Francisco; python
-monty.
- Lycos -- a fast-action Web spider that offers a new subject
directory to help focus your search, at:
URL: http://www.lycos.com
search tips: customize search options and display options to
fit your search specifications; change search default option
from <or> to <and> to accommodate critical word strings, e.g.,
"peanut butter", rather than "peanut" or "butter". Lycos does
not support Boolean searches, numbers at the beginning of words,
or use of the + symbol. Lycos does support use of the - to
exclude words, use of the . at the end of keywords to exclude
various word endings, and use of the $ as a wildcard to catch
various word endings.
- Webcrawler -- an easy-to-use search engine that supports both
"natural language searching" and Boolean search operators, at:
URL: http://webcrawler.com
search tips: customize display of search results by selecting
either title or summary display; go to "Links" to find out
who's linking to your home page; go to "WebFacts" for latest
statistics on Web use; perform advanced searching with these
operators: AND, OR, NOT, NEAR, ADJ, quotations and parentheses.
- World Wide Web Worm -- a comprehensive Web search engine that
links to citation hypertext, citation addresses (URL), HTML titles and
HTML addresses, at:
URL: http://www.cs.colorado.edu/home/mcbryan/WWWW.html
search tips: select any one of the four search capabilities;
specify search by giving a list of keywords separated by spaces;
use keywords with at least three characters; specify operators
AND or OR to customize search.
Unfortunately, those of you without "forms capability" software are limited
to "no-forms" searching, the lowest level of searching and not uniformly
available across the Web.
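Several of the search tips above share the same conventions: a + in front
of a required word, a - in front of a prohibited one, quotes around a
phrase, and * at the end of a word as a wildcard. The minimal Python
sketch below shows one way such a query could be read and tested against
a piece of text. It is an illustration only, not the code behind any of
the engines listed; the function names (parse_query, term_matches,
matches) are made up, and it ignores capitalization entirely, which is a
simplification of the Alta Vista rule described above.

```python
import re


def parse_query(query):
    """Split a query into (required, prohibited, optional) term lists."""
    required, prohibited, optional = [], [], []
    # Each token is an optional + or -, then a quoted phrase or a bare word.
    pattern = r'([+-]?)(?:"([^"]*)"|(\S+))'
    for sign, phrase, word in re.findall(pattern, query):
        term = (phrase or word).lower()
        if not term:
            continue
        if sign == "+":
            required.append(term)
        elif sign == "-":
            prohibited.append(term)
        else:
            optional.append(term)
    return required, prohibited, optional


def term_matches(term, text):
    """Check one term (word, phrase, or stem*) against the text."""
    text = text.lower()
    if term.endswith("*"):
        # Wildcard: any word that starts with the stem.
        regex = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        # Plain word or phrase: must appear as whole words.
        regex = r"\b" + re.escape(term) + r"\b"
    return re.search(regex, text) is not None


def matches(query, text):
    """Return True if the text satisfies the query."""
    required, prohibited, optional = parse_query(query)
    if any(term_matches(term, text) for term in prohibited):
        return False
    if not all(term_matches(term, text) for term in required):
        return False
    if not required:
        # With nothing marked +, at least one plain term must appear.
        return any(term_matches(term, text) for term in optional)
    return True


if __name__ == "__main__":
    query = '+noir +film -"pinot noir"'
    print(matches(query, "A history of film noir in Hollywood"))
    print(matches(query, "Pinot noir in film: wine on screen"))
    print(matches("librar*", "Our librarians can help"))
```

Each engine has its own rules, so check its help page before assuming
that a symbol that works on one will work on another.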
Lastly, here are two search interfaces you'll want to bookmark:
- The All-in-One search page will link you to 120 or so WWW
search tools and directories, plus hundreds of other
specialized search tools and search interfaces (people lookups,
school and company lookups, publications, etc.) via a common
interface. Go to the All-in-One Search Page at:
URL: http://www.albany.net/allinone/
- Search.com will link you to over 250 ways to search the Net, and
allows you to create a personalized interface to their collection:
URL: http://www.search.com
For a more lengthy look at available search engines, and a link to
articles comparing the various engines, go to WebReference.com's
Search page:
URL: http://webreference.com/search.html
YOUR ASSIGNMENT
- Check out the Web searchers and Web indexes; try some searches --
you'll be hooked!
- Figure out how to set bookmarks from your web browser.
- Use your WWW browser to make other types of Internet connections.
For a basic discussion of the different types of Internet
connections possible via the Web, open David Baker's Guide to URLs:
http://www.netspace.org/users/dwb/url-guide.html
READ MORE ABOUT IT
Here's a text that's been highly recommended by a number of folks
on the WEB4LIB discussion list as one of the best books around on
creating HTML pages:
Lemay, Laura. _Teach Yourself Web Publishing with HTML in a
Week_. SAMS Publishing, 1995.
Also, check out:
Taylor, Dave. _Creating Cool Web Pages With HTML_. IDG Books, 1995.
"BCK2SKOL" is a free electronic library classroom created by Ellen
Chamberlain, Head Librarian, University of South Carolina Beaufort, and
Miriam Mitchell, Sr. Systems Analyst, USC Columbia. Additional support is
provided by the Division of Libraries & Information Systems, University of
South Carolina Columbia.
Your feedback and support for BCK2SKOL are appreciated; please email link
updates, suggestions and comments to:
eechambe@gwm.sc.edu
Links checked 7 January 1999. See the BCK2SKOL homepage for course
update details.
Copyright © 2000, the Board of Trustees of the University of South Carolina.
URL: http://www.sc.edu/bck2skol/fall/lesson27.html