A Class on the Net for Librarians with Little or No Net Experience

LESSON 24: ARCHIE AND FTP FILE COMPRESSION

"Where in the world is Carmen Sandiego?"

-- Title of Educational TV Program


ARCHIE

Like me, you may have grown up with Archie and his friends, but don't forget, this is the 90s and the old gang has really changed! Actually, the Internet's Archie wasn't born until just a few years ago. He was helped into the world by three students at McGill University in Montreal. They were tired of slogging through hundreds of file libraries looking for "neat stuff," and so they decided to create a master directory that could keep track of where public files might be stored. They came up with Archie, which functions as both a catalog and an index of FTP sites.

Supposedly short for "Archive," Archie is a specific type of search program - in this case a file locator - that looks for files in anonymous FTP file libraries (archives accessible to the public) housed at over 1200 sites around the world. More precisely, Archie is a collection of servers that regularly talk to each other and pool their information - about 200 Gigabytes - into a huge, global database that is periodically updated.

Once a month, Archie searches nearly all of the anonymous FTP sites on the Internet and compiles a list of all the files. Then he sits and waits for you to query him. If you are looking for a specific file or group of files, Archie will point you to the appropriate FTP sites and give you the directory path so that you can find the files.


The Archie server will not retrieve the file for you (although some special purpose archie-based clients for Windows and Mac systems do allow you to actually retrieve the files, too), but he will respond with the exact location. You can then FTP to that remote host, retrieve the file using anonymous FTP, and transfer it to your computer. Although Archie also performs site searches and file description searches, practically all Archie searches are name searches. In name searches, you supply the name of the file or program you are looking for and Archie tells you where it is located.

Having Archie on the Internet is like having a library card catalog filled with title cards. You can search the entire Archie database by keyword and Archie will respond by coming up with file locations. Much like a call number, Archie will give you the host site(s) where the file is stored and the location of the file, i.e., the directories you will have to go to, using the cd (change directories) command to get to the file. Archie will also list the date of the file, so you can retrieve the most recent copy if you find duplicates at various sites.
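To make the card-catalog comparison concrete, here is a minimal Python sketch of a name search over a tiny, made-up index. The hosts, directories and file names are invented for illustration, and this is not how the real Archie servers are built; it only shows the idea of giving a keyword and getting back a host, a directory path and a file date, newest first.

```python
from datetime import date

# A tiny, invented index: (host, directory, file name, file date).
# Real Archie servers hold listings gathered from many FTP sites.
INDEX = [
    ("ftp.example.edu", "/pub/docs", "netiquette.txt", date(1997, 3, 1)),
    ("ftp.example.org", "/mirrors/faq", "Netiquette_Guide", date(1997, 3, 8)),
    ("ftp.example.net", "/pub/games", "chess.zip", date(1996, 11, 20)),
]

def name_search(keyword, index=INDEX):
    """Return entries whose file name contains keyword, newest first."""
    needle = keyword.lower()
    hits = [entry for entry in index if needle in entry[2].lower()]
    return sorted(hits, key=lambda entry: entry[3], reverse=True)

for host, directory, name, when in name_search("netiquette"):
    print(f"{host}  {directory}/{name}  {when.isoformat()}")
```

Sorting by date mirrors the advice above: when the same file turns up at several sites, the most recent copy is usually the one to retrieve.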

To find out if your site supports an Archie client, type archie at your command prompt (or, if you have a graphic interface, look for it on your desktop), or ask your Internet support person. If you do not have access to an Archie client program, you can telnet to one of several publicly accessible Archie servers around the U.S. and log in as archie (on some systems, you may also be prompted to enter a password -- just use archie again). You can try the original Archie server at McGill University:

Just open your telnet program and plug in this address to access Archie.

Once you successfully connect to an Archie server, you can design your search parameters by entering one or more of the following basic Archie commands:

When you have entered your search commands, the Archie server will respond by telling you your position in line (your queue position) and the estimated time it will take to complete your search. When the search results are displayed, they will scroll down the screen without stopping. If you're working from an emulation package that provides a print screen key to print to your local printer, do so periodically as the screens scroll by to record your hits. When you're ready to leave Archie, type exit to quit.

FTP FILE SIZE

Before you try to locate public FTP files using Archie, let me say a word or two about file size. Always check the size of a file before you decide to transfer it. A file that takes only a second or two to transfer to your shell account might take an hour or two to download from there to your workstation if you're dialing up at 9600 baud or less. You should also know that you probably have a very limited amount of storage space on the system where your shell account resides (you can only store so much at a time). Ask yourself if you really need this file and if it isn't already available to you elsewhere.
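If you'd like a rough feel for the numbers, the arithmetic can be sketched in a few lines of Python. The assumptions here are mine, not part of the lesson: 9600 baud is treated as 9600 bits per second, each byte costs 10 bits on the line (8 data bits plus a start and a stop bit), and modem compression and protocol overhead are ignored. The two file sizes are only examples.

```python
def download_seconds(size_bytes, line_speed_bps, bits_per_byte=10):
    """Rough transfer time in seconds over a serial dial-up line.

    bits_per_byte=10 is an assumption: 8 data bits plus start and stop bits.
    Modem compression and protocol overhead are ignored.
    """
    bytes_per_second = line_speed_bps / bits_per_byte
    return size_bytes / bytes_per_second

# Example sizes: a short text file and a 3-million-byte software package.
for size in (25_618, 3_000_000):
    minutes = download_seconds(size, 9600) / 60
    print(f"{size:>9} bytes at 9600 bps: about {minutes:.1f} minutes")
```

Under these assumptions the line moves a little under a thousand bytes each second, which is why checking the file size first is worth the trouble.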

For example, if you have access to a gopher client or to the World Wide Web, you can read the full texts of several recommended library guides directly from another computer host, so there's little reason to FTP copies to your own system. Remember, many of these guides and other documents are revised on a regular basis and you may find your local version from last month is already out-of-date.

Finally, if you plan to download files frequently from the Net, you should have a good anti-viral program in place. Chances are you will never have to deal with an infected file, but it's better to be safe than sorry.

FTP FILE COMPRESSION AND ARCHIVAL FORMATS

On the Internet, computer software and computer programs (binary files) must be transferred in binary mode in order to ensure that important information is not degraded, accidentally lost or inappropriately translated (rendering the programs useless).
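Here is a small Python sketch of what "inappropriately translated" can mean. In ASCII mode, line-ending bytes may be rewritten to suit the receiving system. The function below only simulates one such translation (each LF byte becomes CR LF); it is not an FTP client, and the sample bytes are arbitrary.

```python
def simulate_ascii_mode(data: bytes) -> bytes:
    """Mimic a text-mode transfer that rewrites each LF byte as CR LF."""
    return data.replace(b"\n", b"\r\n")

# Stand-in for a program file: arbitrary bytes, two of which happen to equal LF (0x0A).
binary_payload = bytes([0x4D, 0x5A, 0x0A, 0x00, 0xFF, 0x0A, 0x10])

damaged = simulate_ascii_mode(binary_payload)
print(len(binary_payload), "bytes sent,", len(damaged), "bytes received")
print("intact" if damaged == binary_payload else "corrupted")
```

A text file survives this kind of rewriting because only its line endings change. A program does not, because those bytes were never line endings in the first place.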

Because binary files can be very large, special utilities exist to "compress" them into more manageable packets: the smaller the files, the less storage space they take up on the FTP server, and the less time it takes for folks to retrieve them. Often, for ease of access, software packages (multiple files) are also "archived" into a single file using other special utilities. These multi-file packages or "archives" are more efficient to transfer, since you don't have to get multiple files one at a time. On FTP sites, you will find compressed files, archived files, and often packages that have been both archived and compressed.

There are ways to tell if files on the Internet are stored in a special format. When transferring files stored at FTP host sites, look at the extension (often three letters) at the end of each file name. The extension will indicate whether or not the files have been compressed and/or archived and, if so, which method of compression/archival was used. Here are a few of the more common extensions that should indicate to you these are no ordinary text files:


   .z          .tar
   .Z          .zip
   .gz         .zoo
   .sit        .arc
   .cpt        .lzh

If a file is both compressed and archived, two extensions will usually indicate this circumstance, e.g., filename.tar.Z tells you that the files were first archived using the UNIX archive utility tar, and that the resulting archive was then compressed using the UNIX compress program.
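Reading extensions from right to left can be sketched as a simple lookup in Python. The tool names in the table are my own summary rather than part of the lesson, and .z in particular has been used by more than one program, so treat the descriptions as a starting point. The lookup is case-sensitive because .z and .Z mean different things.

```python
# Extension -> (kind, usual tool). These descriptions are an editorial summary;
# ".z" in particular has been used by more than one tool.
FORMATS = {
    ".z": ("compressed", "Unix pack (or early gzip)"),
    ".Z": ("compressed", "Unix compress"),
    ".gz": ("compressed", "GNU gzip"),
    ".tar": ("archived", "Unix tar"),
    ".zip": ("archived and compressed", "PKZIP / Info-ZIP"),
    ".zoo": ("archived and compressed", "zoo"),
    ".arc": ("archived and compressed", "ARC"),
    ".lzh": ("archived and compressed", "LHA / LHarc"),
    ".sit": ("archived and compressed", "StuffIt (Mac)"),
    ".cpt": ("archived and compressed", "Compact Pro (Mac)"),
}

def describe(filename):
    """List the formats applied to filename, outermost (last extension) first."""
    steps = []
    rest = filename
    while True:
        dot = rest.rfind(".")
        if dot <= 0:                 # no extension left to examine
            break
        ext = rest[dot:]
        if ext not in FORMATS:       # an ordinary extension such as .txt
            break
        steps.append((ext,) + FORMATS[ext])
        rest = rest[:dot]
    return steps

for ext, kind, tool in describe("filename.tar.Z"):
    print(f"{ext:5} {kind:10} {tool}")
```

The order of the result is also the order in which you undo things: uncompress first, then unarchive.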

To use files like this, once you have transferred them to your system, you need to uncompress/unarchive them. Just as the files themselves are available via FTP, so are the utility packages needed to uncompress and unarchive them (these are available for a variety of computer platforms -- Mac, DOS, Windows, etc., so be sure to choose the one that's right for you). It's not enough to successfully uncompress and unarchive a file; you must also get it to work. Software written to run on a UNIX operating system, for example, will not run on a Macintosh, or on a DOS-based computer, or vice versa.
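As one illustration of uncompressing and unarchiving in a single step, here is a Python sketch using the standard tarfile module. Note the assumptions: it works on a .tar.gz package (gzip), not a .tar.Z one, since Python's standard library does not read the older compress format, and it builds its own small sample archive so it runs without downloading anything. The file name package/README is made up.

```python
import io
import tarfile
import tempfile
from pathlib import Path

def make_sample_archive(path):
    """Create a small .tar.gz holding one text file, to have something to unpack."""
    content = b"Hello from inside the archive.\n"
    info = tarfile.TarInfo(name="package/README")
    info.size = len(content)
    with tarfile.open(path, "w:gz") as archive:
        archive.addfile(info, io.BytesIO(content))

def list_and_read(path):
    """Return {member name: bytes} for every regular file in a .tar.gz archive."""
    contents = {}
    with tarfile.open(path, "r:gz") as archive:   # gunzip and untar in one pass
        for member in archive.getmembers():
            if member.isfile():
                contents[member.name] = archive.extractfile(member).read()
    return contents

with tempfile.TemporaryDirectory() as workdir:
    sample = Path(workdir) / "package.tar.gz"
    make_sample_archive(sample)
    unpacked = list_and_read(sample)
    for name, data in unpacked.items():
        print(name, len(data), "bytes")
```

Listing the members before writing anything to disk is a cautious habit with packages from unfamiliar sites.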

If you have a graphic WWW browser, you can access a table describing extension types, and linking you directly to software to deal with each, at:
URL: http://www.matisse.net/files/formats.html

More assistance is also available at Eric Bennet's cross-platform page:
URL: http://x3066.resnet.cornell.edu/xplat/

You can also use Archie to locate appropriate utilities and, of course, you can always contact your Internet support staff person for assistance.

(NOTE: there are other types of encoding you may run into on FTP sites; the extensions .hqx and .uue are two you may see frequently. Binhex (.hqx) and uuencode (.uue) files are encoded as special text versions of binary files so that they can safely be transported in ASCII rather than binary mode -- most commonly, as files appended to email messages. Binhex is a popular utility for Mac systems, and uuencode is used in the Unix/DOS world. Binhexed and uuencoded files are actually larger than their binary versions, but they make for safe file transfers!)
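You can see the size penalty for yourself with a short Python sketch. It uses the standard binascii module to uuencode some arbitrary bytes line by line and then decode them again. It produces only the encoded body, without the "begin" and "end" lines a real .uue file carries, and the test data is simply every byte value repeated.

```python
import binascii

def uuencode_lines(data: bytes) -> bytes:
    """Encode data as uuencoded text lines (body only, no begin/end header)."""
    lines = []
    for start in range(0, len(data), 45):      # uuencode packs up to 45 bytes per line
        lines.append(binascii.b2a_uu(data[start:start + 45]))
    return b"".join(lines)

def uudecode_lines(text: bytes) -> bytes:
    """Reverse uuencode_lines, one text line at a time."""
    return b"".join(binascii.a2b_uu(line) for line in text.splitlines(keepends=True))

binary = bytes(range(256)) * 4                 # 1024 bytes of arbitrary binary data
encoded = uuencode_lines(binary)

print(len(binary), "binary bytes ->", len(encoded), "text bytes")
print("round trip ok:", uudecode_lines(encoded) == binary)
```

Every three binary bytes become four printable characters, plus a length character and a line break on each line, which is where the extra bulk comes from.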


YOUR ASSIGNMENT:

Telnet to an Archie site and, by keyword search, locate a file of interest. If you don't have a subject in mind, try this one along with me: Netiquette

There's a USENET file called _Emily Postnews Answers Your Questions on Netiquette_. I hear it's funny and I want to get it and read it. Those of you with access to the Web can click on the hot-text above, but let's assume I don't have that option. Using the keywords Emily Postnews, I'll ask Archie to help me find the file. I'll telnet to the Rutgers Archie server at archie.rutgers.edu and log in as archie, giving my email address as the password. Then, these are the steps I'll follow:

  1. At the first Archie prompt, I'll type: set search sub
  2. At the second Archie prompt, I'll type: set maxhits 20
  3. At the third Archie prompt, I'll type: prog Emily Postnews

Success! I get lots of hits (some unrelated, some in German, some zipped). Following is a facsimile of the "good hits" I have to choose from to locate the file:


  Host ntuix.ntu.ac.sg    (155.69.1.5)
  Last updated 04:30 17 Mar 1997

    Location: /pub/faq/news.announce.newusers
      FILE    -r--r--r--   25592 bytes  12:29  8 Mar 1997  
      Emily_Postnews_Answers_Your_Questions_on_Netiquette


  Host gigaserv.uni-paderborn.de    (131.234.22.34)
  Last updated 15:26 11 Mar 1997

    Location: /ftp/disk5/faq/news/announce/newusers
      FILE    -r--r--r--   25618 bytes  15:45  1 Mar 1997  
      Emily_Postnews_Answers_Your_Questions_on_Netiquette


  Host freebsd.cdrom.com    (165.113.58.253)
  Last updated 15:09 10 Mar 1997

    Location: /.12/internet/rtfm/news/announce/newusers
      FILE    -r--r--r--   25618 bytes  15:45  1 Mar 1997  
      Emily_Postnews_Answers_Your_Questions_on_Netiquette

   (snip, snip ...)
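If you save a listing like this to a file, a short program can pull out the host, location, size and date of each hit and point to the newest copy. The Python sketch below assumes the layout shown in the facsimile (a Host line, a Location line, a FILE line, then the file name on the following line); real Archie output may differ, so treat the patterns as illustrative.

```python
import re
from datetime import datetime

LISTING = """
  Host ntuix.ntu.ac.sg    (155.69.1.5)
  Last updated 04:30 17 Mar 1997

    Location: /pub/faq/news.announce.newusers
      FILE    -r--r--r--   25592 bytes  12:29  8 Mar 1997
      Emily_Postnews_Answers_Your_Questions_on_Netiquette

  Host gigaserv.uni-paderborn.de    (131.234.22.34)
  Last updated 15:26 11 Mar 1997

    Location: /ftp/disk5/faq/news/announce/newusers
      FILE    -r--r--r--   25618 bytes  15:45  1 Mar 1997
      Emily_Postnews_Answers_Your_Questions_on_Netiquette

  Host freebsd.cdrom.com    (165.113.58.253)
  Last updated 15:09 10 Mar 1997

    Location: /.12/internet/rtfm/news/announce/newusers
      FILE    -r--r--r--   25618 bytes  15:45  1 Mar 1997
      Emily_Postnews_Answers_Your_Questions_on_Netiquette
"""

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Patterns are matched against lines with leading/trailing spaces removed.
HOST = re.compile(r"Host\s+(\S+)")
LOCATION = re.compile(r"Location:\s+(\S+)")
FILE = re.compile(r"FILE\s+\S+\s+(\d+) bytes\s+(\d\d):(\d\d)\s+(\d+) (\w{3}) (\d{4})")

def parse_hits(text):
    """Turn a saved listing into a list of dicts, one per file found."""
    hits = []
    host = location = pending = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        host_match = HOST.match(line)
        location_match = LOCATION.match(line)
        file_match = FILE.match(line)
        if host_match:
            host = host_match.group(1)
        elif location_match:
            location = location_match.group(1)
        elif file_match:
            size, hour, minute, day, month, year = file_match.groups()
            pending = {
                "host": host, "location": location, "size": int(size),
                "date": datetime(int(year), MONTHS[month], int(day), int(hour), int(minute)),
            }
        elif pending is not None:     # the line after a FILE line is the file name
            pending["name"] = line
            hits.append(pending)
            pending = None
    return hits

hits = parse_hits(LISTING)
newest = max(hits, key=lambda hit: hit["date"])
print("Newest copy:", newest["host"], newest["location"])
```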

I'll select a likely host, copy down the ftp host address and the file location, quit the Archie connection, FTP to the host in question, follow the ftp path, find the file and transfer it to my computer using the get command.
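For the curious, the same retrieval steps (connect, log in anonymously, cd to the directory, get the file) can be sketched with Python's standard ftplib module. This is an illustration rather than a recommended tool: the ftp_factory parameter is a hypothetical hook I added so the function can be tried without a network connection, and the hosts in the 1997 listing may well no longer answer.

```python
from ftplib import FTP

def fetch_anonymous(host, directory, filename, local_path, ftp_factory=FTP):
    """Retrieve one file by anonymous FTP in binary mode.

    ftp_factory is a hypothetical hook added for this sketch so the function
    can be exercised without a network connection; by default it is ftplib.FTP.
    """
    ftp = ftp_factory(host)                # connect to the remote host
    try:
        ftp.login()                        # no arguments means anonymous login
        ftp.cwd(directory)                 # same as typing: cd <directory>
        with open(local_path, "wb") as out:
            # same as typing: get <filename>, with the transfer type set to binary
            ftp.retrbinary("RETR " + filename, out.write)
    finally:
        ftp.quit()                         # always close the session politely

# Example call (not run here, since it needs a live FTP server):
# fetch_anonymous("ntuix.ntu.ac.sg", "/pub/faq/news.announce.newusers",
#                 "Emily_Postnews_Answers_Your_Questions_on_Netiquette", "emily.txt")
```

Binary mode is used even though this particular file is text; it transfers the bytes exactly as stored, which is the safe choice when you are not sure what a file contains.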

Now, don't be overwhelmed -- you'll see in upcoming lessons how WWW browsers make Net adventures easier for us all (yes, you can use Archie via the web!).

ARCHIE VIA EMAIL:

If you do not have access to the telnet function, you can retrieve information on searching Archie via email by sending a message to any of the Archie servers noted in this lesson (e.g., archie@archie.rutgers.edu) and typing the word "help" in the body of the message.

ARCHIE VIA GOPHER:

You can access Archie at numerous gopher sites, including the University of Minnesota: gopher://gopher.tc.umn.edu (go to "Internet file server (ftp) sites").

ARCHIE VIA WWW:

If you have a WWW browser, check out the CUI ArchiePlexForm interface:
URL: http://cuiwww.unige.ch/archieplexform.html


* "BCK2SKOL" is a free electronic library classroom created by Ellen Chamberlain, Head Librarian, University of South Carolina Beaufort, and Miriam Mitchell, Sr. Systems Analyst, USC Columbia. Additional support is provided by the Division of Libraries & Information Systems, University of South Carolina Columbia.


Your feedback and support for BCK2SKOL are appreciated; please email link updates, suggestions and comments to: eechambe@gwm.sc.edu

Links checked 9 March 1998. See the BCK2SKOL homepage for course update details.
Copyright © 2000, the Board of Trustees of the University of South Carolina.
URL: http://www.sc.edu/bck2skol/fall/lesson24.html