Format Fun, Frolic, and Facts

By

William Dan Terry
Director of Technology, NetPubs International

IP News (Internet Edition) Winter 1997

This article is the debut of a new IP News department called "Tech Talk." The purpose of this department is to address aspects of Internet technology which should be considered in the creation and evolution of an Internet publishing program. Topics covered will range from relevant core Internet technology to new developments and proposed future technologies. The Tech Talk department will appear bimonthly in the Internet Edition starting in February. Tech Talk welcomes suggestions for topics and submissions.

An Internet publishing program has a wealth of technologies available for presenting titles and supporting the readership, and new options are constantly appearing. Each of these technologies corresponds to a communications protocol or an information format. Each is designed for a specific type of use, some broad, some narrow, and each has its own strengths and weaknesses. Together they form the building blocks of an Internet publishing program. Which blocks are used and how they are put together depend upon the goals of the publishing program, the types of information presented, and the audience's interests. This article's goal is not to make recommendations, but to present the possible uses and benefits of some of the major building blocks.

Communication on the Internet is organized into a series of layers, each handling tasks at a different level of generality. Like a Russian nested doll, each layer is built upon the one inside it and interacts only with the layer directly inside and the layer directly outside.

At the core, the link layer provides the physical wiring and the software to read and write signals over that wiring. Next, the network layer handles the addressing and routing of individual, generic data packets. It has no understanding of what is in the packets or of how one packet relates to another. The transport layer handles communication between applications on the network. It also has no understanding of what is in the packets, but a transport protocol such as the Transmission Control Protocol (TCP) does track the relationship between packets, putting them back in order, detecting errors, and retransmitting lost or damaged packets, among other functions. The application layer is responsible for specific data communications formats and end-user interaction.
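
The nesting can be illustrated with a toy Python model: on the sending side each layer wraps whatever the layer above hands it in its own header, and on the receiving side the headers are peeled off in reverse order, each layer looking only at its own. The bracketed labels are hypothetical stand-ins for real headers, which are binary structures with many fields; nothing here reproduces actual TCP/IP formats.

```python
# Toy model of layered encapsulation. The "[layer]" labels are hypothetical
# placeholders, not real TCP, IP, or link-layer header formats.
LAYERS = ["transport", "network", "link"]  # applied innermost first


def encapsulate(app_data: bytes) -> bytes:
    """Wrap application data in one placeholder header per layer, inside out."""
    unit = app_data
    for layer in LAYERS:
        unit = f"[{layer}]".encode("ascii") + unit
    return unit


def decapsulate(frame: bytes) -> bytes:
    """Strip headers outermost first, as a receiving host would."""
    unit = frame
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode("ascii")
        if not unit.startswith(header):
            raise ValueError(f"expected {layer} header")
        unit = unit[len(header):]  # this layer never inspects the payload
    return unit


frame = encapsulate(b"GET /index.html")
print(frame)  # b'[link][network][transport]GET /index.html'
```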

For instance, the Internet Protocol, or IP, is responsible for routing and delivering packets of generic data at the network layer. Higher up, at the application layer, protocols like the HyperText Transfer Protocol (HTTP), the protocol of the World Wide Web, are responsible for high-level communications and for interpreting higher-level data formats, such as hypertext data, or Web pages. HTTP relies on IP, by way of a transport-layer protocol (TCP), as a courier service to send and retrieve packages of hypertext data, but HTTP itself is responsible for preparing the data on the sending side and interpreting it for display on the receiving side.
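
The division of labor can be seen in a minimal Python sketch of a Web client speaking HTTP/1.0 by hand. The program composes and interprets the HTTP text; the operating system's TCP/IP software, reached through a socket, plays the courier. Under HTTP/1.0 the server normally closes the connection after sending its response, which is why the read loop simply runs until no more data arrives. The function names and the host in the usage comment are illustrative.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Compose an HTTP/1.0 GET request; each line ends with CR LF."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw: bytes) -> tuple[str, bytes]:
    """Separate the status line and headers from the body (a blank line divides them)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1"), body


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Hand the request to TCP and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)


# Usage (needs a network connection; the host is a placeholder):
# head, body = split_response(fetch("www.example.com", "/"))
```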

This article concerns itself only with the primary application-layer protocols used to transfer and present data for human readers.

Most of these high-level protocols employ two types of software applications: servers and clients. Servers generally reside where the corpus of data is located and operate autonomously, responding to requests from clients. People use clients to communicate with the servers and access the corpus of data.

WebWorks

The current cornerstone of an Internet publishing program is the World Wide Web, a highly graphical, friendly and intuitive environment. The WWW is built around HTTP, which can carry a large variety of data formats. Not only can some standard multimedia formats be handled directly by the Web client, but those which can't be handled directly can be passed on to "helper" applications which will process the data appropriately.
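
HTTP labels each response with a Content-Type header naming its MIME type, and the browser uses that label to decide whether to display the data itself or pass it to a helper. The Python sketch below shows that decision; the set of built-in types and the helper program names are hypothetical, since each browser has its own list and users configure their own helpers. Servers commonly choose the Content-Type from the file name extension, which Python's mimetypes module can imitate.

```python
import mimetypes

# Hypothetical: types this imaginary browser renders itself.
BUILT_IN = {"text/html", "text/plain", "image/gif", "image/jpeg"}

# Hypothetical helper table: MIME type -> external viewer program name.
HELPERS = {
    "application/pdf": "pdf-viewer",
    "audio/basic": "audio-player",
}


def dispatch(content_type: str) -> str:
    """Decide how to handle a retrieved document given its MIME type."""
    base = content_type.split(";")[0].strip().lower()  # drop parameters such as charset
    if base in BUILT_IN:
        return "display"
    if base in HELPERS:
        return f"launch {HELPERS[base]}"
    return "save to disk"


# Guess the type from a file name, much as a server might.
ctype, _ = mimetypes.guess_type("issue12.pdf")
print(ctype, "->", dispatch(ctype))
```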

The other major benefit of the Web is that it is a hypertext medium. Words, groups of words, and images can be linked to other parts of the same document and to other documents residing elsewhere on the Internet. This virtual web of information allows for the easy sharing and referencing of individually maintained information sources. Navigating the Web, known as surfing, starts with the Web address (Uniform Resource Locator, or URL) of a Web document of interest. One then follows hypertext links to other Web documents and repeats the cycle from there until the information sought has been found.
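
A URL names the protocol (scheme), the server, and the document's path, and links inside a page are often written relative to that page's own URL. Python's urllib.parse module shows both ideas; the address below is a hypothetical example.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical URL of a journal's table-of-contents page.
toc = "http://www.example.com/journal/vol3/toc.html"

parts = urlparse(toc)
print(parts.scheme)  # http
print(parts.netloc)  # www.example.com
print(parts.path)    # /journal/vol3/toc.html

# A relative link on that page is resolved against the page's own URL.
print(urljoin(toc, "article2.html"))
# http://www.example.com/journal/vol3/article2.html
print(urljoin(toc, "../vol2/toc.html"))
# http://www.example.com/journal/vol2/toc.html
```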

There are a number of Web catalogs, or indexes, which provide searchable interfaces so that users can more easily locate Web documents of interest. These provide good starting points for manual searching, but their searches generally cover catalog entries, such as titles and short descriptions, rather than the full text of each document.

The WWW is used both for presenting publications and for establishing the Internet presence of the publisher or organization. An Internet presence provides a place where people worldwide can learn more about a publisher's or organization's services and products. As use of the Internet rockets, such a place becomes increasingly valuable. Ads for publications and organizations can be small attention-getters which provide the URL of the Internet presence. People who want to know more can easily obtain far more information than could ever fit in an ad. And this can be done at a relatively low, fixed cost to the publisher or organization, regardless of how many people seek the information.

The publication is the heart of a publishing program, and the Web's graphical nature provides an easy way to present it. The two primary document formats for Web publication are HyperText Markup Language (HTML) and Portable Document Format (PDF). There are other document formats, like PostScript and Standard Generalized Markup Language (SGML). PDF is overtaking PostScript in both capability and popularity, and SGML hasn't been used widely on the Web yet.

HTML is an application of SGML and is the official format for Web documents; it is what Web clients, called browsers, display. HTML documents contain formatting tags which define the roles of pieces of text, e.g. first-level heading (H1), table, list item. The reader's browser settings determine how these elements are displayed, so the author cannot define the exact layout and appearance of an HTML document. This is a benefit for the reader. Since HTML documents are geared for use on a computer, the reader can adjust the window size, font size and other display aspects to suit the screen in use and personal preferences.

HTML pages can be of any length, but it is generally best not to put too much material on one page. For instance, a 50-page journal composed of 15 articles would be very cumbersome as a single HTML page. By contrast, a two-page newsletter composed of two articles would be more cumbersome if split into two single-article HTML pages plus a table of contents. Where the crossover occurs depends upon the publication, the material, and the audience's preferences and uses.

Each HTML page is retrieved from its source when the reader requests it. This means that the 50-page, 15-article journal, with one article per HTML page, would be retrieved one article at a time as the reader chooses what to read, probably by clicking on an article's name in a table of contents page. This requires that the Internet connection be maintained. For publications which contain links to other information, this would be necessary anyway.

Images in HTML pages are handled as references to separate graphics files; they are not embedded in the document. This benefits images which are used in multiple places on a page or on different pages: only one copy of the image needs to be retrieved for it to be available throughout the publication. The disadvantage is that saving the HTML document does not save the images. It saves only the references to them.
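
The reference model is easy to see by scanning a page's markup: each <img> tag carries only a src URL, which the browser must fetch separately. The sketch below uses Python's html.parser to list the distinct image URLs a page refers to, resolved against the page's own address; these are the files that would be missing from a saved copy of the HTML. The class name, page text, and URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageRefCollector(HTMLParser):
    """Collect the distinct image URLs referenced by <img src=...> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.images: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                url = urljoin(self.base_url, src)
                if url not in self.images:  # one retrieval per distinct image
                    self.images.append(url)


page = """<h1>Spring Issue</h1>
<img src="logo.gif" alt="logo">
<p>Article text.</p>
<img src="chart1.gif"><img src="logo.gif">"""

collector = ImageRefCollector("http://www.example.com/news/spring.html")
collector.feed(page)
collector.close()
print(collector.images)
# ['http://www.example.com/news/logo.gif', 'http://www.example.com/news/chart1.gif']
```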

PDF maintains the appearance of a full desktop-published document exactly as it would print. In addition, PDF documents can contain links to other parts of the document and to URLs, continuing the hypertext Internet model. Web browsers do not display PDF directly at this time, but a PDF viewer can be configured as a Web helper application which launches automatically when the browser retrieves a PDF document.

A journal published as a PDF file is self-contained. The entire publication is retrieved at one time and can be viewed while the computer is disconnected from the Internet. However, following links to information outside the publication still requires an Internet connection.

Gopher Goods

Gopher is designed to provide a virtual directory structure which is, in actuality, composed of parts distributed around the Internet. To the user, gopherspace appears like one huge hierarchical structure with documents residing at various places within. To the presenters of the documents, it is a large semi-collaborative effort, where each presenter is only responsible for its piece of gopherspace. Navigation is accomplished by choosing a starting point with a gopher client and moving up and down in the directory structure until the information sought is found.
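
Gopher's distributed directory shows up directly in its menu format (RFC 1436). Each menu line gives an item type, the title the user sees, a selector string, and the host and port where the item lives, so a single menu can point into many different servers. To follow an entry, the client connects to that host and port and sends the selector followed by a carriage return and line feed. The Python sketch below parses a menu and fetches an item; the function names are illustrative, and only the two most common item types are noted in the comments.

```python
import socket
from typing import NamedTuple


class MenuItem(NamedTuple):
    kind: str      # item type: '0' = text file, '1' = directory (submenu), ...
    title: str     # what the user sees
    selector: str  # string sent to the server to fetch this item
    host: str      # the item may live on a different server,
    port: int      # so each entry carries its own host and port


def parse_menu(text: str) -> list[MenuItem]:
    """Parse a Gopher directory listing (RFC 1436 layout)."""
    items = []
    for line in text.split("\r\n"):
        if line == ".":  # a lone period ends the listing
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:
            continue  # skip malformed lines; extra fields are ignored
        items.append(MenuItem(line[0], fields[0], fields[1], fields[2], int(fields[3])))
    return items


def fetch_gopher(host: str, selector: str, port: int = 70) -> bytes:
    """Send a selector plus CR LF and read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(selector.encode("ascii") + b"\r\n")
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```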

Historically, gopher came into wide use before the WWW, and it has since been largely replaced by the WWW. Some gopher servers still get a lot of use, but few new gopher sites are being set up, and some material available via gopher has been moved to the WWW. In addition, Web clients can work with the gopher protocol directly.

File Foraging

File Transfer Protocol, or FTP, is an early Internet protocol which provides a way to deliver and retrieve files. These files can be anything, including text documents, graphics, or software. FTP requires users to log in with a user name and password. Many FTP servers intended for public access offer a special user called "anonymous" which doesn't require a real password (though it's customary to enter your email address as the password out of courtesy). FTP sites are stand-alone, and their locations are passed along primarily by means other than FTP, such as email among colleagues. Finding particular information without prior knowledge of its location and availability is a matter of hunt and peck.
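
An anonymous FTP retrieval can be sketched with Python's standard ftplib module. The host, directory, file name, and email address in the usage comment are placeholders, and the ftp_factory parameter is an assumption added only so the function can be exercised without a network connection; normally it is left at its default, ftplib.FTP.

```python
from ftplib import FTP


def fetch_anonymous(host: str, directory: str, filename: str,
                    email: str, ftp_factory=FTP) -> bytes:
    """Log in as "anonymous" (email address as password, by custom) and download one file."""
    chunks: list[bytes] = []
    with ftp_factory(host) as ftp:  # leaving the block sends QUIT and closes
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(directory)
        # RETR in binary mode; each received block is handed to the callback.
        ftp.retrbinary(f"RETR {filename}", chunks.append)
    return b"".join(chunks)


# Usage (placeholders; needs a network connection):
# data = fetch_anonymous("ftp.example.com", "/pub/data", "survey.dat", "reader@example.com")
```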

FTP is an excellent tool for making secondary information available in conjunction with a title presented via the WWW. A Web page can include a link, written as an ftp:// URL, that retrieves a selected file from an FTP site. For instance, an article based upon research data can contain a link to the raw data located at an FTP site for those readers who want to wade through the data. This way the data doesn't clutter the Web pages. FTP is also more efficient for transferring particularly large data files than the WWW is for transferring the same data in Web page form. Also, the expected reader demand for the data might not warrant converting the raw data to Web format, even though the publisher still wants to make it available to interested readers.
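
The link itself is ordinary HTML whose target is an ftp:// URL. This Python sketch builds such an anchor; the helper name ftp_link, the host, and the file path are hypothetical. Spaces and other unsafe characters in the path are percent-encoded so the URL stays valid, and the result is HTML-escaped before it goes into the page.

```python
from html import escape
from urllib.parse import quote


def ftp_link(host: str, path: str, label: str) -> str:
    """Build an HTML anchor whose href is an ftp:// URL for one file."""
    url = f"ftp://{host}{quote(path)}"  # quote() keeps '/' but encodes spaces etc.
    return f'<a href="{escape(url)}">{escape(label)}</a>'


print(ftp_link("ftp.example.com", "/pub/study 1997/raw data.txt", "Raw survey data"))
# <a href="ftp://ftp.example.com/pub/study%201997/raw%20data.txt">Raw survey data</a>
```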

Another use for FTP in a publishing program is as a means for readers to exchange large files among themselves. This could include the passing of raw data sets among colleagues for opinions, passing special software, etc.

Ongoing Discussions

Email discussion lists use Internet email (Simple Mail Transfer Protocol, or SMTP) to allow a group of people to take part in conversations on multiple concurrent topics, generally identified in the email subject line. Each list has a focus subject, although discussion lists vary as to whether and how tangential or unrelated topics are tolerated.

Due to the nature of email, participants don't need to be present at any particular time to be involved. They need only respond to emails from the group at their leisure, ideally while the topic is still being discussed by others.

The basic list structure involves a software application which operates as a group mail reflector: an email message sent to the mail reflector is reflected to the entire group. This way, sending a message to the entire group is as simple as mailing it to one recipient. The software is responsible for maintaining the list of participants.
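
The reflector's core job can be sketched in a few lines of Python using the standard email package. The function name reflect, the addresses, and the choice to set Reply-To to the list address are assumptions for illustration (list programs differ on that last point). For clarity the sketch makes one copy per member; real list software may instead hand the mail server a single message addressed to many recipients. Actually sending the copies, for example with smtplib's send_message, is left out.

```python
from email.message import EmailMessage


def reflect(original: EmailMessage, list_address: str,
            members: list[str]) -> list[EmailMessage]:
    """Make one copy of a submitted message for each list member."""
    copies = []
    for member in members:
        copy = EmailMessage()
        copy["From"] = original["From"]
        copy["To"] = member
        copy["Subject"] = original["Subject"]  # the subject carries the topic
        copy["Reply-To"] = list_address        # assumption: replies go back to the group
        copy.set_content(original.get_content())
        copies.append(copy)
    return copies
```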

Discussion lists can be either open or closed. Open discussion lists allow anyone to join by sending an email containing a set of defined commands to a special address reserved for list requests. Closed lists require the listmaster's approval before a person can participate.

These lists can be either moderated or unmoderated. Moderated lists route all submitted emails to the listmaster for approval and editing. The listmaster then submits the approved emails to the reflector. Unmoderated lists route submitted emails directly to the reflector.
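
The open/closed and moderated/unmoderated choices are independent, so a list can have any of four combinations. The hypothetical Python class below models how each setting routes a request; its names and return strings are invented for illustration and do not correspond to any particular list program.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DiscussionList:
    """Hypothetical model of the two independent list settings."""
    is_open: bool       # open: anyone may join; closed: listmaster must approve
    is_moderated: bool  # moderated: listmaster reviews submissions first
    members: set[str] = field(default_factory=set)
    pending_members: list[str] = field(default_factory=list)
    moderation_queue: list[str] = field(default_factory=list)
    reflected: list[str] = field(default_factory=list)

    def request_subscription(self, address: str) -> str:
        if self.is_open:
            self.members.add(address)
            return "subscribed"
        self.pending_members.append(address)  # waits for the listmaster
        return "awaiting approval"

    def approve_member(self, address: str) -> None:
        """Listmaster admits a pending applicant to a closed list."""
        self.pending_members.remove(address)
        self.members.add(address)

    def submit(self, text: str) -> str:
        if self.is_moderated:
            self.moderation_queue.append(text)  # listmaster may approve or edit
            return "held for moderator"
        self.reflected.append(text)  # unmoderated: straight to the reflector
        return "reflected"

    def approve_message(self, index: int, edited: Optional[str] = None) -> None:
        """Listmaster releases a held message, optionally edited, to the reflector."""
        text = self.moderation_queue.pop(index)
        self.reflected.append(text if edited is None else edited)
```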

Discussion lists can deliver messages in real-time form or digest form, and both options can be available at the same time, with each participant choosing which form to receive. Real-time form reflects emails as they are submitted and is useful for those who want to participate as they receive the individual emails. Digest form collects the day's submissions into a daily digest which is distributed first thing the following day. It is useful for those who want to spend most of their time monitoring the discussion while making few submissions, and for those who follow the subject but don't want the email traffic to interfere with their other daily email.
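
Digest delivery amounts to batching: hold each day's submissions, then send them as one message. The Python sketch below assumes submissions are available as hypothetical (timestamp, subject, body) tuples and builds one digest per calendar day, with a numbered table of contents followed by the messages in the order received; scheduling the morning send is left out.

```python
from collections import defaultdict
from datetime import date, datetime


def build_digests(messages: list[tuple[datetime, str, str]]) -> dict[date, str]:
    """Group (timestamp, subject, body) submissions into one digest per day."""
    by_day: dict[date, list[tuple[datetime, str, str]]] = defaultdict(list)
    for stamp, subject, body in messages:
        by_day[stamp.date()].append((stamp, subject, body))

    digests = {}
    for day, items in sorted(by_day.items()):
        items.sort()  # chronological order within the day
        toc = [f"{n}. {subject}" for n, (_, subject, _) in enumerate(items, 1)]
        parts = [f"Digest for {day.isoformat()}", *toc, ""]
        for n, (stamp, subject, body) in enumerate(items, 1):
            parts += [f"--- {n}. {subject} ({stamp:%H:%M})", body, ""]
        digests[day] = "\n".join(parts)
    return digests
```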

Subscribers can find much use in a discussion list associated with a publication. It provides an immediate capability to meet and continually interact with colleagues around the world at a fraction of the cost of attending a conference. For the publisher or organization, an unmoderated discussion list opens a window into what the readership feels are the hot topics for discussion. This can be useful when deciding what material to put in the next issue of the publication.

With the closed version, the publisher furnishes added value to the subscription. On the other hand, an open list might attract participants who are not subscribers. Since the discussion will invariably cover articles in the publication, these non-subscribers may decide that it behooves them to become subscribers.

Emperor's News Threads

Internet News has some similarities to the email discussion list. It provides a forum for discussions related to specific subjects, called news groups. It allows multiple, simultaneous topics of discussion, called threads, within a news group.

Unlike email discussion lists, Internet News is not email-based; instead it uses the Network News Transfer Protocol (NNTP) for communications and works more like a bulletin board. Readers submit their comments, which are posted to the bulletin board under the label of the news group, where anyone can read them. There is no membership in the news groups. Any passer-by can tune into the discussion.

Also unlike email discussion lists, the reader only gets a list of available messages in a news group. Only upon the reader's request is a message actually retrieved.
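
The list-then-retrieve pattern is visible in the NNTP commands a news reader sends. The Python sketch below parses two kinds of server replies: the response to GROUP, which reports only article counts and numbers, and the tab-separated overview lines returned by XOVER (OVER in current NNTP specifications), which summarize articles without transferring their bodies. The full text is fetched with ARTICLE only when the reader picks a message. The function names and sample values are illustrative, and the sketch is not a complete client.

```python
from typing import NamedTuple


class GroupInfo(NamedTuple):
    count: int  # estimated number of articles
    low: int    # lowest article number
    high: int   # highest article number
    name: str


def parse_group_response(line: str) -> GroupInfo:
    """Parse a GROUP reply such as '211 1234 3000234 3002322 misc.test'."""
    parts = line.split()
    if not parts or parts[0] != "211":
        raise ValueError(f"GROUP failed: {line}")
    count, low, high, name = parts[1:5]
    return GroupInfo(int(count), int(low), int(high), name)


def parse_overview_line(line: str) -> dict[str, str]:
    """Overview lines are tab-separated summaries of one article each."""
    keys = ["number", "subject", "from", "date", "message-id",
            "references", "bytes", "lines"]
    return dict(zip(keys, line.split("\t")))


# A reader's session, in outline:
#   C: GROUP misc.test          S: 211 ... (numbers only, no articles)
#   C: XOVER 3002320-3002322    S: one overview line per article
#   C: ARTICLE 3002322          S: full text, only for the article chosen
```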

As with email discussion lists, individual news groups can be moderated or unmoderated.

News groups tend to be shared, in that many news servers carry news groups originating elsewhere. It is also possible to run a dedicated news server that carries only news groups originating at that server.

Chatterbox

Internet Relay Chat, or IRC, also provides a vehicle for discussions, but it is more like a cocktail party than email discussion lists or network news. People must be present in the "room" to participate. Threads must be followed mentally, as nothing else identifies them.
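
Part of the reason threads must be tracked mentally shows up in the protocol itself. Each chat line travels as a single IRC message carrying the sender, the channel, and the text, with no subject or thread field of the kind email and news articles carry. The Python sketch below splits such a line into its parts, following the message layout in RFC 1459; the nick, host, and channel names are hypothetical, and the parser ignores details such as the line length limit.

```python
from typing import NamedTuple, Optional


class IrcMessage(NamedTuple):
    prefix: Optional[str]  # sender, e.g. "nick!user@host"; absent on client-sent lines
    command: str
    params: list[str]


def parse_irc_line(line: str) -> IrcMessage:
    """Split one IRC protocol line into prefix, command, and parameters."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        prefix, line = line[1:].split(" ", 1)
    if " :" in line:
        # Everything after " :" is one trailing parameter, spaces included.
        line, trailing = line.split(" :", 1)
        params = line.split() + [trailing]
    else:
        params = line.split()
    command = params.pop(0)
    return IrcMessage(prefix, command, params)


msg = parse_irc_line(":alice!alice@host.example.com PRIVMSG #ipnews :Has anyone read the lead article?\r\n")
print(msg.command, msg.params)
```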

This real-time format is an excellent way to host special guests for focused discussions. An hour with the editor or with the author of an article can give the readership the kind of forum found at conferences without anyone making travel arrangements.

Chat is generally unmoderated, but special sessions with a guest can be moderated to keep the discussion focused.
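
On IRC, one way to run such a session is with channel modes defined in RFC 1459: mode +m makes a channel moderated, so only channel operators and users granted a voice (+v) can speak, and only a channel operator can set these modes. The Python helpers below only build the command strings an operator would send; the function names, channel and nick names, and the idea of voicing audience members one at a time to ask questions are assumptions about how a session might be run.

```python
def moderation_commands(channel: str, guest: str) -> list[str]:
    """Hypothetical: IRC commands an operator could send to start a moderated guest hour."""
    return [
        f"MODE {channel} +m",          # moderated: only operators and voiced users speak
        f"MODE {channel} +v {guest}",  # grant the guest a voice
        f"TOPIC {channel} :Guest hour with {guest}. Send questions to the moderator.",
    ]


def voice(channel: str, nick: str, on: bool = True) -> str:
    """Give (or take back) a voice so one audience member can ask a question."""
    return f"MODE {channel} {'+' if on else '-'}v {nick}"
```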

Construction Time

There are other building blocks available, and more are always being created. The blocks just discussed are the main types currently available for building any publishing program, from a cozy cottage to a plush palace.

One example of a good first step in an Internet publishing program would be to present a parallel Internet version of a print title. This would be accomplished by creating an Internet presence for the title: a Web page about the publication, with links to the parallel Internet version of each print issue. Some issues could be made publicly available as samples, open to over 35 million people worldwide at no extra cost.

A more aggressive program could offer pre-prints, in-between issue article follow-ups, and even longer, more detailed articles.

A more comprehensive program could include an email discussion list for subscribers, a one-hour chat session with the author of the lead article of each issue, and an FTP site where readers can obtain the full research data behind the articles.

A powerful Internet publishing program will take advantage of the variety of communications options available, creating both a greater value for the title and a closer community among subscribers and with the publisher or organization.

http://www.lodestonesystems.com/doc/IPNews.html ©1997, NetPubs International