This article was published in the Diktuon column of the ATLA Newsletter (May 1999).

The ATLAS Project

An Online Religion Journal Collection for Scholars and Teachers

James R. Adair, Director, ATLA Center for Electronic Texts in Religion

On January 1, 1999, the American Theological Library Association created the new Center for Electronic Texts in Religion (CETR), based in Atlanta. The purpose of CETR is to disseminate electronic texts of interest to scholars of religion, to promote the publication of original scholarly works in formats compatible with online study and distribution, to support other efforts to move the academic study of religion into the information age, and to remain at the forefront of advances in technology through a commitment to research and development. CETR will plan, evaluate, and direct a variety of electronic projects dealing with the academic study of religion. The first project sponsored by CETR will be ATLAS, the ATLA Serials project. ATLA has submitted a grant proposal for the ATLAS project, and we are confident that it will be accepted.

ATLAS is designed to take 50 religion journals and 50 years' worth of volumes of each, for those that go back that far, digitize them, and make them accessible from the Web. In some cases, where a journal has been in existence for more than 50 years, ATLAS may work with the publisher to see about including the entire run of the journal. Having access to earlier scholarly research is important in a field like religion, where modern theories and approaches to the subject frequently build on, or react to, work that is decades old. Digitizing the full run of journals is one of the advantages that ATLAS will have over other journals projects, many of which offer only the last few years in electronic form.

Textual data can be digitized in two formats, encoded text and images, and ATLAS will digitize every journal in both formats. The idea for ATLAS developed out of earlier discussions about creating a digital archive for paper journals, and though our plan for the project has moved well beyond those earlier musings, the importance of having a digital archive has always remained one of the core components of the project. Many earlier digital projects in the humanities have created image archives of books and manuscripts, and they have experimented with different formats and resolutions. Based on the findings of several of these projects, the Library of Congress has created a World Wide Web style guide outlining its recommendations (http://lcweb.loc.gov/webstyle/fileform.html). For printed textual matter, the guide suggests that the archival-quality images be in TIFF format at a resolution of between 200 and 400 dpi. While ATLAS was being planned, ATLA was also engaged in a pilot project involving the digitization of one volume each of five different journals, and our experiments with these journals led us to choose to digitize pages as 600 dpi TIFF images. At this resolution, even the smallest footnotes in older journals will be readable. This resolution is also sufficient to record illustrations, line drawings, and even photographic reproductions that are present in the journals we plan to digitize.

We will store the 600 dpi images of the journal pages in a database, thus creating a digital archive (the ATLAS archive will also include the fully encoded texts). A debate is currently raging in the archiving community over the question of whether a digital archive is really an archive at all, since most digital storage media degrade rather rapidly over time. Perhaps more significantly, technology is advancing so quickly that even if twenty-year-old media are still intact, they often cannot be read, because (1) no machines that are capable of reading the media still work and (2) software doesn't exist to convert from the older format to formats that are used today. The solution, many archivists say, is to continue using microfilm as the archival medium of choice. ATLA has been archiving journals on microfilm for fifty years and recognizes the advantages that microfilm continues to offer even now at the beginning of the digital age, so we plan to seek additional funding to preserve ATLAS journals on microfilm. However, we are also convinced of the viability of digital archives, and we consider the images and encoded text stored on electronic media to be just as important to archive as the microfilm. Unlike a microfilm archive, in which the camera master is stored in a vault, where it might remain for up to 500 years, a digital archive must be refreshed periodically to ensure that the media on which the images are stored are still valid, and the formats must be updated as new formatting standards are developed. A digital archive is thus somewhat more difficult to maintain, but it has a number of advantages as well: ability to produce as many perfect digital copies as needed without degradation of the original, much quicker copying time, and much lower storage costs. ATLAS journals, then, will be archived in two formats, microfilm and digitized text and images.
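The periodic refreshing described above depends on being able to show that a new copy is identical to the old one. The following is a minimal sketch of one way to check this, assuming a SHA-256 checksum comparison; the function names and file names are hypothetical and are not part of the ATLAS plan.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source: Path, target: Path) -> bool:
    """Copy source to target and report whether the copy is bit-for-bit identical."""
    expected = sha256_of(source)
    shutil.copyfile(source, target)
    return sha256_of(target) == expected


# Demonstration with a stand-in file; a real archive would hold TIFF page images.
with tempfile.TemporaryDirectory() as workdir:
    old_medium = Path(workdir) / "page_0001.tif"       # hypothetical file name
    new_medium = Path(workdir) / "page_0001_copy.tif"  # hypothetical file name
    old_medium.write_bytes(b"stand-in bytes for a scanned page image")
    copy_is_intact = refresh_copy(old_medium, new_medium)
    print("copy verified:", copy_is_intact)
```

Storing the digest alongside each image would also allow a later check that the media have not degraded between refresh cycles.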

600 dpi TIFF images are a good archival format, but they are far from ideal as a display format. In the first place, archival TIFF images are stored losslessly at full resolution, so the files are very large, and the time it would take to download a large TIFF image over a modem connection would be daunting. Second, no computer monitors currently in popular use are able to display an image at anything approaching 600 dpi; a typical screen resolution is 72 dpi. Therefore, it is necessary to convert the TIFF images to a lower-resolution, more compact format for delivery over the Web. The two most commonly used image formats on the Web are GIF and JPEG (stored in JFIF files), both of which are viable presentation formats. GIF uses lossless compression but is limited to a palette of 256 colors, whereas JPEG uses lossy compression suited to photographic images. Our current plans are to display most page images as 100 dpi GIF images, but we will consider using JPEG images for pages with color photographic reproductions if the JPEG image is appreciably more compact than the corresponding GIF image.
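The difference in scale between the archival and display images can be worked out with simple arithmetic. The sketch below assumes a 6 x 9 inch journal page, uncompressed pixel data, and a 56 kbit/s modem; all three figures are illustrative assumptions and do not come from the ATLAS pilot project.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Size of the raw pixel data, ignoring headers and any compression."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8


def download_seconds(size_bytes, bits_per_second):
    """Idealized transfer time, ignoring protocol overhead."""
    return size_bytes * 8 / bits_per_second


PAGE_WIDTH_IN, PAGE_HEIGHT_IN = 6, 9   # assumed page size
MODEM_BPS = 56_000                     # assumed modem speed

archival = pixel_dimensions(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, 600)
display = pixel_dimensions(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, 100)

archival_size = uncompressed_bytes(*archival, bits_per_pixel=1)  # black-and-white scan
display_size = uncompressed_bytes(*display, bits_per_pixel=1)

print("archival pixels:", archival, "bytes:", archival_size)
print("display pixels:", display, "bytes:", display_size)
print("archival download (s):", round(download_seconds(archival_size, MODEM_BPS)))
print("display download (s):", round(download_seconds(display_size, MODEM_BPS)))
```

Because resolution applies in both dimensions, reducing 600 dpi to 100 dpi cuts the pixel count by a factor of 36 before any compression is applied.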

Having online access to page images of journal articles would be beneficial to scholars and teachers, but the images themselves are insufficient unless specific pages can be accessed rapidly and unless some form of searching is possible. In the SELA journals project, a joint effort of Scholars Press and the Emory University Libraries, we experimented with different ways of displaying page images, and we settled on a method that involved associating each page image with an SGML "envelope" that contained a limited amount of information about the image (most importantly journal title and page number). We used the Ebind DTD developed at Berkeley to create valid SGML documents that allowed readers to look at the table of contents of the issue of a journal, select an article, then download the first page of the article and begin reading. Readers who already knew the specific page they were interested in could go directly to it (see http://shemesh.scholar.emory.edu/cgi-bin/Ebind2html/1/BA60.2 for an example).

Having access to the table of contents and to individual pages is good, but being able to search by author, title, and a variety of subject fields would be better. For ATLAS, we plan to modify the Ebind DTD to include a substantial amount of metadata, in a format compatible with USMARC, with each page image (we will also transform it into a valid XML DTD). Furthermore, we will use the ATLA Religion Database (RDB) as a model for creating the front-end of our ATLAS search engine. Since all ATLAS journals are indexed in the RDB, the metadata for each article already exists in electronic form. By associating this metadata with the page images, we will create a powerful search tool that will be a boon to scholars and teachers alike.
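The idea of wrapping each page image in an envelope of searchable metadata can be illustrated with a small sketch. The element names below (pageEnvelope, metadata, image, and so on) are hypothetical and are not taken from the Ebind DTD or from USMARC, and the journal, author, and title values are placeholders.

```python
import xml.etree.ElementTree as ET


def make_envelope(journal, volume, page, image_file, author, title, subjects):
    """Build a hypothetical XML envelope that ties metadata to one page image."""
    envelope = ET.Element("pageEnvelope")
    metadata = ET.SubElement(envelope, "metadata")
    ET.SubElement(metadata, "journal").text = journal
    ET.SubElement(metadata, "volume").text = str(volume)
    ET.SubElement(metadata, "author").text = author
    ET.SubElement(metadata, "title").text = title
    for subject in subjects:
        ET.SubElement(metadata, "subject").text = subject
    ET.SubElement(envelope, "image", {"href": image_file, "page": str(page)})
    return envelope


def find_pages(envelopes, field, value):
    """Return image file names whose metadata field contains value (case-insensitive)."""
    hits = []
    for envelope in envelopes:
        for element in envelope.findall(f"metadata/{field}"):
            if value.lower() in (element.text or "").lower():
                hits.append(envelope.find("image").get("href"))
                break
    return hits


# Placeholder records, not real bibliographic data.
envelopes = [
    make_envelope("Example Journal", 1, 17, "ej01_017.gif",
                  "A. Author", "A Sample Article", ["Textual criticism"]),
    make_envelope("Example Journal", 1, 42, "ej01_042.gif",
                  "B. Writer", "Another Sample", ["Archaeology", "Ethics"]),
]

print(find_pages(envelopes, "subject", "ethics"))
print(ET.tostring(envelopes[0], encoding="unicode"))
```

Because the metadata travels with the image, a search by author, title, or subject can return the matching page without any text having been extracted from the page itself.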

The XML-encapsulated page images will be the first pages that we make available online, for two reasons. First, producing them is a relatively quick process, compared with fully encoded XML text, so we will be able to provide access to numerous journals fairly quickly. Second, even after the fully encoded XML texts are available, scholars will undoubtedly find errors in the encoded texts. Having access to the page images will allow them to determine whether the errors were originally present in print or whether they were introduced in the process of converting from print to electronic format. In the latter case, we will make the necessary corrections to the encoded text. In the former case, we will preserve both the original and the corrected forms of the text.

As soon as the page images are ready, the journals will be encoded in XML, probably in a DTD related to the SGML Text Encoding Initiative (TEI) DTD, though the specific DTD has yet to be determined. The fully encoded text will of course contain the same metadata as the encapsulated page images, so searching over a variety of fields will be possible, but full-text searching (including sophisticated Boolean searches, proximity searches, and searches based on the XML encoding) will be an added bonus. The ATLAS search engine will be the most powerful and useful tool of this sort available to religion scholars, allowing them to search the collection for individual words (or parts of words), subjects, or scripture references, in many different combinations.
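To make the distinction between Boolean and proximity searches concrete, here is a minimal sketch operating on plain text. It is an illustration only and assumes nothing about how the ATLAS search engine will be built; the function names and the sample sentence are invented for the example.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def contains_all(text, terms):
    """Boolean AND: every term must occur somewhere in the text."""
    words = set(tokenize(text))
    return all(term.lower() in words for term in terms)


def within(text, first, second, distance):
    """Proximity: the two words occur no more than `distance` words apart."""
    words = tokenize(text)
    first_positions = [i for i, word in enumerate(words) if word == first.lower()]
    second_positions = [i for i, word in enumerate(words) if word == second.lower()]
    return any(abs(i - j) <= distance
               for i in first_positions for j in second_positions)


SAMPLE = "The scribe copied the scroll of Isaiah before the temple archive was moved."

print(contains_all(SAMPLE, ["scribe", "temple"]))   # both words present
print(within(SAMPLE, "scroll", "Isaiah", 2))        # two words apart
print(within(SAMPLE, "scribe", "Isaiah", 2))        # five words apart
```

A search based on the XML encoding would add a further restriction, for example limiting matches to words that occur inside a footnote or a title element.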

In the SELA project, we made use of a server that transformed SGML to HTML on the fly, thus allowing readers to view the documents using ordinary Web browsers. While the short-term use of on-the-fly XML-to-HTML conversion is a possibility for the early stages of ATLAS implementation, we expect that XML servers and browsers will be available for scholars to use as early as 1999, and we hope to be able to avoid the extra complications that conversion entails.
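On-the-fly conversion amounts to mapping each element of the encoded document to an HTML element at the moment a page is requested. The sketch below shows the idea with a hypothetical tag mapping; the element names are invented for illustration and do not reflect the SELA server or any particular DTD.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from source elements to HTML elements.
TAG_MAP = {"article": "div", "head": "h1", "p": "p", "emph": "em"}


def to_html(element):
    """Recursively convert an XML element to an HTML string."""
    tag = TAG_MAP.get(element.tag, "span")  # unknown elements become <span>
    parts = [f"<{tag}>", escape(element.text or "")]
    for child in element:
        parts.append(to_html(child))
        parts.append(escape(child.tail or ""))  # text that follows the child
    parts.append(f"</{tag}>")
    return "".join(parts)


SOURCE = ("<article><head>Sample Title</head>"
          "<p>Text with <emph>emphasis</emph> here.</p></article>")

html_output = to_html(ET.fromstring(SOURCE))
print(html_output)
```

The cost of this approach is that the conversion runs on every request and that any markup without an HTML equivalent is flattened, which is part of why native XML display is attractive.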

The biggest technical question in terms of display will revolve around the issue of the proper display of languages like Hebrew and Arabic, which are written right-to-left. If the XML browsers that become available fully conform to the XML standard, they will be fully Unicode compliant as well. On one level, Unicode is a character encoding scheme that, in its basic form, uses two bytes (sixteen bits) to represent each distinct character, unlike ASCII and its eight-bit extensions, which use one byte (eight bits). Whereas standard ASCII defines only 128 characters and its eight-bit extensions no more than 256, 65,536 can be represented in sixteen bits. Intermingling English text with Greek and Hebrew, for example, in an eight-bit encoding requires the use of multiple fonts, since more than 256 characters are required to display all the letters, numerals, diacritical marks, punctuation, and special characters in these three languages. In Unicode, however, each distinct script has its own block of code points. So, for example, Western European languages can be represented by the 256 characters of ASCII and its Latin 1 extension, and Greek has its own block of characters, as do Hebrew (also used for Aramaic and Yiddish), Arabic, and Devanagari, the script used for Hindi (the special considerations for dealing with Chinese, Japanese, and Korean are not considered here--see the Unicode Standard, version 2.0). Reserving a block of characters for each script is not all Unicode does. It also defines which direction scripts run (directionality, e.g., left-to-right, right-to-left) and how characters should be displayed when surrounded by certain other characters (contextual characters, e.g., a final sigma in Greek or a medial nun in Syriac). Fully Unicode-compliant XML browsers will solve these display problems that have haunted HTML for years (for a fuller discussion of the problems and various solutions to displaying multilingual documents on the Web, see my article "TC: A Journal of Biblical Textual Criticism: A Modern Experiment in Studying the Ancients," Journal of Electronic Publishing 3 [1997]; URL: http://www.press.umich.edu:80/jep/03-01/TC.html).
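The script blocks and directionality properties described above can be inspected directly. This short sketch uses the unicodedata module from the Python standard library; the choice of sample characters is ours, made purely for illustration.

```python
import unicodedata

# One sample character from each of four scripts.
SAMPLES = {
    "Latin a": "\u0061",
    "Greek alpha": "\u03b1",
    "Hebrew alef": "\u05d0",
    "Arabic alef": "\u0627",
}


def describe(char):
    """Return the code point, official name, and bidirectional class of a character."""
    return (f"U+{ord(char):04X}",
            unicodedata.name(char),
            unicodedata.bidirectional(char))


for label, char in SAMPLES.items():
    print(label, describe(char))

# The Greek final sigma has its own code point, next to the ordinary sigma.
print(describe("\u03c2"))
print(describe("\u03c3"))
```

The bidirectional class is "L" for the Latin and Greek letters, "R" for the Hebrew letter, and "AL" for the Arabic letter, which is the information a compliant browser needs in order to lay out mixed-direction text.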

The specific journals that will be included in the ATLAS project will be selected by an Advisory Board, in conjunction with ATLA and CETR staff. Raymond Williams of the Wabash Center for Teaching and Learning has agreed to help us assemble a team of religion scholars to assist us in choosing the fifty journals (more or less) to include in ATLAS. Because of agreements we have already made with publishers, two sets of journals, published by Scholars Press and Sheffield Academic Press, will be included among the ATLAS journals (a total of about fifteen). One interesting aspect of journal publishing in the field of religion is that unlike fields such as science and medicine, where a single publisher may publish dozens or even hundreds of different journals, only a handful of publishers publish as many as five religion journals. Most journals are published by seminaries, religion departments, consortia, or non-profit organizations whose only serial publication is that one journal. The majority of ATLAS journals will come from "publishers" that publish a single title. ATLA indexes almost 600 journals in the RDB, and they may be divided into six broad categories: (1) Bible, Archaeology, and Antiquities; (2) Theology, Philosophy, and Ethics; (3) Religions and Religious Studies; (4) Pastoral Ministry; (5) History, Missions, and Ecumenism; and (6) Human Culture and Society. Each of these areas of study will be represented in ATLAS, and scholars whose expertise lies in each of these fields will be included on the Advisory Board that will assist with journal selection.

ATLA does not currently index electronic journals in the RDB, but it plans to begin indexing selected e-journals in early 2000. ATLAS will integrate the e-journals indexed by the RDB into its database, and since digitization is not an issue, the inclusion of e-journals should be relatively straightforward. Two issues will need to be addressed, however: archiving and varied HTML formatting. It is possible, of course, simply to archive the HTML format, but HTML does not allow the richness of markup possible in more sophisticated XML DTDs, so it is not an ideal archiving format. Furthermore, as XML browsers become widely accessible, many e-journals will begin to make the transition from HTML to XML in order to take advantage of its many powerful features, including increased metadata capability, enhanced encoding possibilities, improved display and linking mechanisms, and Unicode support. Migrating from HTML to XML is one step in the direction of determining an archiving format, but unless some consistency in encoding among various e-journals can be achieved, the e-journals included in ATLAS will not be as usable as the print journals. To address the problem of varied HTML formatting among e-journals, ATLAS staff will work with e-journal publishers and consortia like the Association of Peer-Reviewed Electronic Journals in Religion (http://purl.org/apejr) in order to develop an encoding scheme that will be viable for individual e-journals and for ATLAS itself.

ATLAS is a three-year project, at the end of which approximately 50 journals will be available electronically. A second phase of ATLAS will begin digitizing the other 500+ journals indexed in the RDB. It is probable that we will seek additional funding to accelerate the digitization of more journals, but it is our intention for ATLAS to be self-sustaining, so we will have to charge enough for access to meet our budgetary needs, which include the digitization of both new issues of current ATLAS journals and a limited number of new journals. Since the primary purpose of ATLAS is to assist scholars, teachers, and students involved in the academic study of religion, we will offer access to ATLAS journals to theology libraries and faculty members, as well as to public libraries, students, and independent scholars. We plan to cover our ongoing costs by reaching a large number of people and institutions rather than by charging a large amount of money for access. We believe that the fees we assess for access to ATLAS journals will be far lower than those charged by comparable online journal collections.

The ATLAS online religion journal collection is a project that is being created for religion scholars by religion scholars. It is our hope that everyone interested in the serious study of religion--whether teachers, students, independent scholars, clergy, or laity--will benefit from our efforts. When the breadth of journal coverage, power of the RDB-based search interface, and ease of access are considered, we believe that scholars will find in ATLAS a valuable tool for research and teaching.