"What are interfaces for, really?" by Gabriel Egan
[This paper is abstracted from a larger essay of 8,000 words offered to the forthcoming book Shakespeare and Interface. That essay begins by sketching the invention of the stored-program computer by John von Neumann, the resulting context-sensitivity of all binary data in memory, and the relevance of this for the accreted layers of annotation in digital texts. There follows a brief history of human-computer interfaces from the batch work of the punch-card era, through the invention of teleprocessing in the 1960s, to the widespread availability of personal computers from the 1980s. We pick up the story with an evaluation of four digital products for the personal computer that are widely used by Shakespeare scholars: The Oxford Complete Works of Shakespeare Electronic Edition (1989), Literature Online (LION, 1998-), the Henslowe-Alleyn Digitization Project (2005-), and the Database of Early English Playbooks (DEEP, 2007-). The short answer to this essay's title is that interfaces are for enacting the power relations between users and content providers and the four products are here evaluated in relation to that answer.]
Two of our illustrative examples are commercial products, the Oxford Complete Works of Shakespeare and Literature Online, and require a purchase or subscription, and two are publicly funded and are free at the point of use. Our first example, the Oxford Complete Works of Shakespeare Electronic Edition (1989), predates the widespread adoption of Graphical User Interfaces in personal computing, coming as it does from the MS-DOS era of 1981 to about 1995.
The dominant transportable storage medium in the MS-DOS era was the floppy disk, available in two physical sizes--5¼ and 3½ inches wide--using a number of partially compatible file formats. This variety of standards gave each disk a capacity from 160 kilobytes (in the first IBM PC) to 1,440 kilobytes in the last format commonly in use before floppy disks became obsolete in the late 1990s. Floppy disks were used to distribute computer software but could also contain significant quantities of text, and Michael Best has documented the commercial projects to sell Shakespeare's works in this format (Best 2007). A significant constraint was the small capacity of the floppy disk: even in the most capacious format it could hold only four or five plays so that a large authorial canon might require a set of disks.
As detailed by Best, almost all the editions of Shakespeare made available on floppy disk, and later on CD-ROM, were based on out-of-copyright Victorian editions, most commonly the Globe Shakespeare (Shakespeare 1864). An important exception was the Oxford Complete Works of Shakespeare Electronic Edition, based on the printed edition of 1986-87, which appeared in 1989 as a set of floppy disks for the IBM PC and compatible computers (Shakespeare 1989). There were 20 disks in the 5¼-inch set and 10 in the 3½-inch set and each work, such as a play, occupied one text file on a disk, encoded in the ubiquitous ASCII file format that made them usable by every program for text display, processing, and editing.
The manual accompanying the Oxford Complete Works of Shakespeare Electronic Edition explained the markup conventions used within the files, which employed the system of COCOA tags that was first developed specifically for use with the COCOA concordance software and later the Oxford Concordance Program (Russell 1967; Hockey & Martin 1987). The manual illustrated how this COCOA tagging provided information not available in the printed edition.
For example, the printed works' type-layout convention for distinguishing prose from verse put the first line of speech on the same line as the speech prefix if it was prose and on the line below the speech prefix if it was verse. While marking the start of a speech unambiguously, this convention cannot show transitions from verse to prose or vice versa occurring within a speech, which can--depending on vagaries of sentence and line length in relation to the width of the printed book's measure--be impossible to detect by sight. The electronic edition eliminated this ambiguity by providing explicit markup tags for all transitions from verse to prose and vice versa.
Likewise, there is no indication in the printed edition (beyond the lines being perhaps somewhat short) of occurrences of a run of three verse lines being amphibious in the sense that either the first and second or the second and third may together form a complete metrical unit. In the digital edition this is also explicitly marked up. Most usefully of all, since the 1986-87 Oxford Complete Works of Shakespeare was groundbreaking in its theorizing and practice regarding Shakespeare's collaboration with other writers, the digital text explicitly marked changes of author within the body of a work. This pioneering digital edition gave users what must have been for most their first sight of textual markup used to convey literary-critical assertions, in this case about changes of author and versification.
The COCOA system's relatively transparent and unobtrusive nature--Shakespeare's text is readily readable between the tags--and the edition's encoding in the universally readable ASCII format enabled anyone to make use of the extra information. This digital edition is the high-water mark of openness in the commercial publication in electronic form of Shakespeare's works edited to the highest modern standards, and in that aspect at least it has not yet been surpassed.
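How such tagged ASCII text can be exploited with general-purpose tools may be illustrated by a minimal sketch. It assumes COCOA-style references of the form <X value>, a one-letter category followed by a value inside angle brackets, each placed on a line of its own. The particular letters used here (A for author, V for verse or prose), the function name, and the sample lines are hypothetical illustrations and not the edition's actual tag set or text.

```python
import re

# Hypothetical COCOA-style sample: <A ...> marks the author in force and
# <V ...> marks verse or prose. Tag letters and text are illustrative
# assumptions, not taken from the Oxford electronic edition.
SAMPLE = """<A AuthorOne>
<V verse>
a line of verse
another line of verse
<V prose>
a stretch of prose
<A AuthorTwo>
more prose by a second hand
"""

# A tag line: one-letter category, whitespace, then the value.
TAG = re.compile(r"^<(\w)\s+([^>]+)>$")


def annotate(text):
    """Pair each line of text with the tag values in force where it occurs."""
    state = {}
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TAG.match(line)
        if match:
            # A tag replaces the current value for its category.
            state[match.group(1)] = match.group(2).strip()
        else:
            # Ordinary text inherits a snapshot of the current state.
            rows.append((dict(state), line))
    return rows


if __name__ == "__main__":
    for state, line in annotate(SAMPLE):
        print(state.get("A"), state.get("V"), line, sep=" | ")
```

Because the tags are plain characters in a plain file, a few lines in any programming language suffice to recover, for every line of text, the assertions about authorship and versification that apply to it.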
A standard floppy-disk drive is capable of writing a disk as well as reading it, so that turning a blank disk into a copy of one purchased from a publisher was cheap and easy for users to do. Indeed, the single command needed to do this was built into all personal computers' operating systems. There was nothing inherently suspicious about taking such a copy, and the Oxford Complete Works of Shakespeare Electronic Edition advised doing so for backup purposes, and its manual explained how (Shakespeare 1989, ["Manual"] 1).
From the publishers' point of view, the new CD-ROM physical format that became standard on PCs in the 1990s had one special advantage over the floppy disk as a distribution medium: it was read-only. Until the early 2000s, the CD-ROM drives in most computers could not write to disks, so copying a publisher's disks was beyond the ability of most users. A second advantage for publishers was that because each CD-ROM could hold as much data as about 400 floppy disks there was room not only for a copious text--plus images, and sound, and short video streams--but also for software.
Including software with the texts on a CD-ROM enabled publishers to disguise or even encrypt the raw texts so that instead of viewing them from the supplied disk with an interface of her own choosing the user could reach them only via a publisher-supplied software application that had to be installed on her computer. With such a CD-ROM the user was paying not just for the raw data, the texts of Shakespeare, but also the means to inspect--to read or process--that data. Indeed, with most such CD-ROMs there was no other way to get at the Shakespeare works. They could not be simply extracted from the disk because, disguised and/or encrypted, they were invisible even to the user's operating system other than as inscrutably encoded files. The only way to see the texts within was to run software provided by the publisher that undisguised and/or unencrypted them for display.
The shift from floppy disks that simply carried texts that the user could manipulate with any software she already had to CD-ROM disks whose contents could be examined only with the software provided on the disk itself was a substantial transfer of power from the user to the publisher. The appropriate analogy with old technology would be the invention of printed books that produced blank or garbled pages when photocopied and could be read only using spectacles supplied by the publisher. To pursue this analogy a little further, it was as if each publisher's spectacles worked only with one publication so that the user had to acquire as many different spectacles as she had books. The balance of power shifted slightly towards the user again when CD-ROM drives capable of writing to blank disks became cheap enough to be installed in most new computers from the early 2000s, since this at least allowed the user to make multiple copies of an expensive CD-ROM to use in different locations, such as the home and office.
Although CD-ROMs gave publishers more control over what users did with their publications than they had with floppy disks, the very fact that these CD-ROMs, like floppies, gave the user all the data at one time in one place made it relatively trivial for advanced users to release the data from the digital enclosures the publishers put them in. The transition to predominantly online delivery of published materials in the early 2000s marked a much greater shift of power in favour of the publishers, since the user's computer need never contain all the data at one time. If only part of the data representing, say, a Shakespeare play is sent to the user's computer over the network--the part representing the specific section of the play being examined--then it becomes harder for the user to take a complete copy of all the data for one play.
The same principle is used in video streaming over the Internet, for example from YouTube, in which the user's computer receives only a few frames of the moving image at any one time--just enough to display it while the next few frames are being sent--and hence never possesses the entire recording all at once. To reconstruct the whole of the original recording, a streaming-video user has to capture the frames as they are sent and locally recombine them to replicate the complete recording they were drawn from. As we will see, one of the four digital resources examined here similarly restricts access to the raw materials it presents, allowing the user to examine only a small part of them at any one time.
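The principle of partial delivery and local recombination can be sketched in a few lines. This is a toy model under stated assumptions, not a description of how any real streaming service or publisher's server works: the function names serve_window and reassemble are hypothetical, and the "work" is simply a string held in memory.

```python
def serve_window(full_text, offset, size):
    """Provider side (hypothetical): hand over only a small slice of the work."""
    return full_text[offset:offset + size]


def reassemble(request, size):
    """User side: ask for successive windows and join them until none remain.

    `request` is any callable taking (offset, size) and returning a slice.
    """
    parts = []
    offset = 0
    while True:
        chunk = request(offset, size)
        if not chunk:
            # An empty slice means the end of the work has been passed.
            break
        parts.append(chunk)
        offset += len(chunk)
    return "".join(parts)


if __name__ == "__main__":
    work = "".join(f"line {i}\n" for i in range(50))
    copy = reassemble(lambda off, n: serve_window(work, off, n), 64)
    print(copy == work)
```

The user in this model never receives more than 64 characters in any one response, yet ends with a complete copy. Partial delivery therefore raises the cost of copying without making it impossible.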
The advantages, from the publisher's point of view, of online content delivery must have been readily apparent to Chadwyck-Healey, the company (later bought by ProQuest) which in the 1990s sold as standalone CD-ROM products the datasets it called "English Poetry", "English Verse Drama", "English Prose Drama", "Early English Prose Fiction", and "Editions and Adaptations of Shakespeare". Users who bought the disks could make copies of them or transfer the contents to their local hard disks (by a process called virtualization), whose shorter access times make retrieving the information faster.
Chadwyck-Healey consolidated their literature CD-ROM collections into a single online service called Literature Online (LION) in 1998, accessible only to those with an institutional subscription (Chadwyck-Healey: A division of ProQuest Information and Learning 2004). With the online service, the time taken for a result to appear on the user's screen is determined not by the power and speed of the user's computer but by the power of Chadwyck-Healey's servers and the speed of the network connections between those servers and the user. Although LION does not block users from downloading the whole of a literary work, one work is the most her computer can possess at any one time. The user never possesses a full set of works as she did with the CD-ROM versions, so she cannot repurpose that full dataset for her own ends. This might seem a trivial consideration to many users, but such a transfer of the power to search the dataset from the user to the provider has severe consequences if the provider decides to reduce the range of searching options or unintentionally disables features in its own searching software.
Such unintentional disabling of features is not merely a hypothetical concern. On 28 June 2014, Chadwyck-Healey's parent company ProQuest changed the software that delivers LION to its users, inadvertently breaking LION's proximity-searching and variant-spelling features, and in the worst way possible for investigators who rely on them. After the change, the search results returned from the website are wrong, so that for example the counts of hits are untrue, but nothing visible on the screen indicates this fault and no error message is produced. This silent disabling of LION's advanced search options brought a halt to the work of researchers who rely on these features, and at the time of writing (November 2019) the fault has not been fixed. Such things cannot happen when users rely solely on their own computers and locally attached sources of data, and keep them unchanged.
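What a user holding the texts locally can do for herself may be sketched as follows. This is a minimal illustration of proximity searching in general, not LION's algorithm, which is not public: the function name proximity_hits, the crude tokenization into lower-case words, and the counting of distance in words are all assumptions made for the example.

```python
import re


def proximity_hits(text, first, second, max_gap):
    """Return (i, j) word-index pairs where `first` and `second` occur
    within `max_gap` words of each other, in either order."""
    # Crude tokenization (an assumption): runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    first_at = [i for i, w in enumerate(words) if w == first]
    second_at = [j for j, w in enumerate(words) if w == second]
    return [
        (i, j)
        for i in first_at
        for j in second_at
        if i != j and abs(i - j) <= max_gap
    ]


if __name__ == "__main__":
    sample = "the king spoke and the queen answered while the king slept"
    print(proximity_hits(sample, "king", "queen", 4))
```

Because every step is visible and repeatable, the user can check a count of hits against the text itself, which is exactly what a silently faulty remote service prevents.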
LION is not the worst example of how online delivery gives the providers of datasets far-reaching power over their users. The Henslowe-Alleyn Digitization Project took digital photographs of the collection of papers belonging to the theatre impresario Philip Henslowe and his son-in-law the actor Edward Alleyn, which are kept at Dulwich College in South London, and placed them online for free viewing. Copyright law exists to protect acts of originality and creativity, which for these documents means the originality and creativity of Henslowe, Alleyn, and the other early modern persons who contributed to the documents. Being about 400 years old, any copyrights subsisting in these documents have long since expired, but of course what the Henslowe-Alleyn Digitization Project gives its users are digital photographs of the documents, and the application of copyright laws to new media requires interpretation.
The landmark case of Bridgeman Art Library versus Corel Corporation established in 1999 that under American and British law the photographing of a flat surface containing an image or writing in order to provide the most faithful reproduction of it for viewers or readers constitutes an act of slavish copying, not originality. The United Kingdom government's Intellectual Property Office published a notice in 2015 confirming this interpretation, noting that "copyright can only subsist in subject matter that is original in the sense that it is the author's own 'intellectual creation'" and noting that it would be hard to see how anyone could claim copyright "if their aim is simply to make a faithful reproduction of an existing work" (Intellectual Property Office 2015, 3).
The Henslowe-Alleyn Digitization Project was funded by a number of private charities and directly by the people of the United Kingdom via the British Academy, which is itself funded by the government's Department for Business, Innovation and Skills. Despite being made with public money, the Project's website asserts that all the materials it provides are "copyrighted and cannot be downloaded, reproduced, copied, circulated or otherwise used" and that "The copyright of all manuscripts in the Henslowe-Alleyn Papers belongs to the Governors of Dulwich College" (Ioppolo 2005-, "Copyrights"). Neither claim appears to be true under British law. The habit of treating the possession of a document as if this conferred copyright--which as the Berne Convention makes clear arises from originality and creativity not ownership--is deeply and harmfully ingrained in the culture of museums, libraries and archives.
In the case of the Henslowe-Alleyn Digitization Project this culture of institutional irredentism has practical ramifications because the project chose to provide access to the digital photographs using proprietary software (Zoomify and Adobe Flash) that prevents the user's computer from receiving the whole picture at once. Instead, the user is given only a small movable window that reveals part of the document at a time. The Henslowe-Alleyn Digitization Project chose a window 630 pixels wide and 450 pixels deep, which at the time of writing (late 2019) is about one-sixth of the typical computer's screen size. The specious assertion of copyright in this case goes hand-in-hand with a practical, intentional impediment of the user's freedom to exploit the materials as she would wish.
The Henslowe-Alleyn Digitization Project's adoption of proprietary technology limits the project's longevity. At the time of writing, most web-browsers need the user to enable special settings in order to allow viewing of materials encoded using Adobe Flash, which for this project is essential to display the manuscripts. This is because the Adobe Flash software is so poorly written that it provides an easy route for malicious software to infiltrate and take over its user's computer. At the time of writing, the developers of the Henslowe-Alleyn Digitization Project are undertaking a "major overhaul" of the service in order to remove its dependence on Adobe Flash (Callaghan 2019); without this overhaul the project would entirely disappear from view in late 2020 when all the major web-browser manufacturers intend to stop supporting Adobe Flash.
The Database of Early English Playbooks (DEEP) contrasts with the Henslowe-Alleyn Digitization Project in almost every particular except that it too was built with public money by academic subject specialists and is free to use (Farmer & Lesser 2007-). Where the Henslowe-Alleyn Digitization Project asserts its creators' copyrights, DEEP makes its contents available under a Creative Commons Attribution Non-Commercial Share-Alike licence. Where the Henslowe-Alleyn Digitization Project explicitly forbids downloading the project's underlying data, DEEP explicitly encourages it by putting a "Download DEEP Data" link on its homepage, which leads to a page that offers the project's entire contents in HTML, Comma-Separated Values (CSV), and XML form.
Nothing in the design of the DEEP website is intended to limit the user's ability to work with the data, as the Henslowe-Alleyn Digitization Project does with its small window upon its images. Nothing in the DEEP website relies upon proprietary software, the main search functionality being provided by the language JavaScript, which conforms to an International Organization for Standardization (ISO) standard for scripting languages. Built in this way on open standards, DEEP has an excellent chance of remaining in good working order with minimal maintenance for many years to come.
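The practical value of a downloadable CSV export is that the user may ask questions the website's own search form does not anticipate. A minimal sketch follows, using only Python's standard library. The column names (title, year) and the three rows are hypothetical placeholders standing in for a downloaded file; they are not DEEP's actual schema or records.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a downloaded CSV file. Column names and rows
# are placeholders, not DEEP's actual schema or data.
SAMPLE_CSV = """title,year
Play A,1594
Play B,1594
Play C,1600
"""


def count_by_column(csv_text, column):
    """Count rows for each distinct value found in `column` of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    for value, count in sorted(count_by_column(SAMPLE_CSV, "year").items()):
        print(value, count)
```

With a real export the user would read the file from disk rather than from a string, and would substitute whatever column headings the file actually contains.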
[In response to this evaluation of four digital products for Shakespeare scholars, the longer essay from which this paper is derived concludes with advocacy of Open Source, Open Data, and Open Standards principles in academic research.]
Works Cited
Best, Michael. 2007. "Shakespeare and the Electronic Age." Shakespeare and the Text. Edited by Andrew Murphy. Concise Companions to Literature and Culture. Oxford. Blackwell. 145-61.
Callaghan, Samantha. 2019. 'Henslowe-Alleyn Digitization Project': Personal Email Correspondence to the Author, 25 November.
Chadwyck-Healey: A division of ProQuest Information and Learning. 2004. Literature Online Third Edition: A Full-text Subscription-only Database of English and American Literature Delivered Over the Internet from http://lion.chadwyck.co.uk.
Farmer, Alan B. and Zachary Lesser. 2007-. Database of Early English Playbooks (DEEP): A Database of Bibliographical Information Delivered Over the Internet at http://deep.sas.upenn.edu.
Hockey, Susan and J. Martin. 1987. "The Oxford Concordance Program Version 2." Literary and Linguistic Computing 2. 125-31.
Intellectual Property Office. 2015. Digital Images, Photographs and the Internet: Copyright Notice Number 1/2014. Newport, Wales. Intellectual Property Office, an operating name of the United Kingdom Patent Office.
Ioppolo, Grace. 2005-. Henslowe-Alleyn Digitization Project. Online.
Russell, D. B. 1967. 'COCOA' Manual: A Word Count and Concordance Generator for Atlas. Didcot. Atlas Computer Laboratory.
Shakespeare, William. 1864. Works [The Globe Edition]. Ed. William George Clark and William Aldis Wright. London. Macmillan.
Shakespeare, William. 1989. The Complete Works. Ed. Stanley Wells, Gary Taylor, John Jowett, and William Montgomery. Electronic edition prepared by William Montgomery and Lou Burnard. Oxford. Oxford University Press.