
"What happens when you abandon paper: Scholarly communication in the digital world" by Gabriel Egan

[SLIDE] 1. Paperlessness, or What have computers ever done for us?

Predictions made between 20 and 10 years ago about what computers would do to textuality have turned out to be entirely wrong. Hypertext, it turns out, is not quite the big deal we thought it would be. The focus on hypertext--indeed the conflation of the new etext technologies with hypertext--was much driven by George P. Landow's landmark publication Hypertext: The Convergence of Contemporary Critical Theory and Technology in 1992 (Landow 1992). This book took the lightest aspects of high French literary theory and observed that electronic hypertext seems to make real what Roland Barthes described as the ideal state of writing, and that it seems to provide intensified intellectual experiences that find echoes in the works of Jacques Derrida and Michel Foucault. Landow himself traced hypertext back to the essay "As we may think" by Vannevar Bush, which described an imaginary machine (the memex) for recording one's researches through academic books and which many have claimed marked a new direction in theories of how knowledge is organized. This is Bush [SLIDE]:

Our ineptitude in getting at the record is largely caused by the artificiality of systems of indexing. When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. . . . The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain. (Bush 1945, 106)

Bush's memex was a mechanical means of recording the conceptual trails that one's mind creates between items one has read, and Bush has been hailed as the inventor of hypertext, itself supposed to be a new form of textuality.

This, I suggest, is a mistake. Writing in 1992 Landow experienced hypertext primarily in the form of the Apple Macintosh application HyperCard, and HyperCard stacks (the trails of connected items) were for the most part written by intelligent people whose brains were worth crawling through. Since then, the proliferation of hypertextually linked documents on the World Wide Web has proved beyond any doubt that most brains are not worth crawling through and that a significant minority of them are highly objectionable. The problem started, I suggest, with Bush's apparent rejection of indexing in favour of associative linking of disparate materials. As Jim Whitehead pointed out (Whitehead 2000), Bush's notion of an associative link between documents was vague and its implementation in hypertext systems is usually simplistic: the 'head' of a link appears in one document (and is indicated to the user by a visual feature such as underlining) and it leads to the 'tail' located in another, reached by selecting the 'head'. This impoverished notion of association between documents was simple to implement in Hypertext Transfer Protocol (HTTP) and HyperText Markup Language (HTML) and does not do justice to Bush's subtle, but vaguely defined, sense of joining two documents. [BLANK SLIDE]

The big difference that computers make is not that they enable us to make hypertexts: we've been able to do that for centuries using books inscribed or printed on paper. That's why we are so fussy about how students write footnotes: we're teaching them to make their writing into hypertext -- it must be possible to follow each citation of authority to find that authority and check that it says what the student says it says. That is, scholarly writing must have no broken links. All that computers do is make the hyperlinks more easily traversed: instead of each new text being fetched from the library stack, it comes to the reader over the Internet. Hypertexts are not what computers contribute to the next digital-textual economy. Rather, it is computers' ability to make thousands or millions of perfectly identical copies and to store them in very small spaces.
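For those who like to see such things spelled out, the simple 'head' and 'tail' link, and the requirement that scholarly writing have no broken links, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Link, broken_links and library are invented for the purpose, and the citation "Smith 1999" is a made-up example of a reference that cannot be followed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A one-way link: the head sits in one document, the tail in another."""
    head: str  # the document in which the citation appears
    tail: str  # the document to which the citation points


def broken_links(links, library):
    """Return the links whose tail cannot be found in the library."""
    return [link for link in links if link.tail not in library]


# Hypothetical holdings and citations, invented for illustration.
library = {"Bush 1945", "Landow 1992"}
links = [
    Link("student essay", "Bush 1945"),
    Link("student essay", "Landow 1992"),
    Link("student essay", "Smith 1999"),  # made-up citation, not in the library
]

for link in broken_links(links, library):
    print(f"Broken link: {link.head} -> {link.tail}")
```

The check is the same whether the tail is fetched from a library stack or over the Internet; only the speed of traversal differs.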

It is now a trivial matter to store all the books one has ever read or owned in a laptop -- this one here has my entire book collection. That means that all my books are with me at all times, as are all the emails I've ever written, all the talks I've given, all the notes I've ever made on anything, and all the books and articles I've published. All of this writing is full-text searchable so that I do not need to remember where something was written, only what it was about. If when writing something new I recall that the philosopher Martin Heidegger once said something interesting about hammers, I only have to search all my files for the term Heidegger appearing within, say, 10 words of the term hammer to find the appropriate part of Being and Time; in a few seconds I have the answer. This machine is a prosthetic memory and I simply do not need to remember in detail my views on particular topics: I can always use the machine to recover exactly what my position was when I looked carefully at that topic.
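The kind of proximity search just described, one term appearing within a set number of words of another, can be sketched as follows. This is a minimal illustration and not the search tool I actually use: the function names, the file names and the sample texts are all invented, and the matching is on exact words only, so that 'hammers' would not match 'hammer'.

```python
import re


def within_n_words(text, term_a, term_b, max_gap=10):
    """Return True if term_a occurs within max_gap words of term_b."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_gap for i in positions_a for j in positions_b)


def search(documents, term_a, term_b, max_gap=10):
    """Return the names of the documents in which the two terms occur close together."""
    return [
        name
        for name, text in documents.items()
        if within_n_words(text, term_a, term_b, max_gap)
    ]


# Hypothetical files with invented contents, standing in for a personal archive.
documents = {
    "notes_on_tools.txt": "Heidegger argues that the hammer shows itself as equipment only in use.",
    "far_apart.txt": "Heidegger " + "filler " * 15 + "hammer",
    "unrelated.txt": "Notes on the touring records of provincial playing companies.",
}

print(search(documents, "Heidegger", "hammer"))
```

A real archive would want an index rather than a fresh scan of every file, but the principle of remembering what something was about rather than where it was written is the same.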

Questions & Answers for 10 minutes. If no questions, pose to the group:

i) would anyone care to challenge my dismissal of hypertext as the computer's contribution to textuality?

ii) can anyone see a downside to putting all one's materials into digital form?

[SLIDE] 2. Being social: The public and the academic

The Internet brings people together and around 40% of all humans have Internet access. This extraordinary fact gives us a new kind of communicative space that enables collaborative work between persons who ordinarily would have nothing to do with academic work. One kind of socialized academic activity is crowdsourcing: getting people you don't know and might never meet to do work on your project simply by inviting them and making it attractive to do so. [SLIDE] Examples of this first kind of social engagement include Melissa Terras's Transcribe Bentham project at University College London, in which the unpublished correspondence of the utilitarian philosopher Jeremy Bentham is transcribed by hundreds of volunteers who are shown images of the letters on their Internet browsers and who type in what they think they see. [SLIDE] Similar success has been had by the Zooniverse project, which has hundreds of thousands of volunteers helping to process large collections of scientific data by, for example, identifying features in pictures from space telescopes, and on a smaller scale by Paul Flemons's project at the Australian Museum in Sydney that has volunteers labelling insect specimens by transcribing the accompanying handwritten labels displayed to them in digital pictures. [BLANK SLIDE]

This new communicative space raises several concerns that are both ethical and practical. Is it right to have things done for free by the general public that in the past someone would have been paid to do? The crowd is quite literally putting junior laboratory technicians out of a job, removing one of the lowest rungs in the scientific career ladder. On the other hand, as the etymology of the word amateur reminds us, people do this kind of work for free out of a love of the subject. Of course, certain groups like graduate students are in a vulnerable position and we ought to be particularly thoughtful about their motivations when they become predominant in the particular crowd that is doing the work.

Just what is it that the amateur transcribers and correctors do differently from the professionals? One difference lies in the self-reflexivity of the professional: we think about our methods, conceiving of them in theoretical terms. If we want amateurs to get self-reflective we have to start with providing them with some guidelines and these ought to appear where the amateurs go looking for knowledge, which is in Wikipedia. But do we expect or even want thousands of members of the public to learn these skills to help us out? Indeed, there is an argument that although thousands of people will join a successful academic crowdsourcing project, the real value is not in using all of them but in finding the few truly obsessed amateurs who will contribute vast quantities of high-quality labour for free. Most big academic crowdsourcing projects find that many thousands of people are willing to do a very small quantity of labour each--and those multiples produce a large quantity of labour overall--but that typically one or two dozen people will contribute far more than everyone else, amounting to perhaps a quarter or a third of all the labour donated. The crowdsourcing process becomes, then, a kind of open audition, a way of finding that needle in the haystack that is the well-qualified amateur willing to make a vast and free contribution.

There is a lively debate within academic crowdsourcing about just how much we should trust the work done by non-professionals. The idea that there is wisdom in crowds comes from studies of what happens when you average the judgments of many people. The often-used example is of the fairground game of guessing the weight of the pig or the number of pennies in a large jar. For these examples, if we simply take the numerical guesses of a thousand people and calculate the average guess, it will typically be extremely close to the true weight of the pig or number of pennies in the jar, and (most interestingly) closer to the true number than any particular expert such as a pig farmer might get. When James Surowiecki first popularized this phenomenon in his book The Wisdom of Crowds in 2004 it was widely pointed out by reviewers that many academic problems are not amenable to this approach. You cannot build an encyclopedia that way, they said, since you need individual experts to write one article each rather than averaging the opinions of many experts [SLIDE]. This seemed like a convincing argument about the limitations to crowdsourcing at the time, although Jimmy Wales and Larry Sanger had launched Wikipedia in 2001 and by 2004 were demonstrating that in fact the public at large could build the world's greatest encyclopedia. (This is the first surviving archive of Wikipedia from March 2001 when the site boasted over 3,000 pages; today it's over 14 million.)
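The arithmetic behind the fairground example is nothing more than taking a mean and comparing errors, and it can be sketched as below. The guesses and the true weight are invented figures chosen for illustration, not data from any study, and the function names are my own.

```python
from statistics import mean


def crowd_estimate(guesses):
    """Return the simple average of all the individual guesses."""
    return mean(guesses)


def share_beaten_by_crowd(guesses, true_value):
    """Return the fraction of guessers whose error is larger than the crowd's error."""
    crowd_error = abs(crowd_estimate(guesses) - true_value)
    worse = sum(1 for g in guesses if abs(g - true_value) > crowd_error)
    return worse / len(guesses)


# Invented guesses at the weight of a pig, in kilograms, and an invented true weight.
guesses = [380, 520, 450, 610, 495, 540, 470, 575]
true_weight = 500

print("Crowd estimate:", crowd_estimate(guesses))
print("Share of guessers beaten by the crowd:", share_beaten_by_crowd(guesses, true_weight))
```

With these invented figures the individual guesses miss by anything from 5 to 120 kilograms while their average misses by 5, because the overestimates and underestimates largely cancel one another out. That cancellation is exactly what is unavailable when the task is writing an encyclopedia article rather than estimating a number.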

Here I'm treating crowdsourcing as a phenomenon spurred by the invention of the Internet, but in fact we should remember that one of the cornerstones of our discipline, [SLIDE] the New English Dictionary that was later renamed the Oxford English Dictionary, was a nineteenth-century crowdsourcing project of extraordinary success. The NED's second most prolific contributor was a murderer jailed at the Broadmoor Criminal Lunatic Asylum (Murray 1979, 305-07) who contributed tens of thousands of historical illustrative quotations from books in his own possession.

There have been many scholarly projects in the Humanities that used not a crowd of the general public but a crowd of scholars to produce what no scholar could produce on her own. In the field of theatre history, the Records of Early English Drama project at the University of Toronto has sent scholars into the local parish and county records offices across the British Isles to transcribe from corporation and borough records any mention they could find of theatrical activities outside of London in the early modern period. Having run now for forty years with dozens of contributing academics, Records of Early English Drama has shown that a slow project involving many people over a long period of time can produce results that are simply unachievable by individual scholars or even small teams working within the usual funding time-frame of two to five years.

Questions & Answers for 10 minutes. If no questions, pose to the group:

i) does anyone see either practical or ethical reasons to be cautious about crowdsourcing scholarly activities?

ii) would people be willing to rely on a digital surrogate of a literary or historical document that had been created by crowdsourcing of the transcription and editing of the original?

[SLIDE] 3. Open Access, the Humanities scholar and the ethical need for 'piracy'

A new volume in the series [SLIDE] Records of Early English Drama appears every couple of years, covering the theatrical records of a fresh town or county, and these are sold as large hardback volumes costing between one and two hundred pounds each, which are bought only by research libraries. Extraordinarily, however, one can also get the PDF of each volume for free from the Internet Archive website, and these PDF versions are much more useful than the printed ones since they are full-text searchable: rather than relying on the author having included the term you want in her index, you can search for any word appearing in the volume. How can this be, that a free version appears online simultaneously with the printed version? It turns out that 40 years ago when negotiating the publication of the first volumes, the project's co-founder Alan Somerset added a peculiar clause to the contract, which the press accepted with simple bemusement because they had no idea what he was talking about.

Somerset got the publisher to agree that the rights for digital reproduction of the contents of each volume were to remain with him personally. No-one in publishing in the 1970s had any idea what digital publication meant--no-one but a few large organizations even owned a digital computer that might hold a digital edition--so they saw no danger in giving Somerset sole personal rights to these materials. Somerset then had the revolutionary idea of just giving these editions away over the Internet. Moreover, because he can do what he likes with the digital contents, Somerset has been able to rework them as an ongoing online database of actors, their patrons, their places of performance, and their plays. This database is provided for free over the Internet and has made possible the kinds of theatre-history writing, especially the writing of company histories, that were simply impossible before full and accurate details of touring activities were available. There is currently a boom in the publication of these histories, and in large part it can be attributed to Somerset's innovative approach to publishing negotiated 40 years ago. [BLANK SLIDE]

This seems to me a good example of the transformative power of Open Access digital publication, and I want for a moment to contrast it with traditional print publication. Put crudely, the print publishers' economic model was founded on two bases: i) the accumulation of capital in the form of expensive printing presses and distribution networks, and ii) the possession of exclusive rights to reproduce certain content. Even in the early days of the London printing industry, in the late sixteenth century before our modern notions of copyright came into being, the Stationers' Company existed to protect the rights of exclusivity of publishers. A Marxist like me would of course say that technology has been the driver in these historical processes, and that the associated ideas--especially such notions as copyright--arose after technological change in order to try to accommodate the new technology's impact within the wider economy. In Marxist terms, copyright is a superstructural form that emerges from the economic structure. I shall return shortly to this point in order to argue that we ought not to feel ourselves morally bound to the existing principles of copyright.

I'd like first to say a bit more about the economics of current Humanities book publishing. Academics, whose salaries are in most cases paid by the state, produce knowledge and write it up in articles, essays, and books. Traditionally they have given these works free of charge to publishers, who (controlling the means of knowledge distribution) disseminate this knowledge through the world in the form of printings that are sold on the open market. For most research monographs, these printings are bought by a very few individuals and by the university libraries of the world who store them in vast collections. What distinguishes the most prestigious and useful research libraries is the completeness of their collections: one goes to the Bodleian in Oxford or the Library of Congress in Washington or the Huntington in Los Angeles in the hope that wherever one's reading takes one--whichever footnote one wishes to follow up--there will be a copy of the work in that library that can be fetched in minutes. That is what I meant earlier about the library being a hypertext: only in a big library are all the links unbroken.

You do not have to be a Marxist to see that this model of knowledge dissemination--in which people travel to visit one of the many identical copies of a book that are stored in the libraries of the world--is peculiarly archaic. It is not only strange, it is unsustainable in terms of sheer numbers of books sold. I confess here that my knowledge of books is for the most part limited to my field, Shakespeare studies, and my knowledge of how book authoring relates to the career development of academics is largely limited to the British university system. However, at the 2007 meeting of the Renaissance Society of America in Miami, Florida, I took a look at the research monographs on sale in the book displays at the conference. Sampling at random, I made quick counts of the numbers of people thanked in the acknowledgements sections of various books. I looked at 15 books (an admittedly small sample) and the number of people personally thanked ranged from 20 to 80, with an average of 42. That's a lot of people, and in quite a lot of cases the higher end of that scale, 80, comes close to the total world sales for a new research monograph in our fields.

In other words, as a means of disseminating one's research outcomes to a group of interested fellow researchers, the print monograph fundamentally fails. Rather than publish a book, an author would be better off going around to each of the people she mentions in her acknowledgements and simply telling them her findings. She would, by that means, in some cases reach more people than buy the book. Of course, a book bought by a library is hopefully read by more than one person, but for anyone who wants to get a sense of how often each research monograph in a library is borrowed, most library catalogue systems can supply this information. For older books there is an even simpler test. I recently had cause to read the introductions to the first volumes in the Arden Shakespeare's first series of play-texts, published from 1899 to 1905. I used the copies in the specialist Shakespeare Institute research library in Stratford-upon-Avon, where one would expect the usage of these books to be considerable. In fact in several cases I had to borrow the librarian's book-knife to cut open the folded edges of the sheets of paper. Having lain on the shelf for 100 years, the introductions to these books were unread until I looked at them.

To return to my main theme, from the point of view of disseminating knowledge the printed research monograph does not work at all well. Another reason to reject this means of scholarly communication is that it is based on a decidedly unfair economic model. Why should universities give their research to publishers only to have those publishers sell it back to them? It remains to be seen whether publishers can retain control of journal-article dissemination in the face of the current drive to Open Access publication. Journal publication is a highly lucrative market, so they will try. On the moral issue, though, the case is unanswerable: since knowledge is generated in the universities and we have the technical means to preserve it and to disseminate it, we ought to simply give away our work via Institutional Repositories. We are already effectively giving it away to publishers, and it is hard to see why we still do so now that the means of production and distribution have been radically overhauled by technology.

What should an academic do about all this? My answer is that, as professionals morally charged with the maintenance and dissemination of the literary part of our cultural heritage, we should push as hard as we can against all the limitations to dissemination of academic works that the notion of copyright, which is our enemy, erects. That includes undertaking copying that has in the past been called 'piracy'. That is, we should wherever possible digitize resources that we use and share them, and also share digital resources that we have purchased, and all without regard for copyright. It is no exaggeration to say that the new media are fundamentally altering the nature of property within late industrial capitalism, and that old notions of ownership simply do not apply in the new situations. If this sounds like reckless talk, it is worth noting that no-one in academia has ever been prosecuted for breaking the old licensing rules using the new media, and I suggest that we ought not allow ourselves to be cowed by legal opinions (for which our employers pay a lot of money) that inhibit our copying of the materials that we use in teaching and research. The very impermanence of online resources puts us under a moral obligation to pirate as much as possible, because we cannot rely on the materials surviving any other way.

[SLIDE] To see why not, take the example of the British Broadcasting Corporation's splendid LaserDisc project in the 1980s, which aimed to create a new digital Domesday book recording life in the United Kingdom 900 years after the first Domesday Book. The resources assembled for this project are effectively lost to us all because as a standard for dissemination the LaserDisc and its associated home computer, the Acorn/BBC micro, are incompatible with the standard computer systems in use today. If piracy of materials from the project had been widespread--that is, if users had possessed the technical means to violate their licence conditions by copying what they wanted--most or all of the raw material of the project would be available to us in some form.

This is not wishful thinking on my part: we have a clear precedent for it. As is well known, the BBC routinely wiped and reused audio and video tapes of radio and television programmes from the 1950s and 1960s, and in many cases the only surviving copies are illegal pirated recordings made off-the-air by listeners and viewers and stored at home. The BBC is now grateful to receive copies of these illegal recordings to fill the extensive gaps in its broadcasting archive. On a personal level, I'm sure I'm not the only person whose list of publications includes an article commissioned for an academic website that no longer exists. In my case, I only hope that (contrary to the terms of use published on the site) people did copy material from the Arden Shakespeare's now defunct ArdenNet website, else I'm the sole possessor of a text that was once widely available and that has been cited in more than one printed book. [BLANK SLIDE]

In a world in which Google is routinely scanning books without their authors' permission and in which universities are seeking to put publishers out of business and make themselves repositories of knowledge in electronic form and in which large public institutions have shown themselves to be unreliable custodians of data, it would be an absurdly self-denying gesture for academics, the source of all this knowledge, to pause before copying materials and ponder the copyright position of their acts.

I do not suppose that I will be able to convince many people to simply stop worrying about copyright. I do, however, see some straws in the wind that make me optimistic. Institutional Repositories are one such straw, and extensive self-archiving by academics is another. A few years ago I started to put on my personal website at gabrielegan.com copies of everything I had published. When I first delivered an earlier version of this talk some years ago, I said that on my personal website I give away free copies of everything I've published but I added the caveat "with the exception of the very latest book, chapter, or article about which a publisher might complain that I was hurting their business" and I explained that "like everyone else I cannot, in this transitionary phase, afford to alienate publishers since my career progression depends on them". That was six years ago when I still sought career progression and had to keep publishers sweet -- now I have a chair I can afford to be more ambitious, and now I have made literally everything I've ever published available for free on my website, including my latest book that Cambridge University Press is actively trying to sell for £61 per hardback copy. My feeling is that any losses I might incur in lost royalties are likely to be made up for by increased readership, and my primary motivation in publishing my books is not income but readership so I'd gladly lose all my royalties to gain readers.

The longer I think about it the less moral justification I can see in claiming royalties on my publications. All the research I have ever done has been paid for by the taxpayers of the countries I've worked in, so in a very real sense my publications do not really belong to me at all. Perhaps I have the moral right to be identified as their originator, but the work really belongs to the people who paid for it. This I realize is a controversial position, so I'll end by asking if anyone wants to make the case that academics should really feel that they own the publications that they write as employees of the state. If so, would you extend that right of ownership to state employees other than academics? Should tax inspectors personally own the reports they write about particular companies and be able to license them to commercial publishers who then sell them back to the tax inspectors' employers, Her Majesty's Revenue and Customs? If that seems like an absurd proposition, what is it that makes academics different?

Questions & Answers for 10 minutes. If no questions, pose to the group:

i) would anybody care to defend current copyright regulations?

ii) do people have reservations about any aspect of Open Access?


Works Cited

Bush, Vannevar. 1945. "As we May Think." Atlantic Monthly 176. 101-08.

Landow, George P. 1992. Hypertext: The Convergence of Contemporary Critical Theory and Technology. Baltimore. Johns Hopkins University Press.

Murray, K. M. Elizabeth. 1979. Caught in the Web of Words: James Murray and the Oxford English Dictionary. Oxford. Oxford University Press.

Whitehead, Jim. 2000. "As we Do Write: Hyper-terms for Hypertext." ACM SIGWEB Newsletter: The Association for Computing Machinery Special Interest Group on Hypertext, Hypermedia and the Web 9.2-3. 8-18.