"Privacy, Preservation, and Genetic Criticism" by Gabriel Egan

I imagine that we all know what the 'save' button in a software program such as Microsoft Word does when we press it. It copies the transient, ephemeral words on the screen--that would disappear if the electricity to the computer were disconnected--to a long-term storage device, such as a hard-disk, giving it a document name by which we may recall it at a later date. Until we do that, the document is not 'saved'.  Actually, no.

<< Here demonstrate Word's saving of your document to your hard disk without asking you permission:

* Run Word

* Type a famous quotation

* Wait one minute. During which, say "Microsoft Word is, by a long way, the most widely used word-processing software in the world, dominating the market for software that you install onto your own hard disk and use to edit files on your own hard disk. Google Docs has had some success as the favourite word-processor for cloud computing in which the software itself and your files are stored on Google's computer instead of your own and are accessed remotely via your web-browser, but this is only a tiny proportion of the word-processing market." 

* Kill Word using Task Manager

* Navigate to C:/Users/Administrator/AppData/Roaming/Microsoft/Word (Administrator is my username, on your computer this will be your username)

* Drop "AutoRecovery save of Document1.asd" into Notepad

* Show that the famous quotation is there, even though we never touched the 'Save' button >>
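
The Notepad step can also be approximated in code. What follows is a minimal Python sketch under stated assumptions, not a description of Word's file format: it assumes only that the typed words survive somewhere in the file as ordinarily encoded characters, and it tries a few common encodings in turn. The function name find_phrase and the sample bytes are hypothetical stand-ins; in real use the bytes would be read from the AutoRecovery file named above.

```python
def find_phrase(data: bytes, phrase: str) -> dict:
    """Return {encoding: byte offset} for each encoding in which phrase occurs in data."""
    hits = {}
    for encoding in ("utf-8", "utf-16-le", "cp1252"):
        try:
            needle = phrase.encode(encoding)
        except UnicodeEncodeError:
            continue  # the phrase cannot be written in this encoding
        offset = data.find(needle)
        if offset != -1:
            hits[encoding] = offset
    return hits


# Hypothetical stand-in for an autosave file: binary padding around UTF-16 text.
# For a real file: data = pathlib.Path(path_to_asd_file).read_bytes()
sample = b"\x00\x01\xd0\xcf" + "To be, or not to be".encode("utf-16-le") + b"\xff\xfe"
print(find_phrase(sample, "To be"))
```

A non-empty result shows that the words were written to disk even though 'Save' was never pressed.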

Microsoft Word, then, stores copies of your documents on your hard disk without your explicitly 'saving' them. Word also stores information about your editing of a document, keeping a log of your changes as you edit, which you probably suspected since you are able to Undo and Redo your most recent edits in chronological order.

<< Demo this by typing:

Errour made first, corrected third

Errour made second, corrected second

Errour made third, corrected first

Correct these errors, doing the last line, then the one above, and then the one above

CTRL-Z undoes the error in the top line (third correction), CTRL-Z again undoes the error in the middle line (corrected second), and CTRL-Z again undoes the error in the bottom line (corrected first).

But you can't, using Undo, have the middle error uncorrected and the other two corrected: it's a linear reversal of time >>
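
The linear behaviour in this demonstration can be modelled with a pair of stacks. This is a minimal Python sketch that assumes whole-document snapshots; how Word actually implements Undo internally is not described here, and the class name LinearHistory is hypothetical. Each state is a tuple of the top, middle, and bottom lines.

```python
class LinearHistory:
    """Snapshot-based linear undo: each edit pushes the previous state onto a stack."""

    def __init__(self, state):
        self.state = state
        self._undo = []  # earlier states, most recent last
        self._redo = []  # states that were undone, most recent last

    def edit(self, new_state):
        self._undo.append(self.state)
        self._redo.clear()  # a fresh edit discards the redo path
        self.state = new_state

    def undo(self):
        if self._undo:
            self._redo.append(self.state)
            self.state = self._undo.pop()
        return self.state

    def redo(self):
        if self._redo:
            self._undo.append(self.state)
            self.state = self._redo.pop()
        return self.state


doc = LinearHistory(("Errour", "Errour", "Errour"))  # top, middle, bottom lines
doc.edit(("Errour", "Errour", "Error"))              # bottom line corrected first
doc.edit(("Errour", "Error", "Error"))               # middle line corrected second
doc.edit(("Error", "Error", "Error"))                # top line corrected third

# Every state that Undo can reach, starting from the fully corrected text.
reachable = [doc.state] + [doc.undo() for _ in range(3)]
for state in reachable:
    print(state)
```

The state with only the middle error restored, ("Error", "Errour", "Error"), never appears in the list, because Undo can only walk back along the single path by which the text was made.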

Microsoft Word's Undo feature is much admired for its ability to save us from our typing and editing errors, but of course it does not persist between sessions. That is, if I 'save' this document and re-open it, the Undo feature will not allow me to undo the previous corrections: these are deliberately destroyed at the point of 'saving'. Imagine how bad it would be if that were not the case: someone you give a document to would have a record of the edits that you made before explicitly saving the document. They could revert your document to a previous state of editing to see how you first phrased everything before revising the text to improve it. You might well think that software that secretly did this was betraying your trust by revealing to someone else the genetic process by which your final, saved version came into being.

Remarkably, for several years Microsoft Word did exactly that. Word 97 was on sale from mid-1996 to early 2003 and it retained in the saved version of each document a revision history showing what had been inserted, deleted, and moved, along with the usernames--as recorded by the Microsoft Windows operating system when it was installed--of those who had edited the document, and also the various filenames under which they had 'saved' it (Wilding 2006, 61-63). In 2003 the British government published on one of its websites a Word-format document that the government claimed was a top-level intelligence report called "Iraq: Its Infrastructure of Concealment, Deception and Intimidation". Analysis of the revision history, by those who knew how to get at it, soon showed that in fact the document was put together by publicity administrators working for Alastair Campbell, the Prime Minister's press secretary. This raised suspicions about the origins of the report, and further investigations showed that its contents were almost entirely copied and pasted from Internet sources with minor rewordings and exaggerations (Rangwala 2003). This document became known as the infamous Dodgy Dossier.

These two features of the world's most popular word-processing software--'saving' copies of your document to your hard disk without telling you, and recording the history of your edits--are present in the software because they achieve effects that users greatly appreciate. We are all grateful that when our computers go wrong, as they frequently do, and have to be forcibly restarted, we have a good chance of recovering documents that we had not yet explicitly saved. That is the purpose of Microsoft Word's AutoSave feature I started with. And we are all grateful that when we accidentally delete a large amount of text, or alter it in an undesirable way, we can undo that mistake and revert to a previous state of our text. In thinking about these features, we should remember that they arise from a fundamental difference between today's text technologies and all previous, non-digital ones. That difference is the notion of a 'save': the copying of the impermanent text stored in impermanent memory to a permanent medium such as a hard disk.

No previous text technology imposed on authors such a sharp distinction between the impermanent writing-in-progress and the permanent 'saved' version. With chisels inscribing marks on stone, or quills spreading ink lines upon parchment or paper, or metal type impressing ink into paper, there was nothing quite like the 'unsaved' version of a document in a word-processor. One possible analogy would be how people generally complete crossword puzzles by cautiously writing first in erasable pencil and then overwriting in ink once they become sure they have the correct word. Equally, I suppose (but have not checked) that stonemasons sketch out their intended cuts as pencil marks on a stone surface before picking up the chisel and mallet. An alternative analogy would be to consider that the version of a writer's work that is disseminated to a large number of readers by publication is like the 'saved' version and that all prior writing is like the ephemeral 'unsaved' text, subject to second thoughts and revisions.

We know that authors frequently tinker with their manuscripts and typescripts before these go to the publisher, but we should also bear in mind that publishers and their printers alter the writing too, intentionally and unintentionally. Where there is unintentional alteration, for example by mistakes made in the setting of type for printing, we might easily assume that in checking their proof printings authors would revert such errors back to the correct readings by using their manuscripts or typescripts as authorities. That is, the text would undergo linear reversion back to a prior state. << Demo the linear Undo in Word again >>. In fact, however, it seems writers such as Charles Dickens and James Joyce were apt to build upon the errors in their proofs rather than simply correct them. That is, they would accept and adopt a printer's error and mark in their proofs some further development based upon it, rather than simply returning to the form that they originally wrote (Gaskell 1978, 142-55; Taylor 1983, 401-02; Joyce 1986, U.28-9, U15.2728-9, U15.3119, U15.3128, U15.3837). Here my analogy with an 'undo' revision history breaks down, because we have not a linear sequence of change but rather a graph of diverging paths, something more like what is called in computational text editing an 'undo' tree and in textual bibliography a stemma.
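
The difference between a linear history and an 'undo' tree can be made concrete in a short Python sketch. The class names Node and UndoTree and the state labels are hypothetical illustrations, not quotations from any author's proofs. In a tree, undoing and then making a new change starts a second branch rather than overwriting the first.

```python
class Node:
    """One state of the text, linked to the state it was derived from."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []


class UndoTree:
    def __init__(self, label):
        self.root = Node(label)
        self.current = self.root

    def edit(self, label):
        # A new state always becomes a child of the current one, so older
        # branches are kept rather than discarded.
        child = Node(label, parent=self.current)
        self.current.children.append(child)
        self.current = child

    def undo(self):
        if self.current.parent is not None:
            self.current = self.current.parent

    def path(self):
        """Labels from the root down to the current state."""
        labels, node = [], self.current
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return labels[::-1]


tree = UndoTree("manuscript reading")
tree.edit("printer's error in proof")
tree.edit("author restores the manuscript reading")
path_a = tree.path()

tree.undo()  # back to the proof containing the error
tree.edit("author builds a new reading on the error")
path_b = tree.path()

print(path_a)
print(path_b)
```

Both paths share the printer's error as an ancestor and then diverge, which is the shape a stemma records and a linear Undo cannot.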

In everyday word-processing, there is no visible difference between the tentative, unsaved version of our text and the finalized, saved version: visibly and indeed functionally they are equivalent. Where then does this notion of 'saving' a text come from, since it has no counterpart in the long pre-computer history of text technology? It comes from the fundamental design of all modern computers, the so-called von Neumann architecture established in 1945, that provides the central processor with a small amount of fast storage (memory) whose contents are lost when the electrical power is switched off and a large amount of slow storage that retains its contents without electrical power. In managing the competing demands of modern software, today's operating systems also do what Microsoft Word does in temporarily committing to permanent storage information that the user has not explicitly 'saved'. When things go wrong and these permanent records are not destroyed, transient information is permanently stored on our computers. These fragments of information cannot easily be recovered by users, but practitioners of Digital Forensics can recover them by bypassing the operating system and searching your hard-disk at the lowest hardware level.
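
To give a sense of what searching at that level involves, here is a minimal Python sketch that pulls runs of readable characters out of raw bytes without consulting any filing system. It is an illustration only: real forensic tools work on whole disk images and handle many encodings and file formats, whereas this sketch looks only for plain ASCII. The function name carve_text, the minimum length of six characters, and the sample bytes are all assumptions.

```python
import re


def carve_text(raw: bytes, min_len: int = 6) -> list:
    """Return every run of at least min_len printable ASCII characters found in raw."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(raw)]


# Hypothetical stand-in for a stretch of disk: text fragments among binary debris.
raw_bytes = (
    b"\x00\x03" + b"draft sentence never saved" + b"\x00\xff\x10"
    + b"abc" + b"\x07" + b"second fragment" + b"\x00"
)
for fragment in carve_text(raw_bytes):
    print(fragment)
```

The short run "abc" is ignored as probable noise, while the two longer fragments are recovered even though nothing in the bytes marks them as belonging to a named document.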

* * *

What are the consequences of this fact for what we do in the academic study of writing? For literary archives this fact gives us an extraordinary opportunity to study the means by which a piece of writing came into existence. Because computers store much more than their users intended them to store, we can do Genetic Criticism using Digital Forensics, as pioneered by Matthew Kirschenbaum (Kirschenbaum 2008; Kirschenbaum & Werner 2014; Kirschenbaum 2014; Kirschenbaum 2016). In the field of academic studies of literary writing, it has been a concern for some time that the transition from longhand and typewriting to computer word-processing would rob us of the documents that we so greatly value when authors deposit their archives in public and university libraries. The rough handwritten drafts, the cancelled alternate readings in typescripts, and the annotated first proofs are physical documents that we at first thought had no counterpart in the digital world. But now we know that these things do have digital counterparts that we can recover if we go poking at the level of zeroes and ones across the surface of the author's hard-disk, because there are many things there that the author did not intentionally 'save' but got 'saved' anyway. This seems an unmitigated good until you think of it from the authors' point-of-view.

Literary authors are increasingly donating their computer systems to public and university libraries, so we can expect much more Genetic Criticism by Digital Forensics to happen in the future. But when living authors start to see what can be recovered by these means, they might start to regret leaving their computer archives open to such examination. Before depositing a paper archive, an author can at least look at every document in it to make sure that she is happy for others to see it. This doesn't always happen in practice of course, especially if the archive is large. In 2013 Germaine Greer sold 478 archive boxes of her old papers to the University of Melbourne, and in them has been discovered a long love-letter to Martin Amis written in 1977 that Greer does not want published (Simons 2015). Apparently the University of Melbourne would be within its rights to publish the 30,000-word letter despite Greer's objections, but so far it has chosen not to do so, although it has allowed the researcher who found the letter to publish extracts in a journal article.

The problems about confidentiality that writers' paper archives present are trivial compared to those presented by digital archives, because as I have tried to sketch here there is much in a digital archive that the writers will simply not know is there. If the facts about what computers do in making permanent records of our words without telling us become widely known, writers may become most reluctant to donate their digital archives to public and university libraries. This would be a grave loss to literary culture. On the other hand, we cannot proceed by simply ignoring the surreptitious behaviour of our computers. What then should we do?

Individual users need to become more aware of what the software they are using is doing so that they can control it; the feature of Microsoft Word that I started with is easy to switch off. But this is of little use when the entire operating system leaves fragmented copies of our documents around the hard disk. That problem applies equally to Microsoft Windows, Apple OS X, and almost all varieties of the free and open-source Linux operating system. I say almost all varieties because one variant called Tails, The Amnesic Incognito Live System, is a variety of Linux that assiduously preserves privacy and anonymity, leaving no digital footprint on the computer it is run on. Tails is the favoured operating system of technically literate journalists such as those who helped the whistleblower Edward Snowden, but it is rather difficult for the average user to adopt. The problems I have been describing are merely local manifestations of the more general problem that computer software made by corporations records much more about us than we want it to. The only solutions to that general problem are political, not technical. But if we can solve those political problems, we can regain control of our digital archives and make them as complete or as selective as we want them to be.

Works Cited

Gaskell, Philip. 1978. From Writer to Reader: Studies in Editorial Method. Oxford. Clarendon Press.

Joyce, James. 1986. Ulysses: The Corrected Text. Ed. Hans Walter Gabler. London. Bodley Head.

Kirschenbaum, Matthew G. 2008. Mechanisms: New Media and the Forensic Imagination. Cambridge MA. Massachusetts Institute of Technology Press.

Kirschenbaum, Matthew G. 2016. Track Changes: A Literary History of Word Processing. Cambridge MA. Harvard University Press.

Kirschenbaum, Matthew and Sarah Werner. 2014. "Digital Scholarship and Digital Studies: The State of the Discipline." 4. Book History 17. 406-58.

Kirschenbaum, Matthew. 2014. "Operating Systems of the Mind: Bibliography After Word Processing (the Example of Updike)." Papers of the Bibliographical Society of America 108. 381-412.

Rangwala, Glen. 2003. 'Intelligence? The British Dossier on Iraq's Security Infrastructure': A Posting to the Email Discussion List of the 'Campaign Against Sanctions on Iraq' (CASI) on 5 February.

Simons, Margaret. 2015. "The Long Letter to a Short Love, or . . .." Meanjin Quarterly (Online Edition) 74.4. n. pag.

Taylor, Gary. 1983. "King Lear: The Date and Authorship of the Folio Version." The Division of the Kingdoms: Shakespeare's Two Versions of King Lear. Edited by Gary Taylor and Michael Warren. Oxford Shakespeare Studies. Oxford. Clarendon. 351-468.

Wilding, Edward. 2006. Information Risk and Security: Preventing and Investigating Workplace Computer Crime. Aldershot. Gower.