"Where are we now in determining Folio compositor stints?" by Gabriel Egan

The work of setting the 899 pages of type in the 1623 Folio did not fall to one compositor. Thomas Satchell was the first to notice that in Folio Macbeth each of 35 words is spelt one way in the first half of the play (for example, doe and goe) and another in the second half (do and go), and he wondered if this was because each of two compositors imposed his own spelling preferences as he worked (Satchell 1920). Satchell was unable, however, to eliminate the alternative possibility that Folio Macbeth was set from two manuscripts in which these spellings differed. Edwin Eliott Willoughby took the most discriminating five of Satchell's 35 words, added one of his own, and extended the search to Folio plays beyond Macbeth (Willoughby 1932, 54-60) to establish that at least two compositors, A and B, set the Folio, and probably two more as well. Willoughby's study was avowedly incomplete.

In the first decade of its publication, the journal Studies in Bibliography published a series of articles studying the compositors' habits in the Shakespeare quartos and Folio. Among the most important were Alice Walker's claim that Folio compositor B was sloppier in his work than compositor A (Walker 1954)--useful knowledge for an editor deciding whether to emend a suspect reading--and Charlton Hinman's discovery of an apprentice compositor E. Hinman explained that he used "E, rather than C or D, because not all of the material before the Tragedies was set by A and B, and C and D may later be required to designate compositors in the Comedies" (Hinman 1957, 4n2). Compositors C and D were duly discovered by Hinman, who developed a new technique for attribution (Hinman 1963a; Hinman 1963b). Hinman noticed that distinctively damaged pieces of type could be traced across the Folio, and on the assumption that compositors did not share type this enabled him to trace the pages set from a particular typecase (containing one or more damaged pieces) across the book. Because compositors did not share type, Hinman decided, tracing typecases gave an indirect means of tracing compositor stints on pages where spelling evidence is not decisive.

A decade after Hinman's landmark book on the Folio, T. H. Howard-Hill added compositor F to the roster by showing that compositor A of the comedies and compositor A of the histories behave in distinct ways (Howard-Hill 1973). Hinman had been wrong to assume that compositors did not share type, and had in any case used too few spelling differences in making his stint attributions. Rather than relying solely on spelling, Howard-Hill applied what he called "psychomechanical" tests concerning habits of layout, punctuation, and spacing to distinguish compositors. In particular, whether or not a space is inserted after a comma and before the next word seemed to Howard-Hill an especially strong marker of compositor identity. Gary Taylor approved, thinking Howard-Hill's test for "frequent terminal spaced commas . . . a near-infallible indicator of [compositor] C's presence" (Taylor 1981, 97). On this basis, Taylor subdivided still further the established stints, adding compositors H, I, and J to the roster.

These foundational works on Folio compositor identification have been subject to little independent verification. John O'Connor verified and built upon Howard-Hill's work on compositor F and Paul Werstine showed that Walker's characterization of compositor B as sloppy was mistaken--the deficiencies she noticed were localized and due to a casting off error--and that the differentiation of compositors D and F by spelling habits is unreliable once we take into account the influence upon compositor D of setting from manuscripts containing Ralph Crane's distinctive spellings (O'Connor 1975; Werstine 1978; Werstine 2001). More fundamentally, D. F. McKenzie showed that assumptions of compositorial consistency in spelling and psychomechanical habits are flawed. From an early eighteenth-century book for which there is independent evidence of the compositors' stints, McKenzie was able to show that a compositor might vary in his comma-spacing habit from day to day and hence tests relying on this habit are unreliable (McKenzie 1984). To be fair, it should be noted that McKenzie concerned himself with spaces before commas whereas Howard-Hill's test was for spaces after commas. This may not be a trivial distinction. Every careful writer of English today will put a space after a colon but those trained in Machine Readable Cataloguing (MARC) of books--and only those people--will put a space before a colon too, because computerized cataloguing systems demand it. Looking for spaces before colons is a valid way to distinguish today's librarians from non-librarians, but looking for spaces after colons tells us nothing.

Peter W. M. Blayney's new introduction for the second edition of Hinman's Folio facsimile gave credence to the existence of all eight of the currently claimed compositors (A, B, C, D, F, H, I, and J) and even countenanced Taylor's suspicion that behind H there might be two men (Shakespeare 1996, xxxiv-xxxvii). Eric Rasmussen and Anthony James West's new catalogue of Folio exemplars gives no opinion on the number of compositors or the division of their labour (Rasmussen & West 2011). With considerable suspicion cast on the methodologies for distinguishing compositorial stints, the whole matter rests in limbo until past analyses are reconfirmed with new techniques. Of the studies described above, only Taylor's used an electronic text of the Folio to produce listings of all the words supposedly set by each compositor. The present paper will describe a computer system for confirming the counts used to make compositor stint attributions using freely available electronic texts and new standards in textual markup and search called Extensible Markup Language (XML) and XQuery. In support of the paper, and to enable readers to repeat the methodology if they wish, an electronic archive (a ZIP file) of source texts, software programs, and output documents called "GIE-appendices.zip" is circulated with the paper.¹

Most Shakespearians encounter XML, if they encounter it at all, in connection with large collaborative projects making transcriptions of early modern manuscripts and print editions, usually conforming to the standards of the international Text Encoding Initiative (TEI). The success of the TEI standards in the collaborative digitization of, for example, the works of Geoffrey Chaucer and the New Variorum Shakespeare has generated the impression that XML is a technology suited only to large projects serving multiple purposes at once. It is widely, but wrongly, assumed that TEI encourages the encoding of every encodable feature in an early modern text, so that, for example, speech prefixes and stage directions must be distinguished from dialogue, prose from verse, and prologues and epilogues from the plays to which they are attached. So great is the effort needed to encode everything, and so pervasive the assumption that incomplete encoding is a waste of time, that many who might benefit from XML encoding never take the first step. This paper shows that 'quick-and-dirty', incomplete XML encoding made to 'home-brewed' rather than TEI standards can serve purposes not anticipated in existing large digitizations. For our purposes, a line in the Folio is a collection of pieces of type and it does not necessarily matter which words are speech prefixes, which dialogue, and which stage direction. We may record these things as and when we need to.

The recreation and checking of previous studies of compositorial habits requires an electronic text of the Folio and a means for pulling from it the parts attributed to different compositors in the various studies. A suitable electronic text is easily obtained, but the application to it of the competing and incompatible scholarly hypotheses about the compositors is difficult because of a prevailing encoding convention that must not be broken if off-the-shelf software and standards are to work. We will illustrate the convention, and this project's response to it, using the first few lines of the Folio's text of The Tempest. In XML markup one takes an electronic version of the raw words and punctuation in one's text and surrounds various parts of it with 'tags', chosen by the user, that record the features of interest. The beginning of The Tempest might be recorded thus:

<play>
  <act n="1">
    <scene n="1">
      <line n="1">A tempestuous noise of Thunder and Lightning heard: En-</line>
      <line n="2">ter a Ship-master, and a Boteswaine.</line>
      <line n="3">Master</line>
      <line n="4">BOte-swaine</line>
      <line n="5">Botes. Heere Master: What cheere?</line>
      . . .
      <line n="78">faine dye a dry death. Exit.</line>
    </scene>
  </act>
  . . .
  <act n="5">
    . . .
  </act>
</play>

Notice that each element of the play is marked by a pair of tags, the opening one naming the kind of element it is, such as play, act, scene or line, and the closing one repeating the name but prefixed by a forward-slash meaning 'end of' play, act, scene or line. An important point is the Russian-doll (or Chinese-box) principle: the lines are nested inside scenes, which are nested inside acts, which are nested inside the outermost element called 'play'. This nesting is demanded in XML--no line may cross a scene boundary, no scene may cross an act boundary--because XML treats every text as what is called an Ordered Hierarchy of Content Objects (OHCO). In XML it is normal to assert that novels must consist of chapters that consist of paragraphs, and poems of lines made of words. Thus XML has trouble with the works of writers such as Laurence Sterne, who had a marbled endpaper printed in the middle of his novel Tristram Shandy, or the poet E. E. Cummings, who frequently broke words across line boundaries.

The elements of a book that a bibliographer is interested in, such as columns, pages, formes and gatherings, cut across the elements usually marked up in XML such as speeches, scenes and acts. A speech may cross a page boundary and a scene a forme boundary. There are well-established means to reconcile incompatible hierarchies of interest within one XML document, but we have the further complication of wishing to record the multiple, incompatible scholarly hypotheses about who set which part. Suppose that Werstine thinks compositor A set the first two lines of The Tempest and compositor B the rest of the scene. We might encode thus:

<werstine-stint comp="A">
  <line n="TLN-1">A tempestuous noise of Thunder and Lightning heard: En-</line>
  <line n="TLN-2">ter a Ship-master, and a Boteswaine.</line>
</werstine-stint>
<werstine-stint comp="B">
  <line n="TLN-3">Master</line>
  <line n="TLN-4">BOte-swaine</line>
  <line n="TLN-5">Botes. Heere Master: What cheere?</line>
  . . .
  <line n="TLN-78">faine dye a dry death. Exit.</line>
</werstine-stint>

This is acceptable (in the jargon, well-formed) XML: each line is wholly contained within one of the two stints as determined by Werstine. Now suppose that Taylor disagrees with Werstine, thinking that compositor A set the first three (not two) lines of the scene and compositor B the rest. We might encode thus:

<taylor-stint comp="A">
  <line n="TLN-1">A tempestuous noise of Thunder and Lightning heard: En-</line>
  <line n="TLN-2">ter a Ship-master, and a Boteswaine.</line>
  <line n="TLN-3">Master</line>
</taylor-stint>
<taylor-stint comp="B">
  <line n="TLN-4">BOte-swaine</line>
  <line n="TLN-5">Botes. Heere Master: What cheere?</line>
  . . .
  <line n="TLN-78">faine dye a dry death. Exit.</line>
</taylor-stint>

This too is well-formed XML: each line is wholly contained within one of the two stints as determined by Taylor. Each of these two hierarchies may exist perfectly well within its own XML document, but if we try to make them co-exist in a single document a problem emerges:

<werstine-stint comp="A">
  <taylor-stint comp="A">
    <line n="TLN-1">A tempestuous noise of Thunder and Lightning heard: En-</line>
    <line n="TLN-2">ter a Ship-master, and a Boteswaine.</line>
</werstine-stint>
<werstine-stint comp="B">
    <line n="TLN-3">Master</line>
  </taylor-stint>
  <taylor-stint comp="B">
    <line n="TLN-4">BOte-swaine</line>
    <line n="TLN-5">Botes. Heere Master: What cheere?</line>
    . . .
    <line n="TLN-78">faine dye a dry death. Exit.</line>
  </taylor-stint>
</werstine-stint>

We have broken the Russian-doll/Chinese-box principle, or as they say in XML we have created overlapping hierarchies: instead of being wholly inside Werstine's compositor A stint, Taylor's compositor A stint is not 'closed' before Werstine's compositor A stint is closed and Werstine's compositor B stint is opened. It appears that we cannot make a single representation of the opening of the Folio that contains at once Werstine's and Taylor's hypotheses about its typesetting.
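
The difference between nested and overlapping hierarchies can be checked mechanically. The following is a minimal sketch in Python, not part of the toolchain used for this paper (which relies on Oxygen, XQuery and Perl); the wrapping 'scene' element is an assumption added only because an XML document needs a single outermost element.

```python
import xml.etree.ElementTree as ET

# Werstine's stints alone: every element closes inside its parent.
NESTED = """<scene>
  <werstine-stint comp="A">
    <line n="TLN-1">A tempestuous noise of Thunder and Lightning heard: En-</line>
    <line n="TLN-2">ter a Ship-master, and a Boteswaine.</line>
  </werstine-stint>
  <werstine-stint comp="B">
    <line n="TLN-3">Master</line>
  </werstine-stint>
</scene>"""

# Both scholars' stints together: Taylor's stint A is still open when
# Werstine's stint A closes, so the two hierarchies overlap.
OVERLAPPING = """<scene>
  <werstine-stint comp="A">
    <taylor-stint comp="A">
      <line n="TLN-1">A tempestuous noise of Thunder and Lightning heard: En-</line>
      <line n="TLN-2">ter a Ship-master, and a Boteswaine.</line>
  </werstine-stint>
  <werstine-stint comp="B">
      <line n="TLN-3">Master</line>
    </taylor-stint>
  </werstine-stint>
</scene>"""


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(NESTED))       # True
print(is_well_formed(OVERLAPPING))  # False
```

Any conforming XML parser should reject the second document in the same way, which is why the competing attributions cannot simply be merged into one file.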

The solution is to have separate documents representing Werstine's and Taylor's views, but we do not want to create multiple copies of the Folio since any improvements or corrections to it would have to be made multiple times. We should instead store in one document the base text, with the markup that everyone agrees upon, and keep the scholars' competing views of it elsewhere. This approach is called stand-off markup. Here are snippets from the five documents needed for the beginning of The Tempest:

<line n="TLN-1">A tempestuous noise of Thunder and Lightning heard: En-</line>
<line n="TLN-2">ter a Ship-master, and a Boteswaine.</line>
<line n="TLN-3">Master</line>
<line n="TLN-4">BOte-swaine</line>
<line n="TLN-5">Botes. Heere Master: What cheere?</line>
(basetext.xml)

<xi:include href="basetext.xml" xpointer="TLN-1"/>
<xi:include href="basetext.xml" xpointer="TLN-2"/>
(werstine-comp-A.xml)

<xi:include href="basetext.xml" xpointer="TLN-3"/>
<xi:include href="basetext.xml" xpointer="TLN-4"/>
<xi:include href="basetext.xml" xpointer="TLN-5"/>
(werstine-comp-B.xml)

<xi:include href="basetext.xml" xpointer="TLN-1"/>
<xi:include href="basetext.xml" xpointer="TLN-2"/>
<xi:include href="basetext.xml" xpointer="TLN-3"/>
(taylor-comp-A.xml)

<xi:include href="basetext.xml" xpointer="TLN-4"/>
<xi:include href="basetext.xml" xpointer="TLN-5"/>
(taylor-comp-B.xml)

Notice that the base text contains only the uncontroversial line information for the first five lines. The document giving Werstine's view on compositor A's setting of those lines simply picks out the first two, and the document giving his view of compositor B's setting picks out the third, fourth and fifth. The document giving Taylor's view of compositor A's setting picks out the first three lines, and the document giving his view of compositor B's setting picks out the fourth and fifth.

Once presented in this form, the documents holding the scholars' views can be interrogated using an XQuery processor that replaces the XInclude statements with the content they identify. Although the statement of Werstine's view of compositor B's work contains only these pointers

<xi:include href="basetext.xml" xpointer="TLN-3"/>
<xi:include href="basetext.xml" xpointer="TLN-4"/>
<xi:include href="basetext.xml" xpointer="TLN-5"/>

when this document is queried the pointers are replaced by the elements they point to in the base text and thus the query is actually asked of this snippet

<line n="TLN-3">Master</line>
<line n="TLN-4">BOte-swaine</line>
<line n="TLN-5">Botes. Heere Master: What cheere?</line>

By running our XQuery against the document "werstine-comp-B.xml" we are running it against just the parts of the play that Werstine thinks compositor B set. It would be tedious to write an XInclude statement for each line that Werstine thinks compositor B set, but we do not have to specify stints in terms of individual lines: the procedure works just as well for columns, pages, formes or sheets. So long as we have identified these elements in the base text we may refer to them in the document that defines an investigator's view of a stint. When queried, this document stating the investigator's view returns the lines (and hence the word-pool) in the supposed stint. A change of mind about who set a particular line, column, page, forme or sheet can quickly be reflected in a change to the relevant statement(s) of stint(s) and the word-pool returned by the query will instantly be updated to reflect the revised attribution.
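
The principle of stand-off markup can be illustrated without an XQuery processor. The sketch below, in Python, resolves the pointers by hand rather than relying on any library's XInclude support, so it is a stand-in for what the processor does, not a description of it. The 'stint' wrapper element and the helper names are hypothetical, and the include elements use the 'xpointer' attribute name defined by the XInclude standard.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

BASETEXT = """<book>
<line n="TLN-1">A tempestuous noise of Thunder and Lightning heard: En-</line>
<line n="TLN-2">ter a Ship-master, and a Boteswaine.</line>
<line n="TLN-3">Master</line>
<line n="TLN-4">BOte-swaine</line>
<line n="TLN-5">Botes. Heere Master: What cheere?</line>
</book>"""


def stint_document(pointers):
    """Build a (hypothetical) stint statement: one include per pointer."""
    includes = "".join(
        f'<xi:include href="basetext.xml" xpointer="{p}"/>' for p in pointers
    )
    return f'<stint xmlns:xi="http://www.w3.org/2001/XInclude">{includes}</stint>'


def resolve(stint_xml, base_xml):
    """Replace each pointer with the text of the base-text line it names."""
    base_lines = {
        line.get("n"): line.text for line in ET.fromstring(base_xml).iter("line")
    }
    stint = ET.fromstring(stint_xml)
    return [base_lines[inc.get("xpointer")] for inc in stint.iter(XI_INCLUDE)]


werstine_b = stint_document(["TLN-3", "TLN-4", "TLN-5"])
taylor_b = stint_document(["TLN-4", "TLN-5"])

print(resolve(werstine_b, BASETEXT))
print(resolve(taylor_b, BASETEXT))
```

The base text is stored once; changing an attribution means editing only the short list of pointers, after which the resolved word-pool changes accordingly.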

The base text may be marked up with any bibliographical features that are agreed upon by all investigators. Most usefully one would first record whether a line of type is full, meaning that the last letter or punctuation mark is hard against the right edge of the block of type, or short in the sense that the end of the line is filled with spaces. Typically, lines of prose (except the last one in each speech) are full while lines of verse are short. Every line placed in the press had to be made the same length, or 'justified', to form rectangular blocks of type. In short lines different widths of spaces would be combined at the end to ensure this, but in full lines not only the inter-word spaces but also the spellings might be altered to achieve perfect justification. For this reason full lines are usually excluded from spelling tests used to identify compositors. The uncontested information about line length may be encoded as an additional attribute of each line in the base text:

<line n="TLN-1" length="full">A tempestuous noise of Thunder and Lightning heard: En-</line>
<line n="TLN-2" length="not-full">ter a Ship-master, and a Boteswaine.</line>
<line n="TLN-3" length="not-full">Master</line>
<line n="TLN-4" length="not-full">BOte-swaine</line>
<line n="TLN-5" length="not-full">Botes. Heere Master: What cheere?</line>

This information can be used to exclude from the XQuery results the words in any line identified in the base text as full. Similarly, the use of spaces around commas, the indentation or turn-over/turn-under of overflowing lines, and the indentation or centering of stage directions, and other features counted by investigators, may be encoded in the base text. These features need not be recorded until one is recreating the investigations that used them. What follows below is a description of the first practical application of the above approach.
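
As a rough illustration of how the 'length' attribute narrows the word-pool, the following Python sketch selects lines with or without the full ones. It assumes the five-line base text shown above wrapped in a hypothetical 'book' element, and it splits words on white space only, which is a simplification.

```python
import xml.etree.ElementTree as ET

BASETEXT = """<book>
<line n="TLN-1" length="full">A tempestuous noise of Thunder and Lightning heard: En-</line>
<line n="TLN-2" length="not-full">ter a Ship-master, and a Boteswaine.</line>
<line n="TLN-3" length="not-full">Master</line>
<line n="TLN-4" length="not-full">BOte-swaine</line>
<line n="TLN-5" length="not-full">Botes. Heere Master: What cheere?</line>
</book>"""


def word_pool(root, include_full=False):
    """Return the white-space-separated tokens of the selected lines."""
    # The attribute predicate plays the role of [@length="not-full"] in XQuery.
    path = ".//line" if include_full else ".//line[@length='not-full']"
    return [
        token
        for line in root.findall(path)
        for token in (line.text or "").split()
    ]


root = ET.fromstring(BASETEXT)
print(len(word_pool(root)))                     # tokens in short lines only
print(len(word_pool(root, include_full=True)))  # tokens in all lines
```

Because the exclusion is a property of the query and not of the base text, the same file serves investigators who count full lines and those who do not.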

The first step is to find an electronic text of the Folio. Michael Best's Internet Shakespeare Editions (ISE) offers accurate transcriptions of every Folio play. We could download these already marked up with the tags that Best uses, but instead to get just the raw words we may 'scrape' the text off the computer screen using the 'select-all', 'copy' and 'paste' operations within a web-browser. After combining the 36 Folio plays and removing the unwanted text that accompanies each play when it is 'screen-scraped' (including advertisements used to fund the ISE website), we are left with a raw text file that contains unwanted line numbers and spurious blank lines. (Hereafter all the files referred to are available in the supplied ZIP archive "GIE-appendices.zip".) The raw text file begins like this:

T H E
T E M P E S T.
1
Actus primus, Scena prima.
A tempestuous noise of Thunder and Lightning heard: En-
ter a Ship-master, and a Boteswaine.
Master.
5BOte-swaine.
. . .
("F.txt" in "GIE-appendices.zip")

A small program written in the language Perl ("cleaner.pl" in "GIE-appendices.zip") removes the unwanted line-numbers, blank lines, and also characters (such as the ampersand) that must be specially represented in XML, and wraps tags around each line to make it a 'line' element with the attributes 'linenumber' and 'length', the latter set by default to 'not-full'. The resulting XML looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book>
<line linenumber="_1" length="not-full">T H E </line>
<line linenumber="_2" length="not-full">T E M P E S T. </line>
<line linenumber="_3" length="not-full">Actus primus, Scena prima. </line>
<line linenumber="_4" length="not-full">A tempestuous noise of Thunder and Lightning heard: En- </line>
<line linenumber="_5" length="not-full">ter a Ship-master, and a Boteswaine. </line>
<line linenumber="_6" length="not-full">Master. </line>
<line linenumber="_7" length="not-full">BOte-swaine. </line>
. . .
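
The cleaning step just described can be approximated as follows. This Python sketch is not the Perl program "cleaner.pl" itself, whose exact rules are not reproduced here; it assumes that any line consisting only of digits, and any run of digits at the start of a line, is an unwanted line number, and it escapes rather than removes characters such as the ampersand.

```python
import re
from xml.sax.saxutils import escape

RAW = """T H E
T E M P E S T.
1
Actus primus, Scena prima.

A tempestuous noise of Thunder and Lightning heard: En-
ter a Ship-master, and a Boteswaine.
Master.
5BOte-swaine.
"""


def clean(raw):
    """Turn scraped text into a list of 'line' elements (as strings)."""
    elements = []
    for text in raw.splitlines():
        text = text.strip()
        # Skip blank lines and lines that hold nothing but a line number.
        if not text or text.isdigit():
            continue
        # Assumption: leading digits fused to the text are line numbers.
        text = re.sub(r"^\d+", "", text)
        number = len(elements) + 1
        elements.append(
            f'<line linenumber="_{number}" length="not-full">{escape(text)}</line>'
        )
    return elements


for element in clean(RAW):
    print(element)
```

Every line starts as 'not-full'; the full lines are corrected by hand afterwards, so the default costs nothing for the majority of verse lines.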

All that remains to make this a valid base text is manually to add tags marking the book's sheets, pages and columns and to set the length attribute to 'full' where necessary, as it is in line 4. This is done in an XML editor such as Oxygen and using a facsimile of the Folio as a crib. We need not consistently mark up all the larger subdivisions of the Folio base text (columns, pages, sheets) but rather only those that we wish to use in a particular analysis. If an investigator's attribution of stints is by pages only, we need not mark up the columns, and if it is by sheets we need not even mark up the pages.

For the sake of consistency, the Folio base text was in fact marked up to show sheets and pages but column subdivisions were left to be added later to those pages that needed them because investigators divided stints by column. The Folio base text is validated against the Document Type Definition (DTD) called "book.dtd" in "GIE-appendices.zip". A DTD is a formal expression of how the various elements of an XML document relate one to another, showing in this case that a sheet must be composed of pages, a page must be composed of lines and (optionally) a pair of columns, and that lines contain the raw character data expressing the words of the book. The DTD also defines the attributes, including 'length' for each line, that may be applied to each element. Although validation against a DTD is not essential, it forms a useful check for consistency in marking up the base text. For example, if by accident any page in the base text has only one column marked up or any line lacks its 'length' attribute, validation of it against its DTD (performed here by Oxygen) will fail and the offending lines needing correction will be highlighted.
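
The two example checks mentioned above can be imitated in a few lines of Python. This is only a sketch of the kind of consistency test a DTD provides, not a substitute for validation against "book.dtd"; the element names 'sheet', 'page' and 'column' and the page attribute 'n' are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

GOOD = """<book><sheet><page n="mm5r">
<column><line linenumber="_1" length="not-full">Master.</line></column>
<column><line linenumber="_2" length="full">BOte-swaine.</line></column>
</page></sheet></book>"""

BAD = """<book><sheet><page n="mm5r">
<column><line linenumber="_1">Master.</line></column>
</page></sheet></book>"""


def check_base_text(xml_text):
    """Return a list of human-readable problems found in the base text."""
    problems = []
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        columns = page.findall("column")
        # A page has either no column markup or exactly a pair of columns.
        if len(columns) not in (0, 2):
            problems.append(f"page {page.get('n')}: {len(columns)} column(s) marked up")
    for line in root.iter("line"):
        if line.get("length") not in ("full", "not-full"):
            problems.append(f"line {line.get('linenumber')}: missing or invalid 'length'")
    return problems


print(check_base_text(GOOD))
print(check_base_text(BAD))
```

A DTD expresses the same constraints declaratively, which is why validation in an XML editor is preferable once the markup scheme has settled.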

Oxygen has a built-in XQuery processor with which to interrogate the documents that represent investigators' views of who set what. The research project to which this paper is an introduction will work chronologically through every published Folio compositor attribution study, recreating its methodology and checking its results. For now, we will confine ourselves to recreating and checking Satchell's originary claim about compositors A and B in Folio Macbeth and Willoughby's limited extension of it. Satchell attributed scenes 1.1 to 3.3 (excluding 1.7) to compositor A and 3.4 to the end (plus 1.7) to compositor B, and offered the following table of spellings, giving A's spelling and count for each word followed by B's:

afraid 2 affraid 3; countreyes 1 countries 3; cryes 2 cries 2; deare 2 deere 3; dearest 4 deerest 1; doe 35 do 41; eyther 1 either 1; eternal 1 eternall 1; filthie 2 filthy 1; furie 1 fury 2; gift 1 guift 1; goe 15 go 9; haste 1 hast 2; hereafter 3 heereafter 1; interprete 1 interpret 1; majestie 2 majesty 3; memorie 1 memory 1; mercie 1 mercy 1; mistresse 1 mistris 1; neyether 2 neither 1; plentie 1 plenty 1; pluck 1 plucke 1; royaltie 1 royalty 1; rubs 1 rubbes 1; runs 1 runnes 1; societie 1 society 1; sunne 3 sun 1; sweare 1 swear 1; thick 3 thicke 1; traytor 1 traitor 4; trecherie 1 trechery 1; voyce 1 voice 1; wait 2 waite 1; weyward 3 weyard 3; winne 3 win 1

Willoughby thought that most of Satchell's words appeared too infrequently to be useful, but that in their various spellings do, go, traitor, here and hereafter are sufficiently frequent for meaningful analysis and to this list he added his own candidate detector, the word cousin. Willoughby's counts for these words² in all their forms were:

compositor A                 compositor B
doe 35                       do 41
                             doo 2
goe 15                       go 9
traytor 1                    traitor 4
hereafter 3                  heereafter 1
here (all but two cases)     heere (all but one case)
cousin 3                     cosin 3
cozen 1                      cosine 1
                             cozen 1

(Willoughby 1932, 57).

To recreate Satchell and Willoughby's counts, it is necessary that the Folio base text for Macbeth be divided by pages for the most part, but where pages are thought to be shared a finer granularity is needed. Scenes 1.1 to 3.3 occupy from the beginning of page ll6v to the end of the first column on page mm5r, so page mm5r must be subdivided into two columns. Within 1.1-3.3, Satchell gives scene 1.7 to compositor B, and this scene begins fifteen lines down from the top of page mm2r and ends twelve lines up from the bottom of page mm2r. Thus for page mm2r the stints have to be defined at the line level. Within "GIE-appendices.zip" the files "satchell-comp-a.xml" and "satchell-comp-b.xml" show Satchell's attribution of stints for compositors A and B expressed as XInclude statements, and "attribution.dtd" holds the DTD against which these attribution documents are validated.³

The XQuery command that draws on Satchell's statement of his view of compositor A's stint is simply:

doc("satchell-comp-a.xml")//line[@length="not-full"]/text()

This tells the XQuery processor to open the file "satchell-comp-a.xml", to expand its XInclude statements by drawing the necessary elements (pages, columns, lines) from the file they point to ("F.xml"), to throw away all but those for which the 'length' attribute is set to 'not-full', and to return just the raw words within what remains.⁴ The output from this XQuery is included in "GIE-appendices.zip" as "satchell-comp-a-raw.txt" and the corresponding output for Satchell's compositor B is "satchell-comp-b-raw.txt". Notice that each file begins with the unwanted line of XML code '<?xml version="1.0" encoding="UTF-8"?>', which must be deleted manually. For each compositor this raw collection of words was passed through a word-frequency counting program written in Perl ("word-frequency.pl" in "GIE-appendices.zip"), which was taken from an instructional guide to this programming language (Wall & Schwartz 1991, 39). The resulting frequency counts are included in "GIE-appendices.zip" as "satchell-comp-a-sorted.txt" and "satchell-comp-b-sorted.txt".
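
The counting step is simple enough to sketch in a few lines. The Python below is an illustrative equivalent, not the Perl program "word-frequency.pl"; its rules for what counts as a word (lower-casing everything, discarding punctuation, keeping hyphenated forms together) are assumptions and may differ from those of the program actually used.

```python
import re
from collections import Counter

# A few short lines of The Tempest standing in for a raw query output.
RAW_WORDS = """ter a Ship-master, and a Boteswaine.
Master
BOte-swaine
Botes. Heere Master: What cheere?"""


def word_frequencies(text):
    """Count word-forms, ignoring case and punctuation."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return Counter(tokens)


frequencies = word_frequencies(RAW_WORDS)
for word, count in sorted(frequencies.items()):
    print(f"{word}\t{count}")
```

Since such a count compares strings of letters and not dictionary words, forms such as 'master' and 'ship-master' remain distinct unless the investigator chooses to merge them.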

As can be seen from the frequency counts in "satchell-comp-a-sorted.txt" and "satchell-comp-b-sorted.txt", Satchell and Willoughby's claimed counts for the words they selected are generally confirmed, with a few small discrepancies. In compositor A's stint, Willoughby claims 15 occurrences of goe where we find only 13, 3 occurrences of cousin where we find 2 of cousin and 1 of cousins, and 1 of cozen where we find 1 of cozens. In compositor B's stint, Willoughby claims 41 occurrences of do where we find 42, 2 of doo where we find 1, 9 of go where we find 10, 4 of traitor where we find the same but also 3 of traitors, and 3 of cosin where we find 1 of cosin and 1 of cosins. Satchell used John Bartlett's concordance to Shakespeare rather than counting occurrences directly from the Folio, and he adopted Nicholas Rowe's emendation of Macbeth's "Who dares no more, is none" to "Who dares do more, is none". Willoughby did not describe how he made his counts, but his adoption of Satchell's numbers suggests the same limitations.

Although Willoughby referred to the effect of "crowded lines" on compositorial spelling he did not explicitly state that he had excluded full lines as we have done here (Willoughby 1932, 59n1). If Willoughby included full lines, that would explain his counts being higher than ours but not the equally noticeable occasions when they are lower. Repeating the above analyses with the XQuery command altered so that it does not exclude full lines--using 'doc("satchell-comp-a.xml")//line/text()'--makes our count for compositor A's use of doe rise to 38 (against Willoughby's 35) and of goe rise to 15 (the same as Willoughby's), and our count of compositor B's use of do rise to 43 (against Willoughby's 41) and of doo rise to 2 (the same as Willoughby's); all other counts remain unchanged. It seems likely, then, that Willoughby included full lines and missed a few occurrences of words he was looking for.

Having confirmed Satchell's work on Folio Macbeth, Willoughby described himself repeating it for Folio Julius Caesar, Hamlet, The Tempest and the first page of Troilus and Cressida, giving the compositor stints for each (Willoughby 1932, 58). Unfortunately Willoughby did not list the counts for each word in these stints. Having tested Folio A Midsummer Night's Dream, The Merchant of Venice and Romeo and Juliet Willoughby declared himself unable to detect signs of compositor A and B at work and thought it likely that "another pair of compositors" set these plays. Willoughby did not indicate whether he rejected compositor A and B's involvement in these plays because he found new spellings of his six test words or because there was no plausible pattern of alternation between compositor A's and B's known spellings. By the time our seminar meets in August 2012 I hope to present spelling preference tables for Folio Julius Caesar, Hamlet, The Tempest, Troilus and Cressida, A Midsummer Night's Dream, The Merchant of Venice and Romeo and Juliet (broken down by page and column) in order to conclude my checking of Willoughby's work.

Notes

¹In addition to the archive, a reader wanting to repeat the methodology used here will need a computer running the Perl and XQuery programming languages. Both are available for the Microsoft Windows, Apple OS, and Linux operating systems. The present work was done on an office-standard laptop running Microsoft Windows version 7, ActiveState Incorporated's ActivePerl version 5.12, and SyncRO Soft Limited's XML editor Oxygen version 13.2, which incorporates the Saxon XQuery processor.

²Willoughby bracketed word-forms to show that, for example, he considered cousin and cozen to be two spellings of one noun (both equivalent to modern cousin meaning a relative) rather than one noun (modern cousin) and one verb (cozen meaning to deceive/cheat). Just how Willoughby conceived his process of lemmatization, which requires manual checking of the linguistic context, is unclear, since he also bracketed together here and hereafter. A limitation of any computerized approach that uses a base text that has not been lemmatized is that it can only compare strings of letters, not dictionary words. So long as investigators give the raw counts for all word-forms and are explicit about which forms they consider alternatives for a single word, this limitation need not impede checking of their claims.

³The DTD for attribution statements is adapted from a TEI model for stand-off markup using XIncludes, altered to enable the attribution to also convey an assertion about the printer's copy for a stint by setting the 'copy' attribute to 'ms' or 'print'. This is an experimental feature not covered here.

⁴If the suffix "/text()" is omitted from the end of this command, the processor returns not the words in the lines but the lines themselves complete with their surrounding XML tags showing such things as the line numbers. As a useful check on the methodology this more complete output was first produced and compared with a facsimile of the Folio to make sure that the system was indeed isolating just those parts of the play that Satchell thought were set by each compositor.

Works Cited

Hinman, Charlton. 1957. "The Prentice Hand in the Tragedies of the Shakespeare First Folio: Compositor E." Studies in Bibliography 9. 3-20.

Hinman, Charlton. 1963a. The Printing and Proof-reading of the First Folio of Shakespeare. Vol. 1. 2 vols. Oxford. Clarendon.

Hinman, Charlton. 1963b. The Printing and Proof-reading of the First Folio of Shakespeare. Vol. 2. 2 vols. Oxford. Clarendon.

Howard-Hill, T. H. 1973. "The Compositors of Shakespeare's Folio Comedies." Studies in Bibliography 26. 61-106.

McKenzie, D. F. 1984. "Stretching a Point: Or, the Case of the Spaced-out Comps." Studies in Bibliography 37. 106-21.

O'Connor, John [S]. 1975. "Compositors D and F of the Shakespeare First Folio." Studies in Bibliography 28. 81-117.

Rasmussen, Eric and Anthony James West, eds. 2011. The Shakespeare First Folios: A Descriptive Catalogue. New York. Palgrave Macmillan.

Satchell, Thomas. 1920. "'The Spelling of the First Folio': A Letter to the Editor." Times Literary Supplement Number 959 (3 June). 352.

Shakespeare, William. 1996. The Norton Facsimile of the First Folio of Shakespeare. Ed. Charlton Hinman. Second edition with a new introduction by Peter W. M. Blayney. New York. Norton.

Taylor, Gary. 1981. "The Shrinking Compositor A of the Shakespeare First Folio." Studies in Bibliography 34. 96-117.

Walker, Alice. 1954. "The Folio Text of 1 Henry IV." Studies in Bibliography 6. 45-59.

Wall, Larry and Randal L. Schwartz. 1991. Programming Perl. Sebastopol CA. O'Reilly & Associates.

Werstine, Paul. 1978. "Compositor B of the Shakespeare First Folio." Analytical and Enumerative Bibliography 2. 241-63.

Werstine, Paul. 2001. "Scribe or Compositor: Ralph Crane, Compositors D and F, and the First Four Plays in the Shakespeare First Folio." Papers of the Bibliographical Society of America 95. 315-39.

Willoughby, Edwin Eliott. 1932. The Printing of the First Folio of Shakespeare. Supplements to the Bibliographical Society's Transactions. 8. Oxford. Oxford University Press for the Bibliographical Society.
