"Digital Scholarly Editing with and Without Collaboration" by Gabriel Egan

We have discovered a great deal about Shakespeare's authorship in the last few years. We have learnt about his capacities as a writer and about his practices of collaboration. The new knowledge has been created by the new things that computers and digital text enable us to do and they change what we as editors must do, at least with Shakespeare and, I suspect, what we must do with other authors too.

Let me summarize three new and surprising facts about Shakespeare that digital scholarship has recently established. First, Shakespeare did not coin any new words. We now have in digital form a large proportion of the English writing that predates Shakespeare's. We can in every case I know of find someone else using, before Shakespeare did, every word for which the Oxford English Dictionary's earliest illustrative use is by Shakespeare. That is, we can antedate all the words that the OED might make us think that Shakespeare coined. Knowing this affects how we edit Shakespeare when we find an unusual word in the originating documents. If we thought that Shakespeare habitually coined new words, we might be tempted to treat an unusual word as a possible new coinage rather than as an error. [SLIDE] For instance, in the first edition of Shakespeare's Richard 3, Richard says in his opening soliloquy "Plots haue I laid inductious dangerous". The word 'inductious' might be an error for 'inductions', which is the reading from the 1623 Folio that editors usually prefer: "Plots have I laid, inductions dangerous". Or "inductious" might be a new coinage by Shakespeare that simply did not catch on. Knowing that we were wrong to think that Shakespeare was an inveterate word-coiner helps in deciding what to do here.

    The second discovery is that Shakespeare did not have a large vocabulary (Craig 2011; Elliott & Valenza 2011). In fact he was remarkably average in this regard. Knowing this is not especially helpful to editors, although it usefully moderates the reverence that some people feel when dealing with Shakespeare's text. The computational methods by which we have figured out Shakespeare's vocabulary are themselves interesting and I can describe them afterwards if anybody is interested.  

    The third discovery is co-authorship. This is the big news. According to the New Oxford Shakespeare complete-works project there are 43 plays that are wholly or partly by Shakespeare. These are the 36 plays in the 1623 Folio plus Pericles published in 1609 and The Two Noble Kinsmen published in 1634 -- two additions that few people will find controversial -- plus five more that remain disputed. [SLIDE] Those five are Arden of Faversham published anonymously in 1592, The Spanish Tragedy published anonymously in 1592 but attributed to Thomas Kyd by Thomas Heywood in 1612, Edward the Third published anonymously in 1596, Sir Thomas More that was unpublished until 1844 and uniquely survives in manuscript form, and Cardenio published in 1727 in a heavily adapted form. A complete works of Shakespeare now needs to present 43 plays. Furthermore, of those 43 plays, only 27 were written by Shakespeare on his own; the remaining 16 plays contain Shakespeare's writings alongside the writings of other men. Those 16 plays, in chronological order of composition, are these [SLIDE]. I won't read them all out.

    Some of these co-authorship claims are uncontroversial. Most Shakespearians have for some time accepted that Titus Andronicus was co-written with George Peele and Henry 8 was co-written with John Fletcher, even though both appeared in the 1623 Folio with only Shakespeare's name on the title page. Other co-authorship claims I present here are discoveries of the New Oxford Shakespeare project and are not accepted by other scholars. [SLIDE] But even if we take away the contentious claims, for more than half of these plays, virtually all professional Shakespearians are in agreement about the co-authorship. So we have to think through what co-authorship means for the editing of Shakespeare.

    Until recently, we did not think it made much difference. In an influential study of collaborative playwrighting in 1997, Jeffrey Masten argued that our efforts to separate out individual labours in a collaboratively written play are doomed to fail because in fact authors can and do merge their styles when writing together [SLIDE]:

. . . the collaborative project in the theatre was predicated on erasing the perception of any differences that might have existed, for whatever reason between collaborated parts. . . . Collaboration is, as we shall see, a dispersal of authority, rather than a simple doubling of it; to revise the aphorism, two heads are different than one. (Masten 1997, 17)

Masten's view of collaboration was based on French literary theory from the 1960s, and in particular the work of Michel Foucault.

    We now know that the claims about authorship made by the French theorists Roland Barthes and Michel Foucault in particular (Barthes 1968; Foucault 1969) grossly misled two generations of literary scholars. Authorship is not a will-o-the-wisp that we are unable to trace with empirical methods. The author is not a 'function' that we apply to texts in the act of reading and making sense of them. The text is not "a tissue of quotations drawn from the innumerable centres of culture" (Barthes 1977, 146). Authorship is an objective and measurable fact of writing that we can analyze with empirical methods that give us dependable, reproducible results. And collaborating authors, we now know, did not merge their styles in the way Masten suggested. He thought they did because French literary theory told him that they would.

    Fifteen years ago Hugh Craig summarized the case against the post-structuralist view of authorship that has dominated literary studies for the past 50 years [SLIDE]:

In the case of authorship, statistical studies might have revealed -- were free to reveal -- that authorship is insignificant in comparison to other factors like genre or period. In that case the theory that authors are only secondary to other forces in textual patterning would have been validated. . . . As it happens, however, authorship emerges as a much stronger force in the affinities between texts than genre or period. Unexpectedly, perhaps uncomfortably, it is a persistent, probably mainly unconscious, factor. Writers, we might say, can't help inscribing an individual style in everything they produce. We need to take account of this in a new theory of authorship. (Craig 2009-10, para. 3)

Craig is here referring to the many recent studies that have revealed measurable differences in the styles of different authors, and have shown that when multiple authors collaborate on one play we can distinguish who wrote which bits.

    Now that we can distinguish authorship we have to edit different parts of a multi-author text differently, respecting at each point the local author's preferences as they are revealed elsewhere in other works, and judging the need for emendation by that author's habits elsewhere. The old poststructuralist model of authorship relieved us of this burden because it told us that the two or more authors blended their styles into one seamless whole. Now that we know this to be untrue, we have to shoulder the burden that poststructuralism relieved us of.

     Editors of co-authored texts for which the shares have already been determined must now edit each part in the light of its author's personal habits. Editors of texts where co-authorship is suspected but not established have to investigate the possibility of co-authorship for themselves or get someone else to do it. The first question to be addressed, then, is just what aspects of language we should attend to in order to distinguish authorship.

    There are two main approaches: studying rare words and phrases and studying common words and phrases. We can count the uses of these words and phrases in the text to be attributed and in the canons of our candidate authors who might have written it. If we are concerned with rare words and phrases we typically look for those that occur only in the text to be attributed and in one candidate author's canon. If we are concerned with common words and phrases, we look for the candidate whose frequencies of use regarding those words and phrases are closest to the corresponding frequencies of use in the text to be attributed.
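The rare-phrase approach can be illustrated with a minimal sketch. It collects the word sequences (n-grams) of a text to be attributed and keeps only those found in exactly one candidate's canon. The function names, the two candidate labels and the toy strings are all invented for illustration, and the sketch checks exclusivity only among the listed candidates, whereas a real study would also check a much larger body of contemporary writing.

```python
def ngrams(tokens, n):
    """Return the set of n-word sequences in a list of tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def exclusive_matches(target_text, canons, n=2):
    """Map each candidate to the target n-grams found in that candidate's canon only."""
    target_grams = ngrams(target_text.lower().split(), n)
    canon_grams = {name: ngrams(text.lower().split(), n)
                   for name, text in canons.items()}
    matches = {name: set() for name in canons}
    for gram in target_grams:
        owners = [name for name, grams in canon_grams.items() if gram in grams]
        if len(owners) == 1:  # phrase is shared with exactly one candidate
            matches[owners[0]].add(gram)
    return matches


# Invented toy data: not real play texts.
target = "sweet lord of night come away"
canons = {
    "Author A": "sweet lord of day",
    "Author B": "night come away and sweet lord",
}
matches = exclusive_matches(target, canons)
for name, grams in matches.items():
    print(name, sorted(grams))
```

The phrase "sweet lord" appears in both toy canons and so counts for neither candidate, which is the sense in which this approach relies on exclusivity.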

    When we attribute significance to our finding that a certain rare word or phrase from the suspect text appears in the canon of a candidate writer, we have to bear in mind that different writers' canons are different sizes. The 8 dramatists from Shakespeare's time who have left us the most plays have these canons of sole-authored works [SLIDE]. Shakespeare has the largest canon: 27 out of the total here of 101 plays. If we imagine this set of 101 plays as a 'target' surface in which we might find any particular phrase we are looking for [SLIDE], it is clear that Shakespeare presents a larger surface area in which to find a match. [SLIDE] If our searches are like darts randomly thrown at this target, then all else being equal our darts will land in the 'Shakespeare' sector more often than in any other, merely by virtue of its being the biggest sector. I am speaking here of the significance we attach to finding or not finding rare words or phrases in authors' canons. As we shall see, the problem of differing canon sizes scarcely affects studies that look for common words and phrases.

    To correct for differing canon sizes, we could say that a 'hit' for Greene is weighted as 7 times more significant than a hit for Shakespeare because the Greene part of the target is only one-seventh the size. This is the procedure undertaken in the tables of Pervez Rizvi, whose online dataset 'Collocations and N-Grams' is the primary attribution tool used in the current project to edit the complete works of Thomas Kyd being led by Brian Vickers (Rizvi 2018; Vickers 2019). By this method, the project has expanded the Kyd canon -- which until a few years ago was widely agreed to have just one play in it, The Spanish Tragedy -- so that it now includes Soliman and Perseda, and Cornelia (which for other reasons most people already accepted as his) and also Arden of Faversham, King Leir, Fair Em, 1 Henry 6, and Edward 3.
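The kind of correction just described can be sketched as a simple inverse-canon-size weighting. This is an illustration of the general idea only and does not reproduce Rizvi's actual formula. The canon sizes (27 plays for Shakespeare, 4 for Greene) are the figures given in this talk, canon size is measured crudely in plays rather than words, and the raw hit counts are invented.

```python
def weighted_hits(raw_hits, canon_sizes):
    """Scale each author's raw hits by (largest canon size / own canon size)."""
    largest = max(canon_sizes.values())
    return {author: raw_hits[author] * largest / canon_sizes[author]
            for author in raw_hits}


canon_sizes = {"Shakespeare": 27, "Greene": 4}  # sole-authored plays, as given in the talk
raw_hits = {"Shakespeare": 14, "Greene": 3}     # hypothetical counts, for illustration only

weighted = weighted_hits(raw_hits, canon_sizes)
print(weighted)
```

Each Greene hit is multiplied by 27/4, or 6.75, which is the "about 7 times" of the prose. With these invented counts, Greene's 3 raw hits outweigh Shakespeare's 14 once weighted, which shows how strongly the outcome depends on the weighting being valid.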

    I want to briefly show that the adjustment made for differing authorial canon sizes that is applied by Rizvi, on which these attributions depend, is unnecessary if one is counting common words and phrases and invalid if one is counting rare words and phrases, as the Kyd project's investigators do.1 

    Take the uncommon word 'water(s)', which is the 636th most frequently used word in all of Shakespeare. [SLIDE] This is how it is unevenly distributed across the plays, listed in alphabetical order from left to right. [SLIDE] The play with the most occurrences, The Tempest, has 14 times as many occurrences as the play with the fewest occurrences, Much Ado About Nothing. [SLIDE] For contrast, let us see how the word 'in', the 10th most common word in Shakespeare, is evenly distributed across the plays [SLIDE]. Here the spread is far less: Henry 5's 156 per 10,000 words is not even double The Winter's Tale's 87.

    What happens, then, if we extrapolate from a small canon to a large one, if we weight our hits so as to scale up the smaller canon to the larger? If we had only 4 Shakespeare plays (as we have only 4 Robert Greene plays), would they give us an accurate sense of how many occurrences for 'in' and 'water' to expect in a larger canon? We can test this directly in the case of Shakespeare, because we do have the larger canon. [SLIDE] Here we show along the x axis an increasing canon size, taking the plays in alphabetical order. For both words, we start with just one scene of 1 Henry 4, then one act of 1 Henry 4, then 1 Henry 4 as one play, then 1 Henry 4 plus 2 Henry 4 as two plays, then those two plays plus Much Ado About Nothing as 3 plays, and so on adding one new play to the canon each time until we have put all 27 Shakespeare plays together. For each constructed canon we calculate the rate of 'in' and 'water' per 10,000 tokens.

    [SLIDE] We can see on the left that once we get to 4 plays, the rate of 'in' remains almost constant: it does not matter how many new plays we add to the canon. We need only a 4-play canon to get a good sense of the rate of usage of 'in' in any larger canon. [SLIDE] But for 'water' the pattern is quite different. We see that the effect of adding the third play Much Ado About Nothing (which is exceptionally low in its use of 'water') is to drag the rate for the 3-play canon down markedly, and then adding the fourth play Antony and Cleopatra (which has an unusually large number of occurrences of 'water') takes the rate for the 4-play canon right up again. Then our 5-play canon is even worse than our 4-play canon for predicting the rate of usage of 'water(s)', since its rate is lower, and our 6-play canon is worse still, being lower still. Not until we have 14 plays in our canon is this collection showing a rate of usage of 'water(s)' that is even as much as three-quarters of the final rate for the full 27 plays.
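The growing-canon calculation described above can be sketched in a few lines. This is a minimal illustration that assumes each play is available as a plain string and that tokens can be found with a simple regular expression. The function names and the two toy 'plays' are invented, so the printed rates mean nothing in themselves.

```python
import re


def tokenize(text):
    """Split text into lower-case word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def cumulative_rates(texts, forms, per=10_000):
    """Rate of the given word forms per `per` tokens after adding each text in turn."""
    hits = total = 0
    rates = []
    for text in texts:
        tokens = tokenize(text)
        hits += sum(1 for token in tokens if token in forms)
        total += len(tokens)
        rates.append(per * hits / total if total else 0.0)
    return rates


# Invented stand-ins for successive plays in a growing canon.
plays = ["water in the sea", "in the house in town"]
in_rates = cumulative_rates(plays, {"in"})
water_rates = cumulative_rates(plays, {"water", "waters"})
print(in_rates)
print(water_rates)
```

Plotting each list against the number of texts added would give curves of the kind shown on the slides: flat for a word whose rate stabilizes quickly and erratic for one that is unevenly spread.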

    In growing our Shakespeare canon from one play to 27 plays we here took the plays in alphabetical order, which is effectively random order. In the worst-case scenario -- by which I mean if our 4-play canon happens to be our candidate author's 4 plays that use the word in question the least -- then the problem is far worse for the rare word 'water' but not for the common word 'in'. I have not the time to show this, nor to show that the problem is not confined to the uncommon word 'water': it is a general problem with uncommon words because they are unevenly distributed across authors' canons.

* * *

    There is much more to be said about the methods for distinguishing authorial styles so that we can detect co-authorship. But as a general principle it is safest to look at rates of common words rather than rare words, because of this problem of uneven spread. Of course, one would not base an authorship attribution on the rate of usage of one word. But even counting just two common words gives useful data about authorship. [SLIDE] Here is what we get if we plot on the x axis how often each of our 8 dramatists uses the word 'the' and on the y axis how often he uses 'and'. Each dot represents two counts, one for 'the' and one for 'and', for each of our dramatists' canons.

    For Shakespeare we have removed As You Like It from the set of 27 plays and plotted the usage of 'the' and 'and' for just the remaining 26 Shakespeare plays. [SLIDE] Then we count these words in As You Like It and see where it falls on the plot. As you can see, the dramatist whose rates of using 'the' and 'and' are closest to the rates found in As You Like It is Shakespeare: his dot is the nearest to the As You Like It dot. If As You Like It were a play of unknown authorship, this plot would tell us that, regarding these two words at least, Shakespeare's habits are -- amongst those of the 8 dramatists we are considering -- the habits most similar to the habits found in the play.

    We can count the rates of usage of more than two words, and typical experiments count the 50 or 100 most frequent words. Of course, we cannot display 100 counts on a two-dimensional plot like this because we would need 100 dimensions. But the mathematical formulas that tell us which dot is nearest to which other dot work exactly the same in 100 dimensions as they do in two, and finding the nearest dot is trivial.
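Finding the nearest dot in many dimensions can be sketched as follows. The sketch assumes straight-line (Euclidean) distance, which is only one of several distance measures an investigator might choose. The candidate labels and all the rates are invented, and three word-rates stand in for the 50 or 100 of a real experiment.

```python
import math


def nearest_candidate(target_rates, candidate_rates):
    """Return the candidate whose vector of word rates is closest to the target's."""
    return min(candidate_rates,
               key=lambda name: math.dist(target_rates, candidate_rates[name]))


# Hypothetical rates per 10,000 words for three common words, in a fixed order.
candidates = {
    "Dramatist A": [310.0, 280.0, 150.0],
    "Dramatist B": [355.0, 240.0, 190.0],
    "Dramatist C": [290.0, 330.0, 120.0],
}
play = [350.0, 245.0, 180.0]  # rates for the play to be attributed

best = nearest_candidate(play, candidates)
print(best)
```

The same function works unchanged whether each list holds 2 numbers or 100, which is the point made in the paragraph above: only the length of the lists changes, not the calculation.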

* * *

    Even if you have a reliable method for determining authorship, it is not obvious how you find where one author takes over from another in a co-authored text. It may be that you have reason to suppose that the authorial stints were organized by an artistic unit such as the scene or the act in plays or the chapter in novels. Evidence from papers in the archive of the theatre impresario Philip Henslowe gives reason to suppose that the act was sometimes the unit of authorial composition in Shakespeare's time, so that different writers would be given the task of writing different acts of a play. But we cannot assume that this always happened.

    One successful approach for detecting authorial stints is called 'rolling windows'. [SLIDE] Suppose we want to be able to detect the presence of a single run of lines, say 1500 words, by a second author (here coloured red) within a play that is otherwise all written by a main author (here coloured blue). Suppose we know that the minimum block of text that we can reliably detect the authorship of is 2000 words, and that our test will always point to the author who wrote the majority of the words in that 2000-word block. We could test each successive 2000-word block in our text. But there is a good chance that, as here, the run of 1500 words by our second author that we are trying to detect will not fall wholly within any one of our 2000-word blocks and hence the verdict given for each block will be that the main author, in blue, wrote it, because the second author's writing never happens to predominate in any one block.

    [SLIDE] If we instead make our 2000-word window roll across the text, creeping forwards say 500 words at a time, then as it rolls over the block of 1500 words by our second author there will be several successive windows in which the second author's words predominate and the verdict will change to show our second author. [SLIDE x 34] We will not be able to say precisely where this second author's stint begins and ends, but we will know the approximate location. For this method to succeed, the window size needs to be just a little larger than the smallest block of secondary writing we wish to detect and also it needs to be large enough to bring in enough writing for reliable determination of authorship [BLANK SLIDE].
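The difference between fixed blocks and rolling windows can be simulated in a short sketch. It assumes, as the description above does, an idealized test that always names whoever wrote the majority of words in a window. Here that test is faked by labelling every word with its true author, so nothing is actually being attributed. The 8000-word text length and the position of the second author's stint are invented.

```python
from collections import Counter


def majority_author(window):
    """Stand-in for a real attribution test: name the author of most words in the window."""
    return Counter(window).most_common(1)[0][0]


def window_verdicts(labels, size, step):
    """Return (start position, verdict) for each window of `size` words, moved `step` words at a time."""
    return [(start, majority_author(labels[start:start + size]))
            for start in range(0, len(labels) - size + 1, step)]


# An invented 8000-word text: a 1500-word stint by the second author begins at word 1250.
labels = ["main"] * 1250 + ["second"] * 1500 + ["main"] * 5250

fixed = window_verdicts(labels, size=2000, step=2000)   # successive blocks
rolling = window_verdicts(labels, size=2000, step=500)  # rolling windows

print([verdict for _, verdict in fixed])
print([start for start, verdict in rolling if verdict == "second"])
```

In this arrangement the stint straddles the boundary between the first two blocks, so every fixed block is given to the main author, while the rolling windows starting at words 500, 1000 and 1500 are given to the second author and so reveal the stint's approximate location.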

* * *

    So much for digital methods for detecting co-authorship. In the particular case I want to present to you, the play comes down to us in manuscript form and hence there are additional layers of information about authorship: the differing handwritings in the manuscript and the differing inks used by the writers. The play is Sir Thomas More, which I have just finished editing for the New Oxford Shakespeare Complete Alternative Versions to be published next year. This part of the New Oxford Shakespeare project is being created by the editors in Text Encoding Initiative XML that the publisher, Oxford University Press, has agreed to ingest directly. We edit the Shakespeare works in Oxygen XML Editor and inside Oxygen we run an Extensible Stylesheet Language Transformation provided by the publisher, which turns our XML into HTML to give us an instant proof to check that we are encoding things correctly.

    I expect that for most of tonight's audience, working this way in TEI XML is unexceptional. (I should say that when I speak at Shakespeare conferences, it is exceptional; and I know of one large collaborative Shakespeare editing project that has been trying for 15 years to switch to using TEI XML from the ground up, and has not managed it.) There is innovative digital scholarship in the authorship attribution investigations we are doing for the edition. And there is digital scholarship in our systematic use of large datasets of published books so that we can edit each part of a multi-authored work in the light of the habitual locutions of that part's author as found in other works. Naturally, for this we use the Early English Books Online Text Creation Partnership (EEBO-TCP) dataset, and for our work the best interface to this dataset is the Early Print website from Washington University in St Louis. The Early Print project, as you may know, has applied morphosyntactic tagging and lemmatization to the EEBO-TCP dataset so that one can search by part-of-speech and by dictionary headword, and the Early Print search engine accepts Regular Expressions, which those from the commercial providers of EEBO do not.

    What is special and challenging about Sir Thomas More is the complexity of its collaborative authorship. Shakespeare contributed one scene and perhaps a couple of scattered speeches. The play survives as a manuscript in the British Library and was not printed until 1844. [SLIDE] The title page of John Jowett's Arden Shakespeare printed edition of the play gives you a sense of the problem (Munday et al. 2011). It reads "Original Text by Anthony Munday and Henry Chettle, Censored by Edmund Tilney, Revisions co-ordinated by Hand C, Revised by Henry Chettle, Thomas Dekker, Thomas Heywood and William Shakespeare". And for this edition, "Edited by John Jowett".

    Because this is a manuscript play, Jowett wanted to convey certain details of the manuscript that he thought his readers would be interested in, such as deletions and additions marked in various ways. The alterations in the manuscript leave us with a set of problems about agency, since we have the writings and crossings out of traditional authors -- Munday, Chettle, Dekker, Heywood, and Shakespeare -- some of whom wrote the original play and others who revised it. Alongside their writing we have the crossings out and insertions of various symbols and ruled lines and whole words of someone we call Hand C, a theatrical scribe who supervised the integration of the revisions into the existing material. And we have the crossings out and insertions of various symbols and ruled lines and whole words by the state censor, Edmund Tilney.

    [SLIDE] Here is a particularly thorny moment of crossing out. The main handwriting here is Shakespeare's, writing a new version of the scene in which Thomas More manages to quell a riot by force of his rhetoric. Hand C has crossed out four and a half lines, shown by underlining in Jowett's edition with a subscript 'C' at the beginning and end of the underlining to attribute agency for the deletion. Hand C also added four words of his own, "Tell me but this", interlined above his deletion and which Jowett enclosed in superscripted letter Cs. Within the material by Shakespeare that Hand C crossed out there are other deletions that Shakespeare made as he wrote and which are explained in Jowett's textual notes but are not represented in the main text.

    The TEI standard and guidelines are fully equipped to deal with a complex case of multiple hands making the multiple insertions and deletions present in this example. But I do not think that many readers are capable of dealing with them. It seems to me that Jowett's typographical codes will mean nothing to almost all readers of an Arden Shakespeare volume. [SLIDE] Those who can make sense of what Jowett's codes mean might just as well use W. W. Greg's Malone Society Reprint edition of the manuscript itself.

    Jowett wanted to present to his reader "alternative readings in the revised state of the text" (Munday et al. 2011, 123). I doubt that this is achievable in a printed edition of this particular text because it is so complicated, and I have not seen a digital edition capable of adequately presenting alternative readings in such complex cases. If you have seen such a digital edition, I would be grateful to hear about it so I can reconsider this matter. Where the alternative readings give us a choice, in my edition I make the choice instead of leaving it to the reader, and I give justification for my choice in the textual notes.

    My edition aims to present the play as its creators intended it to exist after all the revisions written by Chettle, Dekker, Heywood, and Shakespeare had been integrated into the original text as written by Munday and Chettle. This version of the play that I aim to present is not fully realized in the manuscript that is our only authority, and so I apply my editorial labours of completion to the manuscript. These labours are limited to the creation of text that would have been acceptable to early modern theatrical professionals as the basis for a script that could be acted.

    My editorial policy entails a hierarchy of authority regarding the persons whose handwriting is present in the manuscript. I had to rank the collaborating co-authors. At the bottom of my hierarchy with least authority is Edmund Tilney, the Master of the Revels whose job it was to censor the play. Indeed, he could be said to have negative authority in my hierarchy, since rather than respecting his labour, I actively want to undo whatever effect I find he has had on the text, because censors are the enemies of artists.

    This raises a problem, since in this case the censor's prohibition of the dramatization of the citizens' uprising against the foreigners would make untenable the entire play as conceived by its authors. We cannot take partial account of Tilney's interventions in the script, since they are all of a piece, so we must set aside the whole of them and present the play as its authors intended it even if, as might well be the case, Tilney was right and the resulting play could not have been performed.

    Hand C was a professional theatrical functionary, and he coordinated the integration of the newly written sections into the original text to produce the revised version. His labours on the revisions included deleting material that was no longer needed, rewriting stage directions to manage new entrances and exits, and making notes about casting the roles. These tasks make Hand C effectively an authoring partner with the dramatists and almost all of his interventions in the script are accepted by me. But Hand C also made mistakes, and where these are things that I think he would have put right if someone had pointed them out to him, I do not adopt Hand C's mistaken readings but instead I put them right for him.

    The example on screen is a clear mistake by Hand C. The problem is Shakespeare's habitual lack of punctuation. [SLIDE] Hand C thought that a sentence ended here, giving ". . . and your unreverent knees make them your feet to kneel to be forgiven". If that is right, what follows makes no sense as a new sentence starts: "is safer wars than ever you can make| Whose discipline is riot.| In, in, to your obedience! Why, even your hurly| Cannot proceed but by obedience". Hand C could make no sense of this so he deleted it all and invented in its place the simple bridge [SLIDE] "Tell me but this". But of course what Shakespeare actually meant was [SLIDE] ". . . and your unreverent knees make them your feet". New sentence: "To kneel to be forgiven| Is safer wars than ever you can make| Whose discipline is riot.| In, in, to your obedience! Why, even your hurly| Cannot proceed but by obedience". So here we must disregard what Hand C did.

    Where a dramatist has apparently changed his mind in the course of composition, for instance by crossing out a word and writing a near synonym directly after it, I adopt the second thought in preference to the first. [SLIDE] Here Shakespeare wrote "euen yor warrs" and then crossed out "warrs" (presumably because he remembered that he used it in the previous line) and substituted "hurly", so I adopt "hurly" and merely give a note about the deleted "warrs". Where the deletion of whole lines appears to be a dramatist's decision about his own work, perhaps thinking he can do better -- as with Munday's first attempt to end the play -- I accept the deletion and omit the lines. Likewise where a deletion seems to be by Hand C and represents something forced on him by the task of combining the revisions with the original script, I adopt the deletion.

    But where I cannot reasonably attribute deletions to these causes, my failure to find a principled explanation for them leads me to ignore the deletion marks and present the deleted matter to the modern reader with a textual note about my decision to retain it. In general, where multiple lines are marked for deletion and I cannot tell by whom, my preference in a section of writing in Hand C is to retain the lines unless I have evidence that the dramatist who composed that section would have agreed to their deletion, while my preference in a section of writing in the hand of one of the dramatists is to delete the lines unless I have evidence that he would have preferred to keep them. That is, in general and unless I have a special reason, I do not allow Hand C to unilaterally delete a dramatist's lines but I do allow dramatists to delete their own lines.
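How such decisions interact with an encoding that records who made each alteration can be sketched with a small script. It assumes a much-simplified, namespace-free fragment using the TEI elements for deletion and addition with a 'hand' attribute. The hand identifiers "#Sh" and "#C", the abbreviated modernized wording and the blanket rule of accepting or rejecting all of a hand's interventions are simplifications invented for this illustration. This is not the encoding used in the New Oxford Shakespeare, and the editorial policy just described is applied case by case rather than by a single rule.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment: Hand C deletes a passage and interlines a bridge;
# inside the passage the dramatist has replaced "wars" with "hurly".
FRAGMENT = (
    '<ab>'
    '<del hand="#C">In, in, to your obedience! Why, even your '
    '<del hand="#Sh">wars</del><add hand="#Sh">hurly</add></del>'
    ' <add hand="#C" place="above">Tell me but this</add>'
    '</ab>'
)


def reading(elem, accepted):
    """Return the text that results from applying only the accepted hands' alterations."""
    parts = [elem.text or ""]
    for child in elem:
        hand = child.get("hand")
        if child.tag == "del":
            if hand not in accepted:   # deletion rejected, so keep the deleted text
                parts.append(reading(child, accepted))
        elif child.tag == "add":
            if hand in accepted:       # addition accepted, so include the added text
                parts.append(reading(child, accepted))
        else:
            parts.append(reading(child, accepted))
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(FRAGMENT)
print(reading(root, {"#Sh"}).strip())        # dramatist's second thoughts only
print(reading(root, {"#Sh", "#C"}).strip())  # Hand C's deletion and bridge as well
```

Accepting only the dramatist's own changes yields the passage with "hurly" restored to the text, while accepting Hand C's interventions too leaves only the bridge "Tell me but this".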

    You will notice that the editorial principles I just sketched place a high value on what I think an agent would want to do, as when removing from the text the result of Hand C's error because I think that Hand C would want us to do that if he had realized his error. This is explicitly an editorial position that treats authorial intention as  something we can and should recover, and I believe that our recent digitally enabled discoveries about authorship support that position.

* * * 

    The New Oxford Shakespeare is a collaboration of nine editors, but necessarily there is a hierarchy since the general editors choose the editorial principles for the edition and the junior editors must simply follow them. But there are hierarchies within hierarchies. I am the most junior of the general editors on account of my joining the project long after the others had done much of the work and, frankly, because I am the youngest in terms of career years (and indeed calendar years). But until I joined it, the project was not planning for the editors to work directly in TEI XML -- that was my contribution -- and within the team I am the one who decides exactly how we are going to use the TEI standard to create the edition we want. Specialism generates its own seniority.

    This seems also to have been true in the early modern theatre. As a mere scribe, Hand C of Sir Thomas More was considerably less important to the playing company than Shakespeare and the other dramatists were, and his position in the hierarchy was lower. But he had specialist skills that the others lacked and they accorded authority to his writing because of those skills, not on the basis of his overall status. Hand C made mistakes, but when he was operating in accordance with the collective intention of the collaborating team his decisions could override or refine those of Shakespeare or any of the other dramatists.

    [SLIDE] Here is a concrete example of that, at the start of Shakespeare's contribution to the play. Shakespeare wrote the speeches for Lincoln, the leader of the rioters, and provided Lincoln's name as the speech prefix. Shakespeare wrote speeches for the other rioters to respond to what Lincoln is saying, but he did not differentiate these speakers. Shakespeare wrote "other" as the speech prefixes meaning that someone from the crowd should answer Lincoln. Hand C crossed out Shakespeare's speech prefixes for "other" and distributed the speeches to particular characters -- George Betts, his brother Clown Betts, and Williamson -- according to his own judgements of how well the speeches fitted those characters as they had been developed previously in the play.

    An editor sensitive to the nature of the dramatic collaboration that Hand C and Shakespeare were engaged upon must, I would argue, prefer what Hand C wrote here to what Shakespeare wrote. To be consistent as a Shakespearian editor, and to complete the incomplete intention to which the manuscript is witness, an editor has to suppress the writing of the greatest dramatist the world has ever known, and in its place put the words of a humble playhouse scribe whose name is lost to us. In their collaborative editorial labour back then as in ours now, specialism generates its own seniority.

Coda

Sometimes, paradoxically, people collaborate without meaning to and with people they despise. By 'collaboration' I mean here that multiple people's endeavours are complementary and add up to more than the sum of the parts. I am thinking of the unintended collaboration that can emerge between rival teams precisely because those teams do not get along. I have referred to the authorship attribution work of Brian Vickers and his team producing a complete works of Thomas Kyd. It is no secret that Vickers has a low opinion of the authorship attribution work of the New Oxford Shakespeare, and that we hold a low opinion of his team's authorship attribution work. For everyone else, this mutual disdain can be a bonus.

    [SLIDE] One of the great dangers in authorship attribution work is confirmation bias: convincing yourself that something has been proved because you want to believe it is true. My team and Brian Vickers's team have no incentives to agree with each other. Thus, when my team and Vickers's team agree on something -- such as Shakespeare's hand in writing additions to Thomas Kyd's play The Spanish Tragedy -- we are not agreeing because we like each other and want to agree. We have every reason to disagree with each other, and only the overwhelming evidence forces us to agree. We approach the attribution questions using different methods, and yet we concur. There is no confirmation bias at work. By a perverse kind of teamwork born of professional rivalry, we confirm each other's findings. In such a circumstance, you may be more than usually sure that our conclusions are correct.

Notes

1Rizvi's website <https://www.shakespearestext.com/can/> is organized as a series of HTML pages that link to large ZIP files that contain files in the proprietary formats Word and Excel from the Microsoft Office software suite. The document containing Rizvi's weighting formula is called "Which-N-grams-are-the-Best.docx" and at the time of writing, 26 May 2024, the way to find it is:

1) Start at the landing page for https://shakespearestext.com/can

2) Follow the hyperlink attached to the word "experiments" (8 lines from the bottom of the page).

3) Follow the hyperlink attached to the phrase "Browse Results (approx. 173 Mb)" to download the file "results.zip".

4) Expand "results.zip" to create the five folders including one named "1-Which-N-Grams-are-the-Best". In this folder you will find "Which-N-grams-are-the-Best.docx".

Works Cited

Barthes, Roland. 1968. "La Mort de L'auteur (The Death of the Author)." Mantéia 5. 12-17.

Barthes, Roland. 1977. Image-Music-Text. Trans. Stephen Heath. London. Fontana.

Craig, Hugh. 2009-10. "Style, Statistics, and New Models of Authorship." Early Modern Literary Studies 15.1. 41 paras.

Craig, Hugh. 2011. "Shakespeare's Vocabulary: Myth and Reality." Shakespeare Quarterly 62. 53-74.

Elliott, Ward E. Y. and Robert J. Valenza. 2011. "Shakespeare's Vocabulary: Did it Dwarf All Others?" Stylistics and Shakespeare's Language: Transdisciplinary Approaches. Edited by Mireille Ravassat and Jonathan Culpeper. Advances in Stylistics. London. Continuum. 34-57.

Foucault, Michel. 1969. "Qu'est-ce Qu'un Auteur? (What is an Author?)." Bulletin de la Société française de Philosophie 63.3. 73-104.

Masten, Jeffrey. 1997. Textual Intercourse: Collaboration, Authorship, and Sexualities in Renaissance Drama. Cambridge Studies in Renaissance Literature and Culture. 14. Cambridge. Cambridge University Press.

Munday, Anthony, Henry Chettle, Edmund Tilney, Hand C, Thomas Dekker, Thomas Heywood and William Shakespeare. 2011. Sir Thomas More. Ed. John Jowett. The Arden Shakespeare. London. Methuen.

Rizvi, Pervez. 2018. 'Which N-grams Are the Best [for Authorship Attribution]?': An Essay Self-published on Pervez Rizvi's Website 'Collocations and N-grams' (CAN).

Vickers, Brian. 2019. "Is EEBO-TCP / LION Suitable for Attribution Studies?" Early Modern Literary Studies 21.1. n. pag.