
"Is Shakespeare's Language Unlike That of His Contemporaries?" by Gabriel Egan

[If you are holding one of the pieces of paper on which some words are printed, please do not show it to anyone else.]

Let us start with the claim, mentioned in the promotional material for this talk, that Shakespeare coined a lot of words that we use today. This would of course show his language to be unlike that of his contemporaries, since that is what coining new words means: being unlike your fellow speakers. Did Shakespeare really have his own words? [clip Carlin] It is often claimed that Shakespeare coined a lot of words we use, meaning that he was the first to use them. But this claim gets muddled up with the claim that he used a lot of phrases that we still use, which is perfectly true. This is how Bernard Levin put it [SLIDE]:

If you cannot understand my argument and declare 'it's Greek to me', you are quoting Shakespeare. If you claim to be more sinned against than sinning, you are quoting Shakespeare.

Levin is quite correct. But people have often assumed that if we are quoting Shakespeare then these are words and phrases that Shakespeare coined, which is quite untrue. But the myth that he did is still widespread. Here is the website of the publicly funded Shakespeare Birthplace Trust on the subject [SLIDE]: "Shakespeare is credited with the invention or introduction of over 1,700 words ...". That is strictly true, but we have only to add one word to improve this sentence [SLIDE]: "Shakespeare is incorrectly credited with the invention ...".

Fortunately, we know quite well where this myth came from [SLIDE]. When the Oxford English Dictionary was begun in the middle of the nineteenth century, its radical plan was to be an historical dictionary that gave the earliest ascertainable recorded sense of each word. The aim was to identify when each sense of each word first entered the language, and the dictionary's illustrative quotations show that a particular sense had entered the language by a given date. But it was never the dictionary's intention to claim that the earliest quotation used to illustrate each sense was the first time that this word had been used in this sense. Rather, the claim was that this quotation was the earliest printed example that the dictionary's creators had found.

When the dictionary was created by a team of volunteers by what we would now call crowd-sourcing, the only way for its creators to find the earliest illustrative quotation for each sense was to rack their own recollections of which authors had used which words. If the writer of a dictionary entry was particularly well read she might have a good knowledge of the main writers in Old English, Middle English, Early Modern English, and Modern English. But for the majority of the dictionary's creators, the early writer whose words they knew best was Shakespeare. And the Bible and Shakespeare were the only texts for which anybody had created concordances, that is: alphabetized lists of all the words that appear in a given work with a reference to where in that work each word appears. Naturally, for any word that entered the English language around Shakespeare's time and that Shakespeare used, the OED creators were more likely to choose Shakespeare's use of the word than anyone else's, simply because they knew their Shakespeare better than they knew their Ben Jonson or John Milton or Thomas Middleton or anyone else.

[BLANK SLIDE] In the past 25 years, most of the books published in England between the introduction of the printing press in the late 15th century and the start of the 19th century have been made available to researchers in searchable digital form. When we search in these vast datasets, we find that in every case a word for which the earliest example in the OED is by Shakespeare was in fact used by someone else before him, usually someone whose works are much less well known.

* * *

We can dismiss another enduring but not quite so well known myth, that Shakespeare had an unusually large vocabulary. That is, that Shakespeare knew more words than other writers did. To see how we know that this is untrue, we start with the total number of words in the Shakespeare canon. We have to be careful what we mean by a 'word' here. There are two senses of 'word' we must distinguish. [SLIDE] The speech "never, never, never" counts as three word tokens but only one word type. There are hundreds of thousands of word tokens in Shakespeare's canon, but only tens of thousands of word types, since lots of word types (such as 'the' and 'and') get repeated many times.
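
For anyone who wants to see the token/type distinction in action, here is a minimal sketch in Python (my own illustration, using only the standard library); the regular expression used for splitting words is a simplification of real tokenization.

    import re

    def count_tokens_and_types(text):
        # Split a text into lowercase word tokens and count the distinct types.
        tokens = re.findall(r"[a-z']+", text.lower())
        return len(tokens), len(set(tokens))

    print(count_tokens_and_types("never, never, never"))   # (3, 1): three tokens, one type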

In calculating how many word tokens there are in the Shakespeare canon, we have to agree on just which plays and poems Shakespeare wrote, and that of course is a matter of scholarly dispute. To assist in the work of the New Oxford Shakespeare editors in 2011, Hugh Craig kindly made a calculation based on an agreed set of attributions to Shakespeare -- leaving out the disputed Arden of Faversham, Double Falsehood, and the Additions to The Spanish Tragedy -- and came up with the number 740,209 for how many word tokens there are in Shakespeare (Taylor 2017, 247).

Shakespeare used 740 thousand word tokens, but many fewer distinct word types. To figure out how many word types Shakespeare knew -- his vocabulary -- we can begin by considering how many different word types he used in his plays. To count the types in Shakespeare, I will use here the corpuses of sole-authored, well-attributed plays by Shakespeare and seven of his fellow dramatists for whom more than a few plays survive. [SLIDE] For each of these corpuses I used the transcriptions of the plays from one of the major scholarly datasets of digitized early texts. After each name I have here recorded how many plays are in that dramatist's corpus.

A complicating factor is that in a small sample of writing we will, simply because it is small, find fewer word types than in a longer piece of writing by the same author. In tonight's talk I have so far spoken 965 word tokens but only 371 distinct word types, which is far fewer types than I know how to use. Word types that I rarely use simply did not get the 'opportunity', as it were, to appear in a sample of my language as short as the first 10 minutes of this talk.

To adjust for the different sizes of the writers' canons we can divide the number of word types in a writer's canon by the number of word tokens in it [SLIDE]. In this division, a small corpus that uses many different words will get a larger quotient than a big corpus that uses few different words. This types-to-tokens ratio is a measure of the richness of variety in a writer's language.
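
As a quick sketch of the arithmetic, the quotient and its reciprocal can be computed directly from the raw counts; the figures used here are taken from the table below.

    def richness(n_types, n_tokens):
        # Returns (types/tokens, tokens/types) for a writer's canon.
        return n_types / n_tokens, n_tokens / n_types

    print(richness(30216, 638302))   # Shakespeare: roughly (0.047, 21.12)
    print(richness(8187, 66967))     # Greene: roughly (0.122, 8.18)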

Here is a table of the types-to-tokens quotients for our eight early modern dramatists:

                      types   tokens   types/tokens   tokens/types
George Chapman        17722   237604       0.075          13.41
John Fletcher         16700   339744       0.049          20.34
Robert Greene          8187    66967       0.122           8.18
Ben Jonson            26680   441301       0.060          16.54
Christopher Marlowe   10663   101506       0.105           9.52
Thomas Middleton      22025   332972       0.066          15.12
George Peele           9262    70662       0.131           7.63
Shakespeare           30216   638302       0.047          21.12

[SLIDE] The smaller the types/tokens value, the less the variety in the writing. [SLIDE] In the last column I have flipped these values to give the reciprocal, the ratio of tokens to types, and on this measure the higher the value the less the variety in the writing. [SLIDE] Notice that the highest three values in this column, the dramatists with the least varied writing, are Fletcher, Jonson, and Shakespeare: the dramatists for whom we have the most surviving plays. [SLIDE] And notice that the lowest three values in this column, the dramatists with the most varied writing, are Greene, Marlowe, and Peele: the dramatists for whom we have the fewest surviving plays.

In fact, this calculation of language variety or richness is quite wrong. [SLIDE] Dividing the number of different word types by the size of the canon measured in tokens overcompensates for the effect of some writers having large canon sizes, making their language seem less varied than that of writers with small canons. Simply scaling one's counts by the size of a dramatist's canon would be effective if the relationship between the two values -- the number of types and the number of tokens -- were linear. But it is not.

Rather than a straight line, the type/token relationship is a characteristic curve. [SLIDE] To illustrate it, Gilbert Youmans (Youmans 1990, 588) noted how many different types had been encountered (and recorded on the y axis) as he read through, from first word to last, the 5000 tokens of a particular text (recorded on the x axis). He chose as his text the simplified story of Shakespeare and Middleton's play Macbeth as told in Charles Lamb's Tales from Shakespeare and rendered into the Basic English system invented by Charles Kay Ogden, in which only 850 different word types are allowed. With so few types at the writer's disposal it soon became necessary, after writing just a few sentences, to heavily reuse types that had already been used. Thus each new sentence is increasingly made up of repetitions of previously used word types and the curve soon starts to plateau [SLIDE x 9]. That is, as the token count rises steadily, the type count -- which is increased only by the use of new words not previously seen in the text -- goes up by ever smaller amounts.
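
You can see this plateau for yourself with a short sketch (my own, not Youmans's exact procedure) that walks through any plain-text file token by token and records, at regular intervals, how many distinct types have appeared so far; plotting the resulting pairs gives the characteristic curve.

    import re

    def type_token_curve(text, step=500):
        # Record, every `step` tokens, how many distinct word types have been seen so far.
        tokens = re.findall(r"[a-z']+", text.lower())
        seen, curve = set(), []
        for i, tok in enumerate(tokens, start=1):
            seen.add(tok)
            if i % step == 0:
                curve.append((i, len(seen)))   # (tokens read, types encountered)
        return curve

    # e.g. for any plain-text play file (the filename here is illustrative):
    # with open("play.txt") as f:
    #     for n_tokens, n_types in type_token_curve(f.read()):
    #         print(n_tokens, n_types)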

The same principle of plateauing applies in real-world language that is not artificially constrained as Ogden's Basic English is. The limit in the real world is not Ogden's 850 words but the complete set of words known by the writer, her vocabulary. This plateauing effect is the reason that large canons such as Shakespeare's, Jonson's, and Fletcher's tend to produce overall a lower type/token ratio than small canons. In the large canons, the writers have more fully exhausted their entire vocabulary and are forced to repeat themselves. These large canons have more types than the smaller ones, but not proportionally more. This tapering off of new type deployment allows us to estimate the size of a writer's vocabulary from the shape of this type/token curve. [SLIDE] The trick is to extrapolate the curve until it becomes perfectly flat -- the point at which the writer can use no more new words because she knows none -- and then read off from the y axis the number of types in her vocabulary at this point.

The mathematical calculations for doing this are complex but the principle is straightforward. [SLIDE] Here I have rescaled the axes into the tens of thousands because with a real author's canon and vocabulary the token and type counts are much larger than in Youmans's illustrative example using Lamb's-Macbeth-in-Basic-English. In real-world examples, the curve simply stops long before it plateaus, because the writer's surviving canon (which constrains the x axis) is nowhere near large enough to start exhausting her vocabulary. [SLIDE] Thus we must delete the part of the curve where the plateauing occurs. How is it possible to extrapolate from this beginning part of the curve? We do this by observing the rate of change of the slope of the curve as we move along the x axis, from steep at the beginning to less steep as we read more of the canon [SLIDE x 9]. Each tangent to the curve shows the slope of the curve at a particular point along the x axis. The rate at which these tangents slow down their clockwise rotation is constant, so we can predict future tangents at higher x values by applying decreasing clockwise rotation, and plot the y values that each new projected tangent gives us [SLIDE x 10].
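
The published studies use more sophisticated mathematics than I can show here. Purely as an illustration of the extrapolation idea, the sketch below fits an assumed saturating model, types(N) = V(1 - e^(-N/k)), to a type-token curve and reads off the asymptote V. The model, the starting guesses, and the synthetic test points are my own assumptions, not the formulas used by Youmans or Craig.

    import numpy as np
    from scipy.optimize import curve_fit

    def saturating(n, V, k):
        # Assumed model: the type count approaches an asymptote V as the token count n grows.
        return V * (1.0 - np.exp(-n / k))

    # Synthetic self-test: generate points from a known curve, then recover its asymptote.
    # With real data, the points would come from type_token_curve() above.
    true_V, true_k = 25000.0, 300000.0
    n = np.array([50000, 100000, 200000, 400000, 600000], dtype=float)
    t = saturating(n, true_V, true_k)

    (V_hat, k_hat), _ = curve_fit(saturating, n, t, p0=[10000.0, 100000.0])
    print(round(V_hat))   # about 25000: the extrapolated 'vocabulary size'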

In a landmark study of 2011, Hugh Craig produced this kind of curve for William Shakespeare and for 12 of his contemporary playwrights (Craig 2011). What I have depicted as moving our attention along the x axis, taking in more and more writing by the author, was in Craig's study implemented as considering what is added to the type count by each successive new Shakespeare play as Craig added it to the experiment. When comparing what each new Shakespeare play added to the Shakespeare type-count with what each new play by one of the other dramatists added to their type-counts, Craig found that Shakespeare was in the middle of the pack. Entirely average. If we want to know what makes Shakespeare's writing extraordinary, we must stop looking in the area of vocabulary richness because in that he is not unusual. Shakespeare seems to use a greater variety of words than his rival dramatists, but that is an illusion caused by his leaving us more writing than they did. [BLANK SLIDE]

Craig pursued his analysis to consider how often in standard-size chunks of his writing Shakespeare used commonplace words versus rare words. Again Shakespeare was absolutely like his peers in this regard, not exceptional. Craig measured how often Shakespeare used the 100 most common words, compared to his rival dramatists. Again Shakespeare came out as utterly ordinary. Indeed, Craig concluded "If anything his linguistic profile is exceptional in being unusually close to the norm of his time" (Craig 2011, 68).

* * *

If Shakespeare was not linguistically inventive at the level of the word -- if he just used the words everyone else was using -- and if he did not have a vocabulary of words that was larger than everyone else's, what makes his style distinctive? There are two possible answers to that. The first is that Shakespeare's distinctiveness emerges in how he puts his words together into phrases. We can use computers to study Shakespeare's preferences for certain phrases and his avoidance of other phrases, and compare these preferences with those of other writers.

One investigator, Brian Vickers, claims that this is the only way that we can empirically classify Shakespeare's style and distinguish it from the style of any other writer. In an article in the Times Literary Supplement in 2008 (Vickers 2008), Vickers described his work on identifying the writing style of Thomas Kyd, who is known to us as the undisputed author of just one play, The Spanish Tragedy, and as the likely author of another called Soliman and Perseda, and as the translator into English of the French play Cornelia.

Vickers's method was to hunt for three-word phrases, what he calls 'triples', using software that is normally used to detect plagiarism in student essays. Vickers wanted to find the author of the play called Arden of Faversham that was published anonymously in 1592. Vickers found the triples that are common to Arden of Faversham and the three plays believed to be by Thomas Kyd. Vickers then sought to find these same triples in all the other plays of the period. Any triples found in other plays by other writers he discarded from his list, leaving him with a list of triples found only in the works of Kyd and in Arden of Faversham.
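
Sketched in code, the procedure just described runs roughly as follows. This is my own reconstruction of the logic, not Vickers's plagiarism-detection software, and it assumes each play is available as a plain-text string.

    import re

    def triples(text):
        # All consecutive three-word phrases in a text, lowercased.
        words = re.findall(r"[a-z']+", text.lower())
        return {" ".join(words[i:i + 3]) for i in range(len(words) - 2)}

    def exclusive_shared_triples(disputed_play, candidate_plays, other_plays):
        # Triples shared by the disputed play and the candidate's plays ...
        shared = triples(disputed_play) & set().union(*(triples(p) for p in candidate_plays))
        # ... minus any triple that also appears in anyone else's plays.
        for play in other_plays:
            shared -= triples(play)
        return shared

Note that the candidate author is fixed before the filtering step; as we shall see, that ordering is the source of the deeper problem.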

Because there were several dozen such triples found only in Kyd's work and in Arden of Faversham, Vickers concluded that Arden of Faversham must be by Kyd. After all, what are the odds that so many three-word phrases would be found nowhere else but Kyd's writing and this anonymous play? Vickers repeated the process for Shakespeare's play Henry VI Part One and again found phrases that occur only in that play and in the works of Kyd, so he concluded that Kyd was a co-author of Shakespeare's Henry VI Part One.

There are two methodological flaws in Vickers's approach. The first was that his dataset of what constitutes "all the other plays of the period" was not large and it lacked many eligible plays. Using publicly available datasets it is a trivial matter to show that a lot of phrases that Vickers thought were found nowhere else but in Kyd's plays and in Arden of Faversham or in Henry VI Part One are in fact found many times in plays that Vickers did not include in his dataset. This methodological flaw could be remedied by Vickers expanding his dataset.

But Vickers's second mistake was fundamental. It was his procedure of starting with the play whose authorship he wanted to attribute -- Arden of Faversham -- and first finding the phrases that are common to this play and to his candidate author, and only when he had that list of phrases-in-common going to look for them in other plays and removing the phrases that other writers also use. This procedure will almost always tell you that your candidate really is the author of the play you want to attribute.

The reason is that for any two substantial bodies of writing we will always find phrases that they have in common that are not found elsewhere. That is a raw fact of large bodies of writing. If we start with a different candidate author, say Christopher Marlowe, and replicate Vickers's method, it turns out that Marlowe's plays and Arden of Faversham also contain a couple of dozen phrases -- different phrases from those found in Kyd's work -- that no other texts use.

This is not to say that looking for phrases in common is a pointless exercise when trying to quantify a writer's style and thereby attribute authorship. The method can be valuable so long as we devise experiments that give all authors equal 'opportunity' to win the race, as it were, rather than choosing a candidate in advance and looking for evidence to confirm that conjecture. But such experiments are difficult to devise and it is easy to introduce bias towards one writer or another. [SLIDE] One cause of bias may have occurred to you while I was showing one of my earlier slides.

If we simply count all the phrases that are found in common between any given play whose authorship we want to attribute and all the possible candidate authors from Shakespeare's time, our counting will be skewed by the fact that Shakespeare left us far more plays than any other writer. In this list of sole-authored, well-attributed plays, Shakespeare left us 27 plays, which is nearly twice as many as any other playwright. If you take any phrase at random and look for it in the four plays that Robert Greene left us and in the 27 plays that Shakespeare left us, that phrase is more likely to be found in Shakespeare's plays simply because there are so many more Shakespeare plays than Greene plays.

For this reason, we cannot directly compare the count of phrases that match with Shakespeare and phrases that match with Greene. You might think that we could come up with some 'weighting' value to compensate for this bias by making a match with Greene carry greater significance than a match with Shakespeare. But because of the non-linearity we saw earlier regarding how type-token ratios change as a writer's canon gets larger, no one has yet come up with a mathematically defensible system for creating such weightings.

* * *

Where else can we turn for linguistic evidence of authorial style if not in phrases? I mentioned that turning to phrases was one of two responses to our finding that Shakespeare did not invent words nor did he have an unusually large vocabulary. A second response is to look again at individual words but from a new angle. [SLIDE] How often particular word types occur in English varies greatly from word to word. We all use the most-common word 'the' about once in every 15 words and the next most-common word, various forms of the verb 'to be', about once in every 30 words. The one hundred most frequently used word types make up about half of everything we say or write, and the less common word types (many thousands of them) make up the other half.
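
The coverage claim is easy to check for yourself with a short sketch (mine, not taken from any of the studies cited later): count how much of a plain-text sample is accounted for by its 100 most frequent word types.

    import re
    from collections import Counter

    def top_n_coverage(text, n=100):
        # Share of all word tokens accounted for by the n most frequent types.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts = Counter(tokens)
        covered = sum(c for _, c in counts.most_common(n))
        return covered / len(tokens)

    # e.g. print(top_n_coverage(open("sample.txt").read()))   # typically around 0.5 for English prose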

Most of these extremely common words are so-called 'function words', meaning that they express the grammatical relationships between other words while carrying little or no lexical value of their own. The role of function words is to bring together the nouns, verbs, and adjectives in order to give a sentence its foundational structure. Typical function words in the English language are prepositions, conjunctions, articles, particles, auxiliary verbs, and pronouns, although linguists differ on which particular examples have so little lexical value as to be properly called function words.

Although we all use more or less the same 100 most common words for half of all that we say and write, the precise rate at which we use each one of these most-frequent words varies from person to person and is distinctive of their style. A person's preferences do not vary by the genre they are writing in nor do they vary over time. From a sufficiently large body of various authors' writings we may develop profiles of the differing authorial preferences regarding how often they use each of these words and then with these profiles we can attribute works of unknown or contested authorship.

Analysis of function-word frequencies has successfully determined authorship in cases as varied as the writings of the American Founding Fathers James Madison and Alexander Hamilton, the Roman statesman Cicero, the Book of Mormon, and the anonymized judgements of the US Supreme Court (Mosteller & Wallace 1963; Forsyth, Holmes & Tse 1999; Jockers, Witten & Criddle 2008; Jockers, Nascimento & Taylor 2019), and studies that quantify the accuracy of these authorship-attribution methods have shown function-word frequency to be objectively reliable at quantifiable levels of confidence (Hoover 2004; Argamon 2018).

A new refinement of this function-word frequency approach to authorship attribution called the Word Adjacency Network was first introduced eight years ago and has been applied by its inventors, of whom I am one, to the field of early modern drama in general and the plays of William Shakespeare in particular (Segarra, Eisen & Ribeiro 2015; Segarra et al. 2016; Eisen et al. 2018; Brown et al. 2022). The central claim of the WAN approach is that as well as their frequencies, the patterns of clustering of the function words -- their distances one from another -- are distinctive of authorship. By measuring how far from one another an author places the most-common words (measured by the number of intervening words), the Word Adjacency Network method is able to identify combinations of function words that each author likes to put together in close proximity and other combinations that the author avoids.

These combinations of function words are also expressed as probabilities. We computationally analyse a text -- typically the entire canon of one writer -- and we derive, for each of these 100 function words, the frequency with which, when this author writes that word, he follows it, within 5 words, by one of the other function words. That is, starting with the word 'the' we record how often after a 'the' this author uses another 'the' within 5 words, and how often he uses a 'be' within 5 words, and how often he uses an 'of' within 5 words, and so on for all 100 words.

And then we do the same thing for the second word in the list, recording how often 'be' is followed by 'the', how often 'be' is followed by another occurrence of 'be', how often by an occurrence of 'of', and so on. This gives us data for 10,000 (that is 100 times 100) possible transitions from one word to another that an author can show he prefers or avoids. We can use these preferences as a series of probabilities, saying that in this author's style there is a given likelihood of following one particular word with another. And we can use these probabilities to predict what habits of word combination we are likely to find in a previously unseen sample of this author's writing.
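
In code, the table of transitions just described can be built along these lines. This is a minimal sketch of my own using raw counts only; the published Word Adjacency Network method adds the further weightings I describe later.

    import re
    from collections import defaultdict

    def transition_probabilities(text, function_words, window=5):
        # For each function word, estimate the probability that each other function
        # word appears within the next `window` words, from raw follow counts.
        words = re.findall(r"[a-z']+", text.lower())
        fw = set(function_words)
        counts = {w: defaultdict(int) for w in function_words}
        for i, w in enumerate(words):
            if w in fw:
                for follower in words[i + 1 : i + 1 + window]:
                    if follower in fw:
                        counts[w][follower] += 1
        probs = {}
        for w, row in counts.items():
            total = sum(row.values())
            probs[w] = {v: c / total for v, c in row.items()} if total else {}
        return probs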

* * *

To help you gain an intuitive grasp of how the pattern of successive words in a language can be expressed as a series of probabilities, I would like you to try an exercise. First please turn to the neighbour on your right or left, smile, and form a two-person partnership. In other words, get into pairs. One of you should hold the printed sheet provided and a pen and the other should hold a blank piece of paper and a pen. The person with the blank sheet is the guesser trying to guess the letters, words and sentences that the other person, the guessee, is holding. The guesser will make a guess about what the first letter is and say this, and then is told by the guessee whether they are correct or, if they are incorrect, they are told what the correct letter was.

I will repeat that: the guesser will take a guess at what the first letter is and say this guess, and then is told by the other person either "Yes, that is correct" or "No, the correct letter is ...". The guesser should write down each correct letter as it is discovered, either because she guessed it or because she had to be told it. The guessee, who has the printed text, should for each guess either make a dash above the letter in their printed sheet if it was correctly guessed or, if the letter was wrongly guessed, write that letter above the one in their printed text. [SLIDE] As a cheat-sheet, here is a typical exchange.

Allow about 3 minutes for this exercise.

The guessee should now have, in those dashes, a record of how many letters were correctly predicted by the guesser, and it should be more than half if the guesser is any good. We just recreated the experiment by which Claude Shannon, the father of information theory (and hence the computer age), calculated that overall English prose is about 75% redundant: three times out of four the next letter is guessable. This is the reason that today's SMS text-speak and various kinds of shorthand work [SLIDE].

In this context, redundancy means predictability: after the letter t the letter h is much more likely to follow than x is, and directly after q the appearance of u is almost a certainty. Shannon gave us the mathematics with which to quantify these patterns of predictability, and borrowed from physics the term entropy for it. With Shannon's mathematics of information we can capture, quantify, and study the patterns of repetition in language that make for its predictability, and we can use the data to compare texts.
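
As an illustration of the idea (not a recreation of Shannon's guessing-game estimate), one can compute the entropy of single-letter frequencies in a sample and express it as redundancy relative to the maximum of log2(26) bits per letter. Shannon's figure of roughly 75% is higher because it also accounts for the longer-range structure that the guessing game exploits.

    import math
    from collections import Counter

    def single_letter_redundancy(text):
        # Redundancy implied by single-letter frequencies alone, relative to log2(26) bits per letter.
        letters = [c for c in text.lower() if c.isalpha()]
        counts = Counter(letters)
        total = len(letters)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        return 1 - entropy / math.log2(26)

    # e.g. print(single_letter_redundancy(open("sample.txt").read()))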

So much for the predictability of individual letters within words, which relies on your knowledge of common patterns of letters found in English words. If we had more time to complete more of the text, the guesser might also have been able to draw upon her knowledge of the habits of this particular text's author if she recognized who it was. Anybody know? Knowing that it's Raymond Chandler could help you predict that a word beginning with g-u- is likely to be gun or gum rather than guildhall or gulag, and knowing Chandler's preferred phrasings could help you guess the entire word that is coming next.

These wider word-order choices can be quantified with the same techniques, and the same precision, as the letter-order choices. Thus we can capture authors' individual phrasing preferences. That is what the Word Adjacency Network method does. There are two ways to visualize how the method works [SLIDE]. We can see it as a large table in which the 100 function words we are interested in are the headings to the columns and are also the headings to the rows. Each cell in the table records the preference for the word given in that column's heading being found, in this author's text, within 5 words of the word given in that row's heading. Thus, in the example shown, if this author uses another one of our 100 function words within 5 words of using 'the', there is a 10% probability that the function word he will choose is 'of'.

The method actually generates tables exactly like this, but many people find the method easier to comprehend when drawn as what we call a finite-state machine or a Markov Chain. Let us do a worked example. Take this extract from Shakespeare's Hamlet [SLIDE]

With one auspicious and one dropping eye,
With mirth in funeral and with dirge in marriage,
In equal scale weighing delight and dole,
(Shakespeare Hamlet 1.2.11-13)

[SLIDE] We will confine our attention to the proximities, one from another, of the four function words with, and, one, and in. [SLIDE] Starting with with and looking forward five words we find an occurrence of the word one, an occurrence of the word and, and another occurrence of the word one. [SLIDE] We record that in our Markov chain by a line from with to and with a value of 1 and a line from with to one with a value of 2. [SLIDE] We are done with the first word in the extract, With, and we [SLIDE] move to the next occurrence of one of our function words, which is the second word in the extract, one. Again looking forward five words we spot an occurrence of and and an occurrence of one, [SLIDE] so we draw a line from one to and, weighted 1, and a line from one to itself, weighted 1. [SLIDE] Then we move to the next occurrence of one of our function words, and it is and in the middle of the first line. Looking forward five words, we find an occurrence of one and an occurrence of with, so we add these to our Markov chain as two weighted lines emerging from the node for and. We proceed through the extract in the same way, adding fresh weighted lines (called edges) between nodes to indicate how often each word appears within five words of the others [SLIDE x 16]. This is our completed Markov chain.
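
For anyone who wants to check the worked example, a short sketch (mine, simplified from the published method) builds the weighted edges for these four function words from the quoted lines, using the same look-forward-five-words rule.

    import re
    from collections import Counter

    extract = ("With one auspicious and one dropping eye, "
               "With mirth in funeral and with dirge in marriage, "
               "In equal scale weighing delight and dole,")
    function_words = {"with", "and", "one", "in"}

    words = re.findall(r"[a-z]+", extract.lower())
    edges = Counter()
    for i, w in enumerate(words):
        if w in function_words:
            for follower in words[i + 1 : i + 6]:       # look forward five words
                if follower in function_words:
                    edges[(w, follower)] += 1

    for (src, dst), weight in sorted(edges.items()):
        print(src, "->", dst, ":", weight)              # every weighted edge in the completed chain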

We then do the same for the same four function words' appearance in another passage, [SLIDE] this time from Thomas Dekker's Satiromastix; here's the completed Markov chain. [SLIDE] We end up with two Markov chains, each showing the Word Adjacency Network for the four words with, and, one, and in, in each extract. These two chains contain the information about the word proximities in the two extracts, and using Shannon's mathematics for entropy we can compare them. You will see that there are fewer lines in the Satiromastix network, but the absolute number of lines is not the most important point. The key question is, "when this author chooses to follow one of these words with another of these words, which is she most likely to choose?" These networks embody the author's preferences that answer this question. You can see that in the Dekker extract, the word in is never followed (within five words) by the word with: there is no line running from in to with. Dekker instead chooses to follow in by and (one time) and by one (two times). [BLANK SLIDE]

This is only an illustration of the idea and for authorship attribution we use many more than four function words; 100 would be typical, but the resulting pictures are too complex to show you. And of course rather than short extracts from plays we use whole authorial canons as our samples. And instead of just recording the raw numbers of edges from node to node there are some weightings of edges and nodes to be applied using Shannon's mathematics for entropy and what is called limit probability. The edge-weightings reflect the fact that we consider words appearing close to one another to be more significant than words that are far apart, so instead of scoring "1" for a word appearing anywhere within our 5-word window, we give a greater score to words appearing near the beginning of the window. The limit probability weighting of nodes reflects the fact that we attach greater significance to words that are used often in the text being tested than to words that are used infrequently.
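
The precise weightings are set out in the papers cited above (Eisen et al. 2018; Brown et al. 2022). Purely as a sketch of the general idea, and with a decay factor alpha that is my own assumption for illustration, distance-sensitive edge scores might be accumulated like this:

    import re
    from collections import defaultdict

    def weighted_edges(text, function_words, window=5, alpha=0.75):
        # A follower one word away scores 1, two words away alpha, three away
        # alpha**2, and so on, so that nearer words count for more. The value
        # of alpha here is assumed, not taken from the published method.
        words = re.findall(r"[a-z']+", text.lower())
        fw = set(function_words)
        edges = defaultdict(float)
        for i, w in enumerate(words):
            if w in fw:
                for d, follower in enumerate(words[i + 1 : i + 1 + window], start=1):
                    if follower in fw:
                        edges[(w, follower)] += alpha ** (d - 1)
        return edges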

Using the mathematics of Shannon entropy we can compare two Word Adjacency Networks and so compare the stylistic similarity of two texts regarding these habits of function-word placement. The question that I promised to answer with this was "is Shakespeare's language unlike that of his contemporaries?". I cannot give you a yes/no answer to that question. Instead, I can, for each of Shakespeare's plays, answer the question "how unlike the language of his contemporaries is this play?". [SLIDE] Here is the answer. [Talk them through this last slide].
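
For completeness, here is a deliberately simplified sketch of the comparison step: it treats each author's transition table (such as the one built by transition_probabilities() above) as a set of probability distributions, one per function word, and sums a smoothed Kullback-Leibler divergence over them. The published method weights the rows by limit probabilities; this sketch, which is my own simplification, weights them equally.

    import math

    def wan_distance(probs_a, probs_b, function_words, eps=1e-6):
        # Smoothed relative entropy of table A from table B, summed over the per-word rows.
        total = 0.0
        for w in function_words:
            p = [probs_a.get(w, {}).get(v, 0.0) + eps for v in function_words]
            q = [probs_b.get(w, {}).get(v, 0.0) + eps for v in function_words]
            p_sum, q_sum = sum(p), sum(q)
            total += sum((pi / p_sum) * math.log2((pi / p_sum) / (qi / q_sum))
                         for pi, qi in zip(p, q))
        return total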

Works Cited

Argamon, Shlomo Engelson. 2018. "Computational Forensic Authorship Analysis: Promises and Pitfalls." Language and Law / Linguagem e Direito 5.2. 7-37.

Brown, Paul, Mark Eisen, Santiago Segarra, Alejandro Ribeiro and Gabriel Egan. 2022. "How the Word Adjacency Network (WAN) Algorithm Works." DOI 10.1093/llc/fqab002. Digital Scholarship in the Humanities 37. 321-35.

Craig, Hugh. 2011. "Shakespeare's Vocabulary: Myth and Reality." Shakespeare Quarterly 62. 53-74.

Eisen, Mark, Alejandro Ribeiro, Santiago Segarra and Gabriel Egan. 2018. "Stylometric Analysis of Early Modern English Plays." Digital Scholarship in the Humanities 33. 500-28.

Forsyth, Richard S., David I. Holmes and Emily K. Tse. 1999. "Cicero, Sigonio, and Burrows: Investigating the Authenticity of the Consolatio." Literary and Linguistic Computing 14. 375-400.

Hoover, David L. 2004. "Delta Prime?" Literary and Linguistic Computing 19. 477-95.

Jockers, Matthew L., Daniela M. Witten and Craig S. Criddle. 2008. "Reassessing Authorship of the Book of Mormon Using Delta and Nearest Shrunken Centroid Classification." Literary and Linguistic Computing 23. 465-91.

Jockers, Matthew, Fernando Nascimento and George H. Taylor. 2019. "Judging Style: The Case of Bush Versus Gore." Digital Scholarship in the Humanities 35. 319-27.

Mosteller, Frederick and David L. Wallace. 1963. "Inference in an Authorship Problem." Journal of the American Statistical Association 58. 275-309.

Segarra, Santiago, Mark Eisen and Alejandro Ribeiro. 2015. "Authorship Attribution Through Function Word Adjacency Networks." Institute of Electrical and Electronics Engineers (IEEE) Transactions on Signal Processing 62.20. 5464-78.

Segarra, Santiago, Mark Eisen, Gabriel Egan and Alejandro Ribeiro. 2016. "Attributing the Authorship of the Henry VI Plays By Word Adjacency." Shakespeare Quarterly 67. 232-56.

Taylor, Gary. 2017. "Did Shakespeare Write The Spanish Tragedy Additions?" The New Oxford Shakespeare Authorship Companion. Edited by Gary Taylor and Gabriel Egan. Oxford. Oxford University Press. 246-60.

Vickers, Brian. 2008. "Thomas Kyd: Secret Sharer." Times Literary Supplement Number 5481 (18 April). 13-15.

Youmans, Gilbert. 1990. "Measuring Lexical Style and Competence: The Type-token Vocabulary Curve." Style 24. 584-99.