Old and new methods for attributing authorship

"Old and new methods for attributing authorship" by Gabriel Egan

On 7 May 1996, Dorothy Woods, a retired health worker, was found dead in her home in Huddersfield in the north of England. She had been smothered by a pillow and signs of a break-in made local police pursue the theory of a burglary gone wrong. A window at the point of entry was found to hold the oily impression of a human ear pressed against it. Unfortunately for local burglar Mark Dallagher, Huddersfield police consulted a Dutch police officer, Cornelis van der Lugt, who although he had no forensics training had become convinced that ear-prints are as incriminating as fingerprints. Comparison of Dallagher's ear with the print left at the crime scene led to his conviction for murder, followed six years later by his retrial and exoneration.

The Dallagher case contains several lessons for the study of authorship attribution. At the time of Woods's murder, what was then described as the new forensic science of ear-print evidence was in its infancy with few experts. We now know that there can be no forensic science of ear-printing and no experts in it because ear-prints are not sufficiently distinctive. Shakespearian authorship attribution by computational stylistics is a new field with relatively few experts and has already had spectacular failures because the value of evidence was wrongly weighed.

As with finger-print and DNA evidence, the strongest kinds of argument in such cases are those used to exclude suspects rather than include them. If we find a partial human genome or finger-print at a crime scene, we might with certainty declare that no part of it appears anywhere in the DNA or on the fingers of a given suspect. The suspect cannot have made the print. But finding that the fragment matches part of a suspect's DNA or finger is not itself proof of guilt since, being only a fragment, it might also match others' DNA or fingers. When evaluating partial matches, we are forced to make statistical speculations about the likelihood that a fragment of a given size might match more than one person.

The history of Shakespearian authorship attribution has parallels with the Dallagher case in its measurement of features that were wrongly thought to be distinctive, in scholarly over-estimation of the value of evidence, and in faulty calculations of likelihood. Some commentators think that there is so much of this kind of error that the entire project is hopeless and that we cannot reliably attribute authorship using only internal evidence. By internal evidence I mean the writing itself as opposed to accounts of the writing, which we consider to be external evidence and which include the presence or absence of authors' names printed on editions of their works.

Concerning external evidence, the earliest editions of Shakespeare's plays in the early 1590s did not routinely print his name on the title-page, but this was true of English printed drama generally so it has no special significance. By the end of Shakespeare's career, editions of plays routinely printed the dramatist's name on the title-page, but of course this is only evidence, not proof, of authorship. Shakespeare's name appeared on the title-pages of the plays The London Prodigal in 1605, A Yorkshire Tragedy in 1608, Sir John Oldcastle in 1619, and 1, 2 The Troublesome Reign in 1622, but no-one today takes these attributions seriously.

[SLIDE] In 1623 the First Folio edition gave Shakespeare sole credit for 36 plays that have since then formed the core of his accepted canon [SLIDE]. Only one play that was already in print but left out of the First Folio has been universally accepted as part of the Shakespeare canon since the late eighteenth century [SLIDE]: Pericles, which was published in 1609 with his name on the title-page. It is now widely agreed that Shakespeare co-wrote Pericles with George Wilkins. [SLIDE] In 1634 an edition of the play The Two Noble Kinsmen appeared with the names of Shakespeare and John Fletcher on its title-page, and by the late twentieth century this had become widely accepted as an accurate attribution. [SLIDE] One seemingly conservative way to define Shakespeare's dramatic canon, then, is to include the 36 First Folio plays plus Pericles and The Two Noble Kinsmen. [SLIDE] These 38 plays are the ones offered in the Royal Shakespeare Company edition Complete Works (Shakespeare 2007). [SLIDE] Unfortunately this conservative definition is certainly wrong: there are undoubtedly more plays to which Shakespeare contributed parts, and substantial parts of plays in the 1623 First Folio are not his.

Henry VIII / All is True is presented in the First Folio as entirely by Shakespeare, but in the mid-nineteenth century Samuel Hickson and James Spedding independently considered it in the light of the known Shakespeare and Fletcher collaboration in The Two Noble Kinsmen and both agreed that Henry VIII / All is True has the same distinctive mix of features. They could just hear the difference in styles, but those of us "less quick in perceiving the finer rhythmical effects" might be more readily convinced, wrote Spedding, by some counts of "lines with a redundant syllable at the end" (Spedding 1850, 121), meaning feminine endings to regular verse lines of iambic pentameter, as in "to BE or NOT to BE that IS the QUEST-yun". This feminine-ending test remains a valuable tool for authorship attribution, as writers are fairly consistent in their rates of feminine ending use so long as we count across a substantial amount of writing. The rate in any one scene of a play might fluctuate according to the dramatic needs, but for whole acts and especially whole plays the rate of feminine endings to verse lines is stable for one author writing at one particular time.

Scholars in the nineteenth and twentieth centuries lacked tables of frequency rates for various verse features used by all the major dramatists. Philip Timberlake's PhD thesis accepted by Princeton University in 1926 went some way towards remedying this deficiency, and despite covering only the drama up to 1595 it remains the most complete tabulation of the frequencies of feminine endings in existence (Timberlake 1931). Timberlake addressed head-on the problem that no investigators agree on just what is a feminine ending, since words like heaven, even, hour, bower, flower, tower, power, and friar may all be pronounced monosyllabically to give a masculine ending or disyllabically to give a feminine one. Towards the end of his study, Timberlake applied his findings to various problems of Shakespearian authorship. In the anonymous play Edward III (first published in 1596) the Countess of Salisbury's scenes show a sharp rise in the rate of feminine endings from well below Shakespeare's norm at 2.1% for the rest of the play to well within his norm of 4-16% for these scenes; Timberlake concluded that it is distinctly possible that Shakespeare wrote this part of that play (Timberlake 1931, 78-80, 124).

The problem of differentiating Shakespeare's writing from that of his co-author Fletcher was revisited by Cyrus Hoy as part of a series of seven articles on the collected plays edition called Fifty Comedies and Tragedies by Beaumont and Fletcher published in 1679 (Beaumont and Fletcher 1679). Hoy suspected that not all 50 of them were by Beaumont and Fletcher, and in his first article Hoy laid out his chief means for detecting Fletcher's writing: "use of such a pronominal form as ye for you, of third person singular verb forms in -th (such as the auxiliaries hath and doth), of contractions like 'em for them, i'th' for in the, o'th' for on/of the, h'as for he has, and 's for his (as in in's, on's, and the like)" (Hoy 1956, 130-31). Hoy acknowledged that such tests had been used before--most innovatively by W. E. Farnham and A. C. Partridge--and claimed only that his was the first study to apply them all systematically to the whole of a substantial body of writing. In large part Hoy's method confirmed earlier divisions of Henry VIII / All is True and The Two Noble Kinsmen between Shakespeare and Fletcher. The kinds of tests employed by Hoy have widely varying success with different authors, being particularly effective for distinguishing Massinger from Fletcher but less good for others.
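To make the idea of a linguistic-preference test concrete, here is a minimal sketch in Python of the kind of counting Hoy did by hand. It is an illustration only, not Hoy's procedure: the function names, the choice of which forms to pair against each other, and the sample sentence are all assumptions of mine.

```python
import re
from collections import Counter

# Marker forms taken from Hoy's description; which forms are paired as
# "preferred" versus "alternative" is an illustrative assumption.
MARKERS = {"ye", "you", "hath", "doth", "has", "does",
           "'em", "them", "i'th'", "o'th'"}

def marker_counts(text):
    """Count occurrences of each marker form in a text."""
    # Normalise typographic apostrophes, then keep letters and apostrophes
    # together so contractions such as 'em and i'th' survive as one token.
    normalised = text.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z']+", normalised)
    return Counter(t for t in tokens if t in MARKERS)

def preference(counts, form, alternative):
    """Share of `form` among all uses of `form` or `alternative`.

    Returns None when neither form occurs, since no preference is shown.
    """
    total = counts[form] + counts[alternative]
    return counts[form] / total if total else None

# Invented sample sentence, not a quotation from any play.
sample = "Ye know 'em well; he hath it i'th' hand, and you doth see them."
counts = marker_counts(sample)
print(preference(counts, "ye", "you"))
```

On the invented sentence the ye/you preference comes out at one half. Real use would need far larger samples and consistent treatment of spelling variants, which this sketch does not attempt.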

Hoy's success in establishing a series of linguistic-preference tests and applying them to a substantial body of drama was inspirational to others in the field. Essentially the same kind of analysis--counting preferences for different ways of saying the same thing--was applied in the 1970s by David J. Lake and MacDonald P. Jackson to the problem of identifying Thomas Middleton's work, in the course of which the case for Middleton's hand in Shakespeare's Timon of Athens--another of the 36 plays in the First Folio--emerged most clearly (Lake 1975; Jackson 1979). Lake made no claims for innovation in the kinds of internal evidence he collected, indeed quite the opposite: "the general methods or particular tests I employ", he wrote, have "all been used over the past fifty years in authorship investigations" (Lake 1975, 10).

But one of Jackson's methods had not previously been applied to Shakespearian authorship attribution: the counting of the frequency of occurrence of so-called function words that express grammatical relationships between other words while carrying little or none of their own lexical value. Their role is to bring together the nouns, verbs, and adjectives in order to give a sentence its foundational structure. Typical function words in the English language are prepositions, conjunctions, articles, particles, auxiliary verbs, and pronouns, although linguists differ on just which words have so little lexical value that they properly belong in this category.

In his PhD thesis on distinguishing Middleton and Shakespeare's writing, and especially apportioning their shares of Timon of Athens, R. V. Holdsworth put himself squarely in the tradition of Hoy, Lake and Jackson (Holdsworth 1982; Holdsworth 2012). Like them, he counted various linguistic features such as contractions and the preference for modern (and urban) you over archaic (and rural) thou, but Holdsworth also introduced the innovation of counting the various formulaic phrasings used in stage directions to find author-specific idiosyncrasies (Holdsworth 1982, 181-235). His comprehensive study of the form "Enter A and B, meeting", in which the placing of meeting makes clear that neither character is already on stage, was the first systematic proof that a recurrent form of stage direction could usefully distinguish authorship.

Without computer automation, the counting of linguistic features was always likely to be incomplete and error prone. The Textual Companion to the Oxford Complete Works was published in 1987 when such manual methods had taken the subject about as far as it could go, and its survey of the Canon and Chronology of Shakespeare's writing was a synthesis of the scholarship up to that point (Taylor 1987). So what has the New Oxford Shakespeare got to show for the 30 years of research on co-authorship since the last Oxford Shakespeare? Let us look first at what was already claimed by the 1986-87 Oxford Complete Works [SLIDE]:

Henry VIII / All is True: A Shakespeare and John Fletcher collaboration

Timon of Athens: A Shakespeare and Thomas Middleton collaboration

Titus Andronicus: A Shakespeare and George Peele collaboration

Pericles: A Shakespeare and George Wilkins collaboration

Macbeth: Thomas Middleton's adaptation of Shakespeare's lost original

Measure for Measure: Middleton's adaptation of Shakespeare's lost original

1 Henry 6: A collaboration by Shakespeare and Thomas Nashe and others

Sir Thomas More: A collaboration between Anthony Munday, Henry Chettle, Thomas Dekker, Thomas Heywood, Shakespeare and others

All these attributions are accepted in the New Oxford Shakespeare to be published later this year and several of them are strengthened with new evidence, in particular the Middleton adaptations of Macbeth and Measure for Measure. But we also have some new claims of collaboration that substantially change the shape of the canon as the New Oxford Shakespeare will present it [SLIDE]:

Edward 3: This is a collaboration between Shakespeare and others

2, 3 Henry 6: These are both collaborations of Shakespeare with Christopher Marlowe and others

The Spanish Tragedy: The Additions to the play (originally written by Thomas Kyd) that first appeared in the 1602 quarto (the fourth edition) are by Shakespeare

Arden of Faversham: Act 3 (= Scenes 4 through 9) is by Shakespeare

Cardenio: Lewis Theobald's play Double Falsehood is an eighteenth-century adaptation of this lost collaborative play by Shakespeare and John Fletcher

The scholarship that has convinced the New Oxford Shakespeare general editors to change the Shakespeare canon in this way was not done only by the general editors themselves. In particular, recent publications by MacDonald P. Jackson, Hugh Craig, John Burrows, R. V. Holdsworth, Marina Tarlinskaja, Giuliano Pascucci, Brett Hirsch, Jack Elliott, and Farah Karim-Cooper have shaped our view. I do not have time to go into all that scholarship, but I can briefly sketch the approaches of MacDonald P. Jackson and Hugh Craig, who have contributed most to our view. [SLIDE]

Jackson's attribution method, now widely known, admired, and emulated, uses the database called Literature Online (LION) that is available to universities by subscription from the ProQuest Corporation and that offers typed-up, searchable texts of the vast majority of all English Literature--novels, poems, plays--published before the twentieth century. Jackson goes searching in LION for phrases and collocations found in the text he is trying to attribute, looking for those that are comparatively rare. When I say "phrases and collocations" I mean that he manually extracts from the passage he is trying to attribute every two-word and three-word string of words (that is, n-grams) and searches for them within a constrained time-period within LION (say, works written between 1590 and 1610) to see which are relatively rare, occurring five or fewer times. As well as strict strings of words, Jackson also looks for the same words occurring near to one another but not necessarily in the same order, in other words collocations.

Jackson tabulates which authors' canons contain occurrences of these phrases and collocations from the text to be attributed, and counts how many times each author's canon contains such a hit: the one with the most hits is declared to be the author. There are many refinements to Jackson's method that I do not have time to go into, for example the way that he weights the hits according to the size of the canons they occur in. Shakespeare has by some considerable margin the largest dramatic canon in this period, so all else being equal he has, as it were, greater 'opportunity' to produce matches for the phrases and collocations in the work to be attributed simply because he wrote more than anybody else. There is another investigator working with a similar method to Jackson, called Brian Vickers, but I omit him from this survey because there are fatal flaws in the method and the tools that he uses, which make his conclusions unreliable and which I would be happy to talk about in the Q&A session.
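The counting step can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Jackson's workflow: LION is a subscription database searched by hand, whereas here the "corpus" is a hypothetical in-memory dictionary of author names and texts, only strict two-word and three-word strings are handled (not looser collocations), and the per-10,000-words normalisation is just one simple way to allow for canon size.

```python
import re
from collections import Counter

def tokens(text):
    """Lower-case word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(words, n):
    """All contiguous n-word strings in a list of tokens."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def rare_hits(target, canons, max_total=5):
    """Count, per author, the target's 2- and 3-grams that are rare in the
    whole corpus (at most `max_total` occurrences) and occur in that canon."""
    target_words = tokens(target)
    target_grams = set(ngrams(target_words, 2) + ngrams(target_words, 3))
    canon_counts = {}
    for author, text in canons.items():
        words = tokens(text)
        canon_counts[author] = Counter(ngrams(words, 2) + ngrams(words, 3))
    hits = Counter()
    for gram in target_grams:
        total = sum(counts[gram] for counts in canon_counts.values())
        if 0 < total <= max_total:
            for author, counts in canon_counts.items():
                if counts[gram]:
                    hits[author] += 1
    return hits

def hits_per_10k_words(hits, canons):
    """Scale raw hits by canon size so a large canon gets no free advantage."""
    return {a: 10000 * hits[a] / len(tokens(t)) for a, t in canons.items()}

# Hypothetical two-author corpus; the labels A and B are placeholders.
canons = {"A": "the quality of mercy is not strained",
          "B": "come live with me and be my love"}
hits = rare_hits("the quality of mercy and be", canons)
print(hits, hits_per_10k_words(hits, canons))
```

The threshold of five occurrences follows the figure given above; everything else about the data structures is invented for the example.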

The methods used by Hugh Craig are in large part adaptations of methods developed by his sometime co-investigator John Burrows, called the Zeta and the Delta tests (Burrows 2002; Burrows 2003; Burrows 2007). Instead of counting rare phrases as Jackson does, Zeta and Delta count the frequencies of the very commonest words, the function words like the, a, on, in, and so on. The rates of usage of these words are demonstrably specific to specific authors--we each have our own unconscious preferences about how often we use each one--and with electronic texts of all our materials we can have computers do the counting. The Delta method's key innovation is that it discounts the importance of words for which a set of authors is demonstrably variable in their rates of usage and it weighs more heavily the evidence from words that the authors use at consistent rates. Moreover, Delta puts on an even footing words that are used at different rates of frequency, as it measures variations in rates of usage, not the absolute numbers of occurrences. When comparing the text to be attributed to the texts in the comparison set, Delta finds where the unknown text uses certain words more often and other words less often than the average for the comparison set and it finds where a particular author's contributions to the comparison set also show the same pattern of favouring the same words and disfavouring the same other words.
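A minimal sketch of a Delta-style calculation may help. It follows the usual description of the measure: each word's rate is converted to a z-score using the mean and standard deviation across the comparison set, and the score for a candidate is the mean absolute difference between its z-scores and those of the text being tested, with the lowest score marking the closest candidate. The function names, the tiny word list, and the toy texts are assumptions for illustration; a real study would use a long list of the commonest words and full-length texts.

```python
import re
from collections import Counter
from statistics import mean, pstdev

def rel_freqs(text, vocab):
    """Relative frequency of each vocabulary word in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [counts[w] / len(words) for w in vocab]

def delta_scores(test_text, candidates, vocab):
    """Mean absolute z-score difference between the test text and each candidate."""
    profiles = {a: rel_freqs(t, vocab) for a, t in candidates.items()}
    test = rel_freqs(test_text, vocab)
    scores = {a: 0.0 for a in candidates}
    used = 0
    for i in range(len(vocab)):
        column = [profile[i] for profile in profiles.values()]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # the word does not vary across the set, so it cannot discriminate
        used += 1
        z_test = (test[i] - mu) / sigma
        for author, profile in profiles.items():
            scores[author] += abs(z_test - (profile[i] - mu) / sigma)
    return {a: s / used for a, s in scores.items()} if used else scores

# Invented toy texts; the labels A and B are placeholders, not real authors.
candidates = {"A": "the the the a of the and the",
              "B": "a a a of and a the a"}
scores = delta_scores("the the a the of the the and", candidates,
                      ["the", "a", "of", "and"])
print(min(scores, key=scores.get), scores)
```

Dividing by the standard deviation is what puts frequent and infrequent words on an even footing: a word is judged by how far a text departs from the set's norm for that word, not by its raw count.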

This principle of identifying on a case-by-case basis the words that are most discriminating between various authors, rather than relying on pre-determined lists of words, also underlies Burrows's second innovation, the Zeta test. As a first step, the investigator establishes two sets of texts, each being the securely attributed works of a single candidate author or a group of authors. Zeta finds for itself the words that most distinguish these two sets, the ones that are especially common in the first set and especially uncommon in the second, and vice versa. The vice versa step means that the investigator has two lists of words, both of which are good discriminators between the two sets of texts.

When the numbers of occurrences of the discriminating words in each of the texts in the two text sets are plotted on an x/y graph--x for counts of words favoured by the first set and disfavoured by the second, and y for counts of words disfavoured by the first set and favoured by the second--the texts' scores fall into two distinct clusters: high-x/low-y for texts in the first set and high-y/low-x for texts in the second set. This is just as we would expect since Zeta was made to find the words that would produce this outcome. Then the investigator has Zeta count the occurrences of the discriminating words in the text to be attributed and plot this on the same x/y graph [SLIDE]. If the text to be attributed shares the word-preferences of one of the two text sets, its x and y values will place it near or within that set's cluster on the graph. Here is a typical Craig scatter-plot showing that the play Coriolanus contains a lot of the words that Shakespeare favours that other writers tend to avoid, and has few of the words that other writers favour and that Shakespeare tends to avoid. This is just what we would expect, since Shakespeare wrote Coriolanus.
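The two steps, choosing the marker words and then placing a text on the x/y graph, can be sketched as follows. This is one common formulation as I understand it, not a transcription of Burrows's or Craig's software: each text is treated as a single segment, a word's score is the share of first-set segments that contain it plus the share of second-set segments that lack it, and a text's coordinates are the proportions of its distinct words that appear on each marker list. All names and the toy segments are assumptions.

```python
import re

def types(text):
    """The set of distinct words (types) in a text."""
    return set(re.findall(r"[a-z']+", text.lower()))

def zeta_markers(set_a, set_b, k):
    """Return the k words most characteristic of set A and of set B."""
    seg_a = [types(s) for s in set_a]
    seg_b = [types(s) for s in set_b]
    vocab = set().union(*seg_a, *seg_b)

    def score(word):
        in_a = sum(word in s for s in seg_a) / len(seg_a)
        in_b = sum(word in s for s in seg_b) / len(seg_b)
        return in_a + (1 - in_b)  # 2 = only in A's segments, 0 = only in B's

    # Highest scores first; ties broken alphabetically to keep runs repeatable.
    ranked = sorted(vocab, key=lambda w: (-score(w), w))
    return ranked[:k], ranked[-k:]

def zeta_point(text, a_markers, b_markers):
    """x = share of the text's types on A's list, y = share on B's list."""
    found = types(text)
    x = len(found & set(a_markers)) / len(found)
    y = len(found & set(b_markers)) / len(found)
    return x, y

# Invented segments standing in for two sets of securely attributed texts.
set_a = ["gentle sweet love doth", "sweet gentle heart doth"]
set_b = ["villain blood hell doth", "blood villain fire doth"]
a_markers, b_markers = zeta_markers(set_a, set_b, k=2)
print(zeta_point("sweet gentle villain doth", a_markers, b_markers))
```

Note that doth, which appears in every segment of both sets, ends up on neither list: a word that everyone uses cannot discriminate, however common it is.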

If the sets are chosen to be, say, Shakespeare plays on the one hand and Marlowe plays on the other, the Zeta method becomes for that application a good discriminator of these two writers' styles. One of the sets may be a multi-writer collective, so that the test may be, say, Shakespeare versus Marlowe+Greene+Peele+Nashe. As Burrows showed, and Craig confirmed with a great many validation runs for this technique (Craig & Kinney 2009), when the investigator takes a text of known authorship out of one of the sets and reruns the experiment as if this text were of unknown authorship--without letting this text help choose the discriminating word lists--the correct author is identified with reliability that typically (depending on who is being tested) exceeds 95% accuracy. Zeta is by some way the most powerful general-purpose authorship tool currently available.

Word Adjacency Networks

[SLIDE] It is possible to imagine a new method in computational stylistics that would be both like the approach of MacDonald P. Jackson in attending to the proximities of particular words to one another, yet without excluding all but the rare collocations, and at the same time like the Craig-Burrows approach in counting every occurrence of even the most frequently occurring words. What if we could take a text and count the proximity of every word to every other word, so that we capture the phenomenon of word-clustering at all levels where it occurs? It so happens that generating such data is technically trivial--the algorithm is not difficult to code--and the trouble arises rather in capturing this vast dataset in a form that enables meaningful comparisons to be made between texts. The technique I will end with is an application to Shakespearian authorship attribution of a mathematical approach to this problem that has been developed in other fields for other purposes, using what are called Markov chains to represent Word Adjacency Networks (WANs).

To explain the Word Adjacency Networks method one needs to understand a notion called Shannon entropy, which in the limited time available I can best do with a practical exercise. First please turn to the neighbour on your right or left, smile, and form a two-person partnership. In other words, get into pairs. One of you should hold the printed sheet provided and a pen and the other should hold a blank piece of paper and a pen. The person with the blank sheet is trying to guess the letters, words and sentences that the other person is holding. The guesser will take a guess at what the first letter is and say it, and then is told by the other person whether they are correct or, if they are incorrect, they are told what the correct letter was. I will repeat that: the guesser will take a guess at what the first letter is and say it, and then is told by the other person either "Yes, that is correct" or "No, the correct letter is ...". The person doing the guessing should write down just each correct letter as it is discovered, either because they guessed it or because they were told it. The person who has the full text should, for each guess, write above each letter in their printed sheet either a dash for a letter that is correctly guessed or, if the letter is wrongly guessed, the letter that was guessed. [SLIDE] As a cheat-sheet, here is a typical exchange.

Allow about 3 minutes for this exercise.

According to Claude Shannon, father of information theory, the amount of information in any piece of writing or other code that bears meaning can be precisely quantified as its unpredictability. After performing exactly the experiment we just did, Shannon calculated that overall English prose is about 75% redundant: three times out of four the next letter is guessable. This is the reason that today's SMS text-speak and various kinds of shorthand work [SLIDE]. In this context, redundancy means predictability: after the letter t the letter h is much more likely to follow than x is, and directly after q the appearance of u is almost a certainty. Shannon gave us the mathematics with which to quantify these patterns of predictability or unpredictability, borrowing the physicists' term of entropy for it, and enabling us to compare the entropy of one text to that of another.
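For those who want to see the arithmetic, here is a small Python sketch of the two simplest entropy measures: the entropy of single letters, and the entropy of the next letter given the one before it. The function names and the sample sentence are my own for illustration. These counts look only one letter back, so they will not reproduce Shannon's 75% figure, which came from human guessers drawing on much longer stretches of context.

```python
import math
from collections import Counter

def clean(text):
    """Keep letters and spaces only, in lower case."""
    return [c for c in text.lower() if c.isalpha() or c == " "]

def letter_entropy(text):
    """Entropy in bits per letter: H = -sum p(x) * log2 p(x)."""
    letters = clean(text)
    counts = Counter(letters)
    n = len(letters)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def next_letter_entropy(text):
    """Entropy in bits of the next letter given the previous one:
    H = -sum p(a, b) * log2 p(b | a)."""
    letters = clean(text)
    pairs = Counter(zip(letters, letters[1:]))
    firsts = Counter(letters[:-1])
    n = len(letters) - 1
    return -sum(c / n * math.log2(c / firsts[a]) for (a, b), c in pairs.items())

sample = "the quick thought that the queen then thanked them"
print(letter_entropy(sample), next_letter_entropy(sample))
```

Knowing the previous letter can only reduce the uncertainty about the next one, which is the sense in which t makes h predictable and q makes u nearly certain.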

Instead of individual letters, we can apply this principle to how often certain words follow other words, either following them directly or falling nearby or far away. The trick is in how we record these various distances of each word from every other using a Markov chain. Take this extract from Shakespeare's Hamlet [SLIDE]

With one auspicious and one dropping eye,
With mirth in funeral and with dirge in marriage,
In equal scale weighing delight and dole,
(Shakespeare Hamlet 1.2.11-13)

[SLIDE] Let us just confine our attention to the proximities, one from another, of the four function words with, and, one, and in. [SLIDE] Starting with with and looking forward five words we find an occurrence of the word one, an occurrence of the word and, and another occurrence of the word one. [SLIDE] We record that in our Markov chain by a line from with to and with a value of 1 and a line from with to one with a value of 2. [SLIDE] We are done with the first word in the extract, With, and we [SLIDE] move to the next occurrence of one of our function words, which is the second word in the extract, one. Again looking forward five words we spot an occurrence of and and an occurrence of one, [SLIDE] so we draw a line from one to and, weighted 1, and a line from one to itself, weighted 1. [SLIDE] Then we move to the next occurrence of one of our function words, and it is and in the middle of the first line. Looking forward five words, we find an occurrence of one and an occurrence of with, so we add these to our Markov chain as two weighted lines emerging from the node for and. We proceed through the extract in the same way, adding fresh weighted lines (called edges) between nodes to indicate how often each word appears within five words of the others [SLIDE x 16]. This is our completed Markov chain.
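The walk-through just described is simple to automate. The following Python sketch builds the raw (unweighted) network for the Hamlet extract by looking forward five words from each occurrence of a function word; the function name build_wan and the representation of edges as a counter keyed by (source, target) pairs are choices made for this illustration.

```python
import re
from collections import Counter

EXTRACT = """With one auspicious and one dropping eye,
With mirth in funeral and with dirge in marriage,
In equal scale weighing delight and dole,"""

def build_wan(text, function_words, window=5):
    """Count, for each function word, which function words follow it
    within the next `window` words (line breaks are ignored)."""
    words = re.findall(r"[a-z']+", text.lower())
    edges = Counter()
    for i, source in enumerate(words):
        if source not in function_words:
            continue
        # Look forward `window` words from this occurrence of the source.
        for target in words[i + 1:i + 1 + window]:
            if target in function_words:
                edges[(source, target)] += 1
    return edges

wan = build_wan(EXTRACT, {"with", "and", "one", "in"})
for (source, target), weight in sorted(wan.items()):
    print(f"{source} -> {target}: {weight}")
```

The edge from with to one comes out with weight 2, as in the first step above. The edge from with to and reaches 2 over the whole extract, because the With that opens the second line is also followed within five words by and.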

We then do the same for another passage, [SLIDE] this time from Thomas Dekker's Satiromastix, for which I am showing here the completed Markov chain. [SLIDE] We end up with two Markov chains, each showing the Word Adjacency Network for the four words with, and, one, and in, in each extract. These two chains contain the information about the word proximities in the two extracts, and using Shannon's mathematics for entropy we can compare them. You will see that there are fewer lines in the Satiromastix network, but the absolute number of lines is not the most important point. The key question is, "when this author chooses to follow one of these words with another of these words, which is she most likely to choose?" These networks embody the author's preferences that answer this question. You can see that in the Dekker extract, the word in is never followed (within five words) by the word with: there is no line running from in to with. Dekker instead chooses to follow in by and (one time) and by one (two times).

This is only an illustration of the idea and for authorship attribution we use many more than four function words; 100 would be typical, but the resulting pictures are too complex to show you. And of course rather than short extracts from plays we use whole authorial canons as our sample. And instead of just recording the raw numbers of edges from node to node there are some weightings of edges and nodes to be applied using Shannon's mathematics for entropy and what is called limit probability. The edge-weightings reflect the fact that we consider words appearing close to one another to be more significant than words that are far apart, so instead of scoring "1" for a word appearing anywhere within our 5-word window, we give a greater score to words appearing near the beginning of the window. The limit probability weighting of nodes reflects the fact that we attach greater significance to words that are used often in the text being tested than to words that are used infrequently.
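A sketch of these two weightings, under assumptions that I should state plainly: the talk does not give the weighting scheme, so the geometric decay used here (a word d places ahead scores alpha to the power d-1, with alpha = 0.75) is a hypothetical choice, as is giving a uniform row to any word that has no successors. The limit probabilities are computed as the long-run distribution of the Markov chain by repeated averaging steps.

```python
import re
from collections import defaultdict

EXTRACT = """With one auspicious and one dropping eye,
With mirth in funeral and with dirge in marriage,
In equal scale weighing delight and dole,"""

def weighted_wan(text, function_words, window=5, alpha=0.75):
    """Distance-discounted edge weights, normalised so each node's
    outgoing weights sum to 1 (a Markov chain)."""
    words = re.findall(r"[a-z']+", text.lower())
    nodes = sorted(function_words)
    raw = {s: defaultdict(float) for s in nodes}
    for i, source in enumerate(words):
        if source in raw:
            for d, target in enumerate(words[i + 1:i + 1 + window], start=1):
                if target in raw:
                    raw[source][target] += alpha ** (d - 1)  # nearer scores more
    chain = {}
    for s in nodes:
        total = sum(raw[s].values())
        # Assumption: a word with no successors gets a uniform row.
        chain[s] = ({t: w / total for t, w in raw[s].items()} if total
                    else {t: 1 / len(nodes) for t in nodes})
    return chain

def limit_probabilities(chain, steps=500):
    """Long-run share of time the chain spends at each node."""
    nodes = list(chain)
    pi = {n: 1 / len(nodes) for n in nodes}
    for _ in range(steps):
        # Averaging the old and new distributions keeps the iteration stable
        # without changing the distribution it settles on.
        nxt = {n: 0.5 * pi[n] for n in nodes}
        for s in nodes:
            for t, p in chain[s].items():
                nxt[t] += 0.5 * pi[s] * p
        pi = nxt
    return pi

chain = weighted_wan(EXTRACT, {"with", "and", "one", "in"})
pi = limit_probabilities(chain)
print(pi)
```

The limit probabilities give each node a weight that grows with how often the chain visits it, which is how frequently used words come to count for more in the comparison.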

Specifically, how do we compare two WANs? The technical answer is that for each edge common to the two WANs we subtract from the natural logarithm of the weight in the first WAN the natural logarithm of the weight in the second WAN and then multiply this difference by the weight of the edge in the first WAN and then by the limit probability of the node from which this edge originates. This calculation is made for each edge and the values summed to express the total difference. In this method, it matters which WAN we designate as 'first' and which 'second', so the procedure is performed twice, switching the designation the second time.
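That verbal recipe translates directly into code. The sketch below assumes each WAN is stored as a dictionary of rows of normalised edge weights and that the limit probabilities are those of the first WAN; the two tiny chains and their limit probabilities are hypothetical values worked out by hand for the example, not data from any play.

```python
import math

def wan_difference(first, second, limit_first):
    """Sum over edges common to both WANs of
    limit_prob(source) * w1 * (ln w1 - ln w2), giving a value in nats."""
    total = 0.0
    for source, row in first.items():
        for target, w1 in row.items():
            w2 = second.get(source, {}).get(target, 0.0)
            if w1 > 0 and w2 > 0:  # only edges present in both WANs
                total += limit_first[source] * w1 * (math.log(w1) - math.log(w2))
    return total

# Hypothetical two-node chains with hand-computed limit probabilities.
first = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 1.0}}
second = {"a": {"a": 0.25, "b": 0.75}, "b": {"a": 1.0}}
limit_first = {"a": 2 / 3, "b": 1 / 3}
limit_second = {"a": 4 / 7, "b": 3 / 7}

d12 = wan_difference(first, second, limit_first)
d21 = wan_difference(second, first, limit_second)
print(d12, d21)
```

The two directions give different values, which is why the procedure is run both ways. Multiplying a result by 100 would express it in hundredths of a nat.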

In this method, authorship attribution is done by calculating the overall difference (measured in Shannon entropy units called centinats) between a Markov chain representing the word-adjacency preferences for an entire canon by one author and the Markov chain representing the word-adjacency preferences of the play you wish to attribute. The first, quite limited, application of this method will be published in Shakespeare Quarterly this summer by myself and a team of electrical engineers from the University of Pennsylvania. We considered the authorship of each act and each scene of each of the Henry 6 plays, taking as our candidates Shakespeare, John Fletcher, Ben Jonson, Christopher Marlowe, Thomas Middleton, George Chapman, George Peele, and Robert Greene. (Several of these candidates are implausible on external biographical grounds, of course, but these are the writers for whom there survive enough plays to form a reliable profile by our method.)

We conclude that Christopher Marlowe had a hand in the three plays that the First Folio of Shakespeare would otherwise make us believe are by Shakespeare alone. Those plays are Henry 6, Parts One, Two, and Three [SLIDE]. This conclusion matches the conclusions made by other recent investigators using entirely different methods. Just how the collaboration happened cannot be determined by our method, since Shakespeare taking over and rewriting a play first written by Marlowe (or Marlowe and others) would, by our method, test much the same as a play that Shakespeare and Marlowe actively co-wrote. The presence of Marlowe in these plays, however, is now undeniable.

Works Cited

Beaumont, Francis and John Fletcher. 1679. Fifty Comedies and Tragedies. Wing B1582. London. J. Macock [and H. Hills] for John Martyn, Henry Herringman, Richard Marriot.

Burrows, John. 2002. "'Delta': A Measure of Stylistic Difference and a Guide to Likely Authorship." Literary and Linguistic Computing 17. 267-87.

Burrows, John. 2003. "Questions of Authorship: Attribution and Beyond." Computers and the Humanities 37. 5-32.

Burrows, John. 2007. "All the Way Through: Testing for Authorship in Different Frequency Strata." Literary and Linguistic Computing 22. 27-47.

Craig, Hugh and Arthur F. Kinney. 2009. Shakespeare, Computers, and the Mystery of Authorship. Cambridge. Cambridge University Press.

Holdsworth, R. V. 1982. 'Middleton and Shakespeare: The Case of Middleton's Hand in Timon of Athens': An Unpublished PhD Thesis Submitted to the University of Manchester.

Holdsworth, Roger. 2012. "Stage Directions and Authorship: Shakespeare, Middleton, Heywood." On Authorship. Edited by Rosy Colombo and Daniela Guardamagna. Memoria di Shakespeare. 8. Rome. Bulzoni Editore. 185-200.

Hoy, Cyrus. 1956. "The Shares of Fletcher and His Collaborators in the Beaumont and Fletcher Canon ([Part] I [of VII])." Studies in Bibliography 8. 129-46.

Jackson, MacDonald P. 1979. Studies in Attribution: Middleton and Shakespeare. Jacobean Drama Studies. 79. Salzburg. Institut für Anglistik und Amerikanistik, Universität Salzburg.

Lake, David J. 1975. The Canon of Thomas Middleton's Plays: Internal Evidence for the Major Problems of Authorship. Cambridge. Cambridge University Press.

Shakespeare, William. 2007. The Complete Works (=The Royal Shakespeare Company Complete Works). Ed. Jonathan Bate and Eric Rasmussen. Basingstoke. Macmillan.

Spedding, James. 1850. "Who Wrote Shakespere's Henry VIII?" Gentleman's Magazine. ns 34. 115-123, 381-382.

Taylor, Gary. 1987. "The Canon and Chronology of Shakespeare's Plays." William Shakespeare: A Textual Companion. Edited by Stanley Wells, Gary Taylor, John Jowett and William Montgomery. Oxford. Clarendon Press. 69-144.

Timberlake, Philip. 1931. The Feminine Ending in English Blank Verse: A Study of its Use By Early Writers in the Measure and its Development in the Drama Up to the Year 1595. Menasha WI. Banta.