What Agreement Is Not

Tim Finney


[1] I am very glad to be able to contribute to this volume in honour of our mentor, Dr Richard K. Moore. Richard, you probably thought that you would not have to suffer any more technical tedium from me. If so, you were wrong. Fortunately for all of the readers, it is only a small piece. Even though you deserve far better, I hope that you will be pleased with this token of my esteem.

Introduction

[2] MSS vary from each other in every way, even when they are supposed to carry the same text. This is as true for biblical MSS as it is for any other kind. Nevertheless, given so great a cloud of witnesses, we can lay aside the secondary ones to distill a very primitive text. An important step in this process is to discern families of MSS.

[3] A popular method for determining affiliation between MS texts is to count agreements in a series of textual variation units. The number of agreements is divided by the number of units to obtain a ratio that indicates the similarity of a pair of MSS. This method works well with large numbers of variation units; however, it is not reliable for small numbers. This article aims to throw light on the topic of what is, and what is not, a significant level of agreement between MSS.

A Simplified Case

[4] Perhaps the simplest case is the comparison of two MSS over a set of binary variation units; that is, each variation unit has only two states. One example of such a "binary" variation unit is a place where the article is present in some MSS but absent in others, and where those are the only two possibilities in that place. Rather than confuse the situation further with mathematical idiom, I will switch to a well-known example that is analogous from a statistical perspective: a series of coin tosses.

[5] Take two sets of coins of the same denomination, flip each coin in a non-prejudiced manner, then lay them out in parallel lines so that each coin has a corresponding partner. What level of agreement do we expect to see? Here is an example that uses ten pairs, corresponding to the situation where agreement is measured at ten points of variation, and each variation unit has two equally probable states:

  2  2  2  1  1  1  2  2  2  1

  1  2  2  2  2  1  1  2  2  2

(I used the program "binary.pl", which is listed in the appendix, to generate these sets. The numeral "1" represents a tail, and "2" represents a head.) Here, the agreement of the two "MSS" happens to be 5/10, or 0.5. This figure represents the number of times that the same state occurs in both rows of a column, divided by the number of columns; that is, the ratio is the number of agreements divided by the number of variation units.
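The experiment is easy to repeat. What follows is a minimal sketch in Python, not the "binary.pl" program itself; the function names `toss_coins` and `agreement` and the seed are assumptions made for illustration. It generates two rows of fair tosses and computes the ratio of agreements to variation units, then applies the same calculation to the two rows shown above, read left to right.

```python
import random


def toss_coins(n, rng):
    """Return n fair coin tosses, coded as in the article: 1 = tail, 2 = head."""
    return [rng.choice((1, 2)) for _ in range(n)]


def agreement(first, second):
    """Number of agreements divided by the number of variation units."""
    if len(first) != len(second):
        raise ValueError("both rows must cover the same variation units")
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


if __name__ == "__main__":
    rng = random.Random(2002)  # arbitrary seed, chosen only for repeatability
    ms_a = toss_coins(10, rng)
    ms_b = toss_coins(10, rng)
    print(ms_a, ms_b, agreement(ms_a, ms_b))

    # The two rows printed in the article.
    row_1 = [2, 2, 2, 1, 1, 1, 2, 2, 2, 1]
    row_2 = [1, 2, 2, 2, 2, 1, 1, 2, 2, 2]
    print(agreement(row_1, row_2))  # 0.5
```

Running the script several times with different seeds gives a feel for how widely the ratio wanders when there is no relationship at all between the rows.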

[6] Let us pause to consider what has happened. One set of randomly chosen states agrees with the other set of randomly chosen states in fifty percent of cases, even though there is no relationship whatsoever between the two sets. In fact, it is possible for any level of agreement from 0/10 to 10/10 to occur when comparing two randomly chosen sets of states. We understand intuitively that it is more common to get an agreement of, say, 5/10 than to obtain an agreement of 0/10 or 10/10 when the sets are unrelated. Just how often can we expect high levels of agreement between unrelated entities?

The Binomial Distribution

[7] Given the following conditions, the binomial distribution provides the answer we seek:

  1. There is a fixed number of trials or observations

  2. The observations are all independent

  3. Each observation can have one of only two outcomes (i.e. states)

  4. The probability of the first outcome is the same for each observation.

[8] Applying these abstractions to our situation, the number of observations is the number of variation units. The independence condition is satisfied if no variation unit influences the state of any other variation unit in the set. Real variation units can have more than two states, but in our purposely restricted case there are only two. (In fact, about half of the variation units recorded in the United Bible Societies Greek New Testament apparatus satisfy this condition.)

[9] The fourth condition is essential to the validity of results obtained using the binomial distribution; however, it is certainly not satisfied by real variation units. One variation unit might concern presence or absence of an article, while the next might record the voice of a verb that is active in some MSS and passive in the rest. I can choose the first state of the first variation unit to represent "presence of the article", and the first state of the second unit to represent "active": I am free to assign states as I please. Even so, the probability that the article is present and the probability that the verb is active cannot be expected to be the same. Nevertheless, let us continue with our artificial example if only to get a better sense of what agreement is not.

[10] A series of coin tosses fits the four conditions well. I will now try to convince you that the frequency distribution of agreements between two sets of equally probable binary variation units is the same as the frequency distribution of heads that occur in a series of coin tosses. After that, I will examine the frequency distribution of heads, confident that the same distribution applies to agreements between unrelated MSS, subject to the four conditions above.

[11] Take a scribe and give him or her a bag with a huge number of scraps of papyrus inside. (Didn't Origen have a team of female tachygraphers working for him?) Each scrap has written on it either "ναί" or "οὔ", and there are as many "ναίs" as "οὔs". Now, ask the scribe to take ten scraps out of the bag and to write the words down one-by-one in ten boxes drawn on a first piece of parchment. Then, ask the scribe to take out another set of ten, writing the words in ten boxes on a second piece of parchment:

  οὔ   ναί  οὔ   οὔ   ναί  οὔ   ναί  ναί  οὔ   οὔ

  οὔ   ναί  ναί  οὔ   οὔ   ναί  οὔ   ναί  οὔ   οὔ

(Once again, I used the program "binary.pl" to generate the two sets, substituting "ναί" for "1" and "οὔ" for "2". The experiment must have taken place about the time when parchment was displacing papyrus as the preferred writing material.)

[12] Finally, ask the scribe to take a third piece of parchment and to write in a series of ten boxes "ναί" if corresponding boxes have the same word, and "οὔ" if they don't:

  ναί  ναί  οὔ   ναί  οὔ   οὔ   οὔ   ναί  ναί  ναί

[13] Given that the scribe is equally likely to draw ναί or οὔ each time, and that one draw doesn't affect the outcome of the next, corresponding boxes on the first two pieces of parchment are as likely to contain the same word (ναί - ναί or οὔ - οὔ) as to have different words (ναί - οὔ or οὔ - ναί). Therefore, you might just as well ask the scribe to draw out ten pieces of papyrus and to write the results straight onto the third piece of parchment. If you are only interested in how often you get, say, five "ναίs" out of ten over a large number of trials, the last method is as good as the first. Quod erat demonstrandum.
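For readers who prefer a check to a demonstration, the claim can be tested by brute force for small numbers of boxes. The sketch below is an illustration in Python under the stated assumption of two equally probable states; the function names are invented for the purpose. It enumerates every possible pair of rows, counts agreements, and compares the resulting distribution with the distribution of heads in the same number of fair tosses.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb


def agreement_distribution(n):
    """Exact distribution of the number of agreements between two rows of n
    equally probable binary states, found by enumerating every pair of rows."""
    rows = list(product((1, 2), repeat=n))
    counts = Counter()
    for first in rows:
        for second in rows:
            counts[sum(a == b for a, b in zip(first, second))] += 1
    total = len(rows) ** 2
    return {k: Fraction(counts[k], total) for k in range(n + 1)}


def heads_distribution(n):
    """Exact distribution of the number of heads in n fair coin tosses."""
    return {k: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}


if __name__ == "__main__":
    n = 6  # 64 x 64 = 4096 pairs; larger n grows quickly
    print(agreement_distribution(n) == heads_distribution(n))  # True
```

Exact fractions are used instead of floating point so that the comparison is an equality, not an approximation.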

[14] The number of heads thrown in a series of coin tosses conforms to the binomial distribution:

$$P(k) = \binom{n}{k} \, p^{k} (1 - p)^{n - k}$$

where n is the number of trials (e.g. how many times the coin is tossed), k is the number of successful outcomes (e.g. the number of heads thrown), p is the probability of a successful outcome, and P(k) is the probability that there will be k successes in n trials. This formula incorporates the binomial coefficient:

$$\binom{n}{k} = \frac{n!}{k! \, (n - k)!}$$

which employs factorial notation:

$$n! = n \times (n - 1) \times \cdots \times 2 \times 1, \qquad 0! = 1$$

[15] The formula may be applied to our example of ten trials, each of which has two equally probable outcomes (i.e. p = 0.5), to obtain a probability for each outcome from zero to ten out of ten heads.

Binomial probabilities for 10 trials (p = 0.5):

P(0): 0.0009765625

P(1): 0.009765625

P(2): 0.0439453125

P(3): 0.1171875

P(4): 0.205078125

P(5): 0.24609375

P(6): 0.205078125

P(7): 0.1171875

P(8): 0.0439453125

P(9): 0.009765625

P(10): 0.0009765625

(These probabilities were calculated using the program "binomial.pl", with the number of trials set to 10 and the probability of a successful outcome set to 0.5.) The nature of the distribution becomes clearer when these numbers are plotted as a histogram:

[Figure: histogram of the binomial probabilities for n = 10, p = 0.5]

[16] As suspected, five out of ten heads is far more probable than zero or ten out of ten. Weighting the coin so that it is more likely to favour a head than a tail will skew the distribution:

[Figure: histogram of a skewed binomial distribution for a coin weighted towards heads]

[17] In the idealised case of equally probable outcomes in each binary variation unit, the first chart gives the expected probability of each level of agreement between a pair of unrelated MSS. The chance of an agreement of, say, seven or more out of ten is P(7) + P(8) + P(9) + P(10), which is 0.172. In other words, there is a 17.2% chance that two completely unrelated MSS will have an agreement level of at least 70%, given that the comparison is performed with ten variation units of this special kind. This is somewhat surprising, especially since agreement of 70% or more is commonly taken to indicate affiliation between a pair of MSS.
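The figures quoted here follow directly from the binomial formula. As a rough check, the following Python sketch (not the "binomial.pl" program; the function names are assumptions) evaluates P(k) for ten trials with p = 0.5 and sums the upper tail from seven agreements upwards.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def tail_probability(k_min, n, p):
    """Probability of k_min or more successes in n independent trials."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))


if __name__ == "__main__":
    for k in range(11):
        print(f"P({k}): {binomial_pmf(k, 10, 0.5)}")
    print(round(tail_probability(7, 10, 0.5), 3))  # 0.172
```

Changing the first argument of `tail_probability` shows how quickly the chance of random agreement falls as the required level rises.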

The Null Hypothesis

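[18] How can relationship be established in the face of random agreements? One approach is to construct two hypotheses: a null hypothesis, which holds that the level of an effect does not exceed a reference level, and an alternative hypothesis, which holds that it does.

[19] In the case of a pair of MSS, our two hypotheses would be: the null hypothesis, that the level of agreement does not exceed the reference level, so that the MSS are treated as unrelated; and the alternative hypothesis, that the level of agreement exceeds the reference level, so that the MSS are treated as related.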

[20] The reference level is set so that for a pair of MSS taken from a population of unrelated MSS, it would only be exceeded through random agreements in a small proportion of cases. If a level of agreement exceeds the reference level then the null hypothesis is rejected and the alternative hypothesis accepted. So we see the devious nature of the statistician's mind. The method of choosing the reference level will be discussed below.

[21] This approach is not infallible, but has two kinds of inherent error. A "type I" error occurs when the null hypothesis is true but is nevertheless rejected. This would occur if a pair of MSS that were actually unrelated had an agreement level exceeding the reference level. A "type II" error occurs when the null hypothesis is accepted even though the alternative hypothesis is true. This would happen if the agreement level of two related MSS was below the reference level.

Choosing the Reference Level

[22] Purely random processes will generate apparent agreements. The frequency of random agreements will conform to some distribution, but for real MSS it will not be the binomial distribution. Although not binomial, the distribution will be comparable to those shown above. In order to establish that there is a relationship between two MSS, a reference level is chosen towards the right side of the distribution such that an agreement level in excess of the reference level will only happen through random processes in a small proportion of cases.

[23] Say that a reference level is chosen so that random processes will produce a greater level of agreement in only 5% of cases. We are then 95% confident that any level of agreement in excess of the reference level is a significant or real effect, as opposed to a mere coincidence. This "confidence level" (not to be confused with "level of agreement") does not have to be 95%. One could choose a higher level of, say, 99%, in which case the reference level would need to be set so that random agreements would only exceed it in 1% of cases. The higher the confidence level, the more stringent the test; type I errors become less frequent while type II errors occur more often. In practice, a confidence level of 95% is quite common for work where the consequences of type I errors are not too severe.

[24] To illustrate, consider again the first binomial distribution plotted above (n = 10, p = 0.5). Given a set of binary variation units with equally probable states, the probability of eight or more agreements is P(8) + P(9) + P(10), which is 0.055 or 5.5%. Therefore, we can be 94.5% confident that an agreement of eight or more out of ten is significant. If we increase the confidence level to 95% then our null hypothesis can only be rejected for agreement levels of nine or ten out of ten.

What Agreement Is: Simplified Case

[25] Once a confidence level is chosen, the binomial distribution can be used to find the minimum level of agreement that is significant for the simplified case of binary variation units with equally probable readings. Given a number of variation units N, the procedure is to sum the binomial probabilities for each level of agreement from zero out of N up to whatever level first causes the sum to exceed the confidence level. The next level in the series is then the required one. Using the previous example, adding P(8) to the sum causes it to exceed 0.95, so the minimum significant level of agreement is nine out of ten. This table provides a few more examples for a confidence level of 95%:

  Number of variation units    Minimum significant level

  Four or less                 None exists
  Five                         5/5 (100%)
  Ten                          9/10 (90%)
  Fifteen                      12/15 (80%)
  Twenty                       15/20 (75%)
  Twenty five                  18/25 (72%)
  Thirty                       20/30 (67%)

(These numbers were calculated by the program "binomial.pl".)
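The procedure of paragraph 25 can be written out in a few lines. This is a sketch in Python under the same assumptions as the table (binary variation units, equally probable readings by default); it is not the "binomial.pl" program, and the function name is invented here. It accumulates the binomial probabilities from zero upwards and returns the level after the one that first takes the sum past the confidence level.

```python
from math import comb


def min_significant_level(n, confidence=0.95, p=0.5):
    """Smallest number of agreements out of n that is significant at the
    given confidence level, or None if no such level exists."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += comb(n, k) * p ** k * (1 - p) ** (n - k)
        if cumulative > confidence:
            # The next level in the series is the required one,
            # provided that it does not run past n.
            return k + 1 if k + 1 <= n else None
    return None


if __name__ == "__main__":
    for n in (4, 5, 10, 15, 20, 25, 30):
        print(n, min_significant_level(n))
```

For the seven sizes listed, the function reproduces the table above: no level for four units, then 5, 9, 12, 15, 18 and 20.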

Real Cases

[26] Real variation units such as those reported in the apparatus of a critical text may consist of more than two states. What is more, the states are not equally probable. Consequently, the frequency distribution of agreements does not conform to the binomial distribution. The distribution does exist, but deriving its mathematical formula is difficult because each variation unit has its own set of readings and each reading has its own probability of occurrence.

Monte Carlo Approach

[27] Fortunately, the advent of computers makes it possible to create a distribution that approximates the real distribution for the specific set of variation units under investigation. Firstly, the number of states and the probability of each state is estimated for each variation unit. This information is fed into a Monte Carlo program that, in effect, rolls dice representing the variation units to create a large number of "virtual" MSS. The frequency distribution of agreements expected through purely random agreement is then calculated by comparing the virtual MSS pair-by-pair. Finally, the minimum significant level of agreement is calculated based on a given confidence level.
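The four steps just described can be sketched as follows. This is an illustration in Python, not the "Monte_Carlo.pl" program: the function names, the number of virtual MSS, the seed, and every witness count in the example except the first unit (23, 6, 37, taken from Heb 1:3 as reported below) are assumptions. Each variation unit is represented by a list of witness counts, one per state, and those counts serve as the weights of the dice.

```python
import random
from collections import Counter
from itertools import combinations


def virtual_ms(units, rng):
    """Draw one state per variation unit, weighted by its witness counts."""
    return [rng.choices(range(len(counts)), weights=counts)[0] for counts in units]


def chance_agreement_distribution(units, n_mss, rng):
    """Relative frequency of each agreement count over all pairs of virtual MSS."""
    if n_mss < 2:
        raise ValueError("at least two virtual MSS are needed to form a pair")
    mss = [virtual_ms(units, rng) for _ in range(n_mss)]
    tally = Counter(
        sum(a == b for a, b in zip(first, second))
        for first, second in combinations(mss, 2)
    )
    pairs = sum(tally.values())
    return [tally[k] / pairs for k in range(len(units) + 1)]


def min_level_from_distribution(distribution, confidence=0.95):
    """Level after the one whose cumulative frequency first exceeds the
    confidence level, or None if that level would run past the last unit."""
    cumulative = 0.0
    for k, frequency in enumerate(distribution):
        cumulative += frequency
        if cumulative > confidence:
            return k + 1 if k + 1 < len(distribution) else None
    return None


if __name__ == "__main__":
    # Hypothetical input: only the first unit's counts come from the article.
    units = [[23, 6, 37], [40, 12], [55, 3, 2], [30, 30], [48, 9], [61, 5]]
    rng = random.Random(44)  # arbitrary seed
    distribution = chance_agreement_distribution(units, 300, rng)
    print(distribution)
    print(min_level_from_distribution(distribution))
```

Comparing every pair of virtual MSS makes good use of each draw, at the cost of a running time that grows with the square of the number of MSS. Because the result is an estimate, it can shift by a unit from one seed to another when the true tail probability lies close to the cut-off.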

[28] The estimation step makes use of the information found in a critical apparatus. Hopefully, the recorded number of states and the relative number of witnesses in each state for a particular variation unit are representative of all MSS that cover the relevant section of text.

[29] I will now illustrate the process using the UBS apparatus of the Epistle to the Hebrews. For simplicity, I count every witness provided in the apparatus. I include versions and Church Fathers, and count group categories such as vg as single witnesses. I count individual scribes and correctors such as H* and Hc as separate witnesses, and count minor variations from the main readings as witnesses in support of those readings. Finally, I include separately listed group members such as K and L. This is not the most accurate approach. However, the aim is merely to estimate the respective probabilities of the states in a variation unit. In such an exercise, a stray witness here or there is not of great consequence.

[30] Three states are listed for the variation unit at Heb 1:3. The first (τῆς δυνάμεως αὐτοῦ, καθαρισμόν) is supported by 23 witnesses. The second state (τῆς δυνάμεως, δι' ἑαυτοῦ καθαρισμόν) has six witnesses listed, and the third (τῆς δυνάμεως αὐτοῦ, δι' ἑαυτοῦ καθαρισμόν) has 37. This tedious counting exercise must now be repeated for each variation unit concerned. The results, for anyone who might be interested, can be found in the appendix.
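To show what is done with such counts, here is a small Python sketch; the function names are assumptions. It converts the three witness counts for Heb 1:3 into estimated state probabilities, and adds one further calculation that is not in the article: the probability that two MSS drawn independently from those proportions share a state at this unit, which is the sum of the squared probabilities.

```python
from fractions import Fraction


def state_probabilities(witness_counts):
    """Estimate the probability of each state from its share of the witnesses."""
    total = sum(witness_counts)
    return [Fraction(count, total) for count in witness_counts]


def chance_agreement_probability(witness_counts):
    """Probability that two independently drawn MSS share a state at this unit."""
    return sum(p * p for p in state_probabilities(witness_counts))


if __name__ == "__main__":
    heb_1_3 = [23, 6, 37]  # witness counts reported for Heb 1:3
    print([round(float(p), 3) for p in state_probabilities(heb_1_3)])  # [0.348, 0.091, 0.561]
    print(round(float(chance_agreement_probability(heb_1_3)), 3))  # 0.444
```

A unit dominated by a single reading gives a chance agreement probability well above one half, which is one way in which lopsided units can raise the level of agreement expected between unrelated MSS.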

[31] A Monte Carlo calculation based on the 44 variation units of Hebrews arrives at a value of 32/44 (72.7%) for the minimum significant level of agreement at the 95% confidence level. Using the same number of variation units and the same confidence level, the binomial distribution gives a minimum significant level of 28/44 (63.6%). Thus it seems that real variation units have a greater tendency to random agreement than the artificial ones examined above. (These calculations were performed by the programs "Monte_Carlo.pl" and "binomial.pl".)

Conclusion

[32] Here ends this excursion into the murky world of statistical inference. Armed with a knowledge of what agreement is not, we are better able to recognise genuine affiliation when we see it.


Appendix

Listings of the three programs employed in this article are presented below, along with the variants file used by the Monte Carlo program. In order to use one of the programs, you must first copy, paste and save it using a simple text editor. I have suggested names such as "binary.pl", but you are free to name the programs whatever you like. Once saved, a program can be run from a command prompt provided that a Perl interpreter is present on your computer. This is almost certain to be so if you are using a Unix-like system such as Linux or Mac OS X. A program is run by typing "perl <name>" at the command prompt, where <name> is the program name and the quotation marks are not included. A command prompt can be obtained under Mac OS X by double-clicking the "Terminal" icon located under Applications/Utilities. If the program is not located in the present working directory then you need to specify the full path in the command line.

Listing 1

binary.pl

Listing 2

binomial.pl

Listing 3

Monte_Carlo.pl

When using this program, the "variants.txt" file needs to be in the same directory as "Monte_Carlo.pl".

Listing 4

Variation Units for Hebrews (UBS 4th Ed.)

This is a listing of the "variants.txt" file used by the Monte Carlo program to calculate the minimum significant level of agreement based on the 44 variation units of Hebrews. The required format for each row is:

<label> <TAB> <number> <TAB> <number> ... <TAB> <number>.
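A file in this format is simple to read. The sketch below is a Python illustration, not part of the Perl programs; the function name is invented, and the sample label "Heb 1:3" is an assumed label format. It also assumes that the numbers in each row are the witness counts of the states of one variation unit.

```python
def parse_variants(text):
    """Parse rows of '<label> TAB <number> TAB <number> ...' into
    (label, counts) pairs, skipping blank lines."""
    units = []
    for line in text.splitlines():
        if not line.strip():
            continue
        label, *numbers = line.split("\t")
        counts = [int(field) for field in numbers if field.strip()]
        if not counts:
            raise ValueError(f"no numbers found in row: {line!r}")
        units.append((label.strip(), counts))
    return units


if __name__ == "__main__":
    sample = "Heb 1:3\t23\t6\t37\n"  # counts from the article; label format assumed
    print(parse_variants(sample))  # [('Heb 1:3', [23, 6, 37])]
```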


Reference List

Aland, Barbara, Kurt Aland, Johannes Karavidopoulos, Carlo M. Martini, and Bruce M. Metzger. 1993. The Greek New Testament. 4th rev. ed. Stuttgart: United Bible Societies.

Moore, David S. and George P. McCabe. 1993. Introduction to the Practice of Statistics. 2nd ed. New York: W. H. Freeman.




© T. J. Finney, 2002.