Computer methods are increasingly important in textual criticism. We describe and compare two methods of stemma reconstruction: Mink's Genealogical Method (developed for use with the Greek New Testament) and the cladistic maximum parsimony method (developed in evolutionary biology). We use both methods to study a group of Greek texts of the Letter of James that are closely related to the Syriac Harclensis. We show that the methods are fundamentally different in aims and approaches, although there are some points of agreement in the results they produce. The Genealogical Method is most suitable when the priority of each individual reading can be assessed. Maximum parsimony can be used when such assessments are not possible or not desired.
The data used in the analyses are available for download. (See the appendix.)
[1] The Greek New Testament is one of the most difficult cases for the textual critic because its tradition comprises a large number of manuscripts, spans a long period of time and is known to be heavily contaminated. It therefore presents a challenge to any proposed computer method of stemma reconstruction. Here, we will compare two methods that aim to recover the relationships between manuscript versions of a New Testament work: the Genealogical Method developed by Gerd Mink at the Institut für neutestamentliche Textforschung, and the maximum parsimony algorithm implemented in the software package PAUP (Swofford 2001). The Genealogical Method (Mink 2000) was developed specifically for the Greek New Testament and was used to obtain the primary line text of the Editio Critica Maior (Aland et al. 1997a). Maximum parsimony is one of the most popular phylogenetic methods, and has often been used in the study of manuscript traditions (Platnick and Cameron 1977; Cameron 1987; Lee 1989, 1990; O'Hara and Robinson 1993; Robinson and O'Hara 1996; Robinson 1997; Salemans 2000; Howe et al. 2001). We will begin with the determination of a group of Greek manuscripts based on their texts of the Letter of James. Next, we will explain how a stemmatic representation of the group and its links to the rest of the tradition can be produced by Mink's method and by maximum parsimony. Finally, we will discuss the similarities and differences between the two methods and their consequences for the kinds of stemmata they produce.
[2] In the early 7th century Thomas of Harqel, dismissed Monophysite bishop of Mabbug in Syria, and Paul of Tella lived as exiles in a Coptic monastery near Alexandria. On behalf of Athanasios I, the Monophysite patriarch of Antioch, they worked on a revised Syriac translation of the Bible. Thomas completed the translation of the New Testament, the so-called Harclensis, in 616. In the colophons of the Harclensis, the translator reports on his work (Zuntz 1951; Thomas 1980; Brock 1981, 7-13). He says that he made use of "good" Greek manuscripts in revising the Philoxeniana, the work of his predecessor Polycarp, who, on behalf of Philoxenus of Mabbug, had produced the first Monophysite Syriac translation of the Bible more than a century before. The revision resulted in a predominantly Byzantine form of text, but the Catholic Epistles are a remarkable exception. In these writings we find a broad stratum of variants differing considerably from the Byzantine text as transmitted since the 9th century (Wachtel 1995, 190-198).
[3] For a correct interpretation of this evidence it is important to take the historical background into account. The revision of the Syriac Bible was part of an attempt to come to an agreement with the Orthodox Church (Zuntz 1945, 7-12; Juckel 1999, 31-33). The Monophysites were accused of having forged the sacred texts systematically, thus fabricating support for their christological doctrine. Thomas' translation is meticulous and often sacrifices the correctness of the Syriac for the sake of a literal rendering of the Greek. This was probably a deliberate attempt to produce a translation as close as possible to the orthodox Greek text. For the same reason it is no surprise that the translation is on the whole a witness to the Byzantine text-type. This makes the divergent textual character of the Catholic Epistles in the Harclensis even more interesting. It is unlikely that Thomas translated a less Byzantine text of these writings. He very probably rendered the Byzantine text of his time throughout the New Testament, but in the Catholic Epistles it had not reached the form which it had in the 9th century. The Harclensis allows us to view an earlier stage of the process that resulted in the Byzantine text.
[4] The task of elucidating the textual affinities of the Harclensis is made easier by the fact that the text of the Catholic Letters that Thomas translated into Syriac is preserved in a group of Greek manuscripts. Working on the Harclensis of the larger Catholic Letters, Barbara Aland applied the tools offered by "Text und Textwert" (Aland et al. 1987) to search for manuscripts that are closely related to the 7th century translation and found a group of 12 minuscules dated 11th - 16th century (Aland and Juckel 1986, 41-90): 206 (London, XIII), 429 (Wolfenbüttel, XIV), 522 (Oxford, 1515), 614 (Milan, XIII), 630 (Rome, XIV), 1292 (Paris, XIII), 1505 (Athos/Lavra, XII), 1611 (Athens, XII), 2138 (Moscow, 1072), 2200 (Elasson, XIV), 2412 (Chicago, XII), 2495 (Sinai, XIV/XV).
[5] Of these manuscripts, 1505, 1611, 2138 and 2495 turned out to be such close textual relatives of the Harclensis that they could be used as guides to the correct wording of a back-translation of the Syriac into Greek (Aland and Juckel 1986, 271-275). Hereafter, this back-translation is referred to as 'H'.
[6] The analyses in "Text und Textwert" are based on collations of all available Greek manuscripts at certain test passages, 98 of which refer to the Catholic Letters. The results from an investigation of the Harclensis text at these 98 passages were confirmed by a quantitative analysis of full collations of the manuscripts that were included in the James volume of the Editio Critica Maior (Aland et al. 1997a). Moreover two additional core members of the Harclensis group (hereafter 'HG') were found: 1890 (Jerusalem, XIV) and 1799 (Princeton, XII/XIII).
[7] 'Quantitative analysis' here means determining the relationship between manuscript texts in terms of relative degrees of agreement and difference. The apparatus of the James volume of the ECM displays full evidence for 164 manuscripts at 761 passages where the tradition of the Greek text is at variance, excluding merely orthographical differences. We tabulated the percentage of agreement of each manuscript with every other manuscript, so that for each manuscript we could sort all others according to their degree of agreement. The list for the 12th century HG minuscule 1505 may serve as an example (Table 1).
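The tabulation described above can be illustrated with a minimal Python sketch. It assumes that each witness is stored as a list of readings, one per variant passage, with `None` marking a lacuna; the witness labels and readings are invented for illustration and are not taken from the ECM data. Percentages are computed only over passages where both witnesses have a reading, which is one plausible way of handling fragments.

```python
def agreement(a, b):
    """Return (percentage agreement, number of shared passages) for two witnesses.

    Passages where either witness has a lacuna (None) are ignored.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None, 0
    same = sum(1 for x, y in shared if x == y)
    return 100.0 * same / len(shared), len(shared)


def ranked_relatives(witnesses, target):
    """List all other witnesses in descending order of agreement with target."""
    rows = []
    for name, readings in witnesses.items():
        if name == target:
            continue
        pct, n_shared = agreement(witnesses[target], readings)
        if pct is not None:
            rows.append((name, pct, n_shared))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical toy data: four witnesses, five variant passages.
toy = {
    "W1": ["a", "b", "a", "a", "c"],
    "W2": ["a", "b", "a", "b", "c"],
    "W3": ["b", "a", "a", "b", "a"],
    "W4": ["a", None, None, None, None],  # a small fragment
}
ranking = ranked_relatives(toy, "W1")
```

In this toy ranking the fragment W4 comes first with 100% agreement over a single shared passage, which mirrors the caution needed when very small fragments head such a list.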
[8] A necessary criterion for two manuscripts to be members of the same group is a degree of agreement that is higher than the degree of agreement with the mainstream of the tradition as defined by typical representatives of the Byzantine text type (see Aland et al. 1997b, B8-B9). This criterion is necessary because at any of the 761 variant passages of the Letter of James, most manuscripts witness to the majority text. Consequently a quantitative group definition that would not apply the criterion of difference from the majority text would have to deal with many group members which could be seen as members of other groupings as well, thus depriving the groupings of any ability to reveal structures within the tradition.
[9] 1505 shares 88.5% of its readings with the majority of all witnesses.1 Manuscripts which share less than 88.5% of readings with 1505 can therefore safely be excluded from any group of close relatives of 1505. In Table 1, 1505 agrees more with the majority than it does with all the manuscripts below 1832, so all the manuscripts below 1832 can be excluded. But, as can be seen from the entry in the last column, the share of majority readings in 1832 is considerably higher than that of agreements with 1505. The results will certainly be more informative if the percentage of agreement with the majority text ('Majority' column in Table 1) is used to sharpen the group definition. In the case of 1505 all manuscripts more distant than 1852 would have to be excluded from a group of close relatives, because the manuscripts below 1852 in Table 1 (with the exception of the small fragment P20) agree more often with the majority than with 1505.
[10] The highest percentage of agreement with 1505 is reached by the majuscule fragment 0166 but, as the manuscripts have only a total of 8 passages in common, this does not say much. Quantitative analyses do not lead to informative results with small fractions of texts. The first approximately complete text in Table 1 is 2495, a manuscript almost identical to 1505, textually speaking. Moving down from 2495 to 1852, one finds all members of HG as listed above. Among them there are more fragments (0246, 2718S, P23), but also complete manuscripts that do not figure as core members of HG: 1448, 2652, 1852. These manuscripts are not regarded as core members because they do not appear among the close relatives of all HG manuscripts. Core members mutually figure in the group of close relatives according to the stricter criterion of difference from the majority text.
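The two criteria discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: `pct` is a symmetric table of pairwise agreement percentages, `majority` gives each witness's share of majority readings, and all labels and figures are hypothetical. The function `qualifies_as_core` is our own illustrative name for the check that a candidate appears among the close relatives of every member of an already established core.

```python
def close_relatives(target, pct, majority):
    """Witnesses that agree with target more often than either does with the majority.

    The first comparison is the necessary criterion; the second is the
    stricter criterion based on the other witness's majority share.
    """
    relatives = set()
    for other, p in pct[target].items():
        if p > majority[target] and p > majority[other]:
            relatives.add(other)
    return relatives


def qualifies_as_core(candidate, core, pct, majority):
    """True if candidate is a close relative of every member of core."""
    return all(candidate in close_relatives(member, pct, majority)
               for member in core if member != candidate)


# Hypothetical figures: share of majority readings per witness.
majority = {"A": 88.5, "B": 88.0, "C": 89.0, "D": 90.0}
# Hypothetical symmetric pairwise agreement percentages.
pairs = {("A", "B"): 95.0, ("A", "C"): 93.0, ("B", "C"): 94.0,
         ("A", "D"): 91.0, ("B", "D"): 89.0, ("C", "D"): 89.5}
pct = {name: {} for name in majority}
for (x, y), value in pairs.items():
    pct[x][y] = value
    pct[y][x] = value
```

Here D is a close relative of A but not of B or C, so it would not count as a core member of the group {A, B, C}, much as some complete manuscripts in Table 1 are close to 1505 without being core members of HG.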
[11] Table 3 shows the degrees of agreement among the HG core members. In general, group members agree with each other more than with the majority text. H can only be assigned a single unambiguous reading at 495 out of 761 passages because in many places there is more than one Greek variant that could be the source for the Syriac. In the ECM, 'S:H' is cited with a double arrow in such places. They were treated as lacunae for the present evaluation. H has 88.4% majority readings, which is at about the same level as in most other group members. Like the manuscripts, H agrees more frequently with all other HG members than with the majority text. The manuscripts, too, have higher percentages of agreement with H than with the majority. Although the percentages of agreement with other HG manuscripts never reach values as high as that of 1505/2495 (98.9%) or 614/2412 (99.2%), H can safely be regarded as a 15th core member of HG.
[12] So far we have established a group of closely related texts, in 14 Greek manuscripts and the Harclensis, solely by counting their agreements and differences. To learn more about the structure and the affinities of this group it is now necessary to turn to textual changes.
[13] In the praefationes of scholarly editions of ancient literature, a stemma may be used to show the editor's view of the history of the edited text as the best possible approximation to the original. The editor arrives at this view by studying textual variants as recorded in the critical apparatus. As a rule, some particularly instructive examples are cited to show the direction of the change that inevitably results from manual copying. Thus it is a well-established philological principle to infer the relationships between manuscripts, or rather copies of a text, from variant readings. However, it is traditionally applied only to a selection of readings seen as particularly significant.
[14] The Genealogical Method as developed by Gerd Mink (Mink 1993; Aland et al. 2000, 23*-24*; Mink 2000) is distinguished from conventional attempts to analyse and describe the history of a text by four features:
[15] The basic principle of the Genealogical Method is to infer the genealogy of states of a text from genealogical assessments of readings at every variant passage of that text. If we state, for example, that reading x is probably the source of reading y, this implies a statement about the relationship of the copies containing the readings. Having subjected every variation unit of a text to an assessment of the local genealogy of its readings, we can summarise the assessments and represent the resulting values by arrows (directed edges) in a stemma of manuscript texts.
[16] Although the local genealogy of readings is the starting point of the Genealogical Method, and the method's practical usefulness lies in providing new means for the assessment of variants, we shall not deal with these assessments in the present article, because doing so would largely mean describing philological procedures or writing a textual commentary. However, a further feature of Mink's method has to be mentioned in this regard: the analysis of coherence in the attestation of readings. Two modes of coherence, pre-genealogical and genealogical, are to be distinguished. Pre-genealogical coherence is inferred from the degree of similarity of manuscript texts. For the assessment of variants and their relations it is useful to look at the closest relatives of a witness. They will normally attest the same reading, but where they do not, the pre-genealogical coherence of witnesses of different readings points to a genealogical coherence of those readings. If, for example, one cannot decide on philological grounds whether the source of a reading c was a or b, it is best to opt for the reading that has close relatives of the witness(es) of c among its attestation. Such an observation may help to establish genealogical coherence. If we see, for example, that a witness z of reading c is more often dependent on a witness x of reading b, it is more likely that reading c is dependent on reading b than vice versa.3
[17] The following exposition presupposes an assessment of local genealogy after analyses of coherence at each variant passage in the James ECM.
[18] The predominant relations of the HG manuscripts are expressed by arrows in Figure 2. Table 4 contains the data which give the arrows their direction. For each manuscript shown in the stemma, the 'ms 2' column of Table 4 lists potential ancestors,4 sorted in descending order of agreement. In Figure 2, the first entry for each manuscript in Table 4 is represented by an arrow. Clicking on the link to Figure 3 leads to a stemma that additionally takes the second entry for each manuscript into account, and Figure 4 finally shows all three possible ancestors.5
[19] In Figure 2, there are two branches emanating from 1448, a form of text separated from the supposed original (A) by only one intermediary state (1852). This is explained in Table 4, which shows 1448 as the closest possible ancestor of both 429 and 1611. Manuscript 429 agrees with 1448 at 716 of 760 passages. At 25 passages 429 attests to a variant that was assessed to be derived from that found in 1448 (1<<2), while the text is likely to have been changed in the opposite direction in only 13 cases (1>>2). The degree of agreement between 1611 and 1448 is about the same, but the direction of change is clearer (27:11).
[20] Table 4 shows what kind of relationship is represented by an arrow between two manuscript numbers. Such manuscripts are rather similar. The degree of agreement is the first sorting criterion. The second is the number of readings in ms 1 that were assessed to be derived from those found in the related ms 2 (column '1<<2'), compared to the number of readings in ms 2 that were assessed to be derived from those in ms 1. These figures are decisive for the direction of the corresponding arrow. But for interpreting the stemma, it is essential to be well aware of both figures because there are always readings in the manuscript from which the arrow starts that are assessed to be derived from the manuscript to which the arrow points. The difference may be slight or may not exist at all. The difference is just one, for example, with the second possible ancestor of 429, the Byzantine manuscript 35. If we had seen the reading of 429 as the source of that in 35 at only one more passage, 429 would range among the possible ancestors of 35. The relationship between 2200 and 1611 is represented by two edges without direction in Figure 4, because the corresponding values are equal.
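The way the two sorting criteria determine an arrow can be sketched in Python. This is only an illustration of the bookkeeping, not the Genealogical Method itself, which depends on the philological assessments behind the counts. The record layout and function names are our own assumptions. The figures for 429 and 1448 (716 of 760 agreements, 25:13) are those quoted above; the rows for "X", "Y", "P" and "Q" are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    ms1: str
    ms2: str
    agree: int        # passages at which ms1 and ms2 agree
    total: int        # passages at which both could be compared
    derived_1: int    # column '1<<2': readings in ms1 derived from ms2
    derived_2: int    # column '1>>2': readings in ms2 derived from ms1


def direction(r):
    """Arrow implied by the two counts; equal counts give an undirected edge."""
    if r.derived_1 > r.derived_2:
        return f"{r.ms2} -> {r.ms1}"
    if r.derived_1 < r.derived_2:
        return f"{r.ms1} -> {r.ms2}"
    return f"{r.ms1} -- {r.ms2}"


def potential_ancestors(relations, ms):
    """Potential ancestors of ms, sorted in descending order of agreement."""
    candidates = [r for r in relations
                  if r.ms1 == ms and r.derived_1 > r.derived_2]
    return sorted(candidates, key=lambda r: r.agree / r.total, reverse=True)


relations = [
    Relation("429", "1448", 716, 760, 25, 13),   # figures quoted in the text
    Relation("429", "X", 700, 760, 12, 11),      # hypothetical: difference of one
    Relation("429", "Y", 720, 760, 9, 14),       # hypothetical descendant
    Relation("P", "Q", 650, 760, 10, 10),        # hypothetical: equal values
]
ancestors_of_429 = [r.ms2 for r in potential_ancestors(relations, "429")]
```

The hypothetical row for "X" shows how fragile a direction can be: a single reassessed passage would turn the arrow into an undirected edge.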
[21] Table 4 is an extract of a full list of possible ancestors and descendants of all manuscripts included in the ECM James. Table 5 contains the first 50 entries of the full list. The same table may be accessed from Table 4 by clicking on an underlined manuscript number. If we look at the section relating to 429 again, we see the reason for its high and central position in the stemma. All other HG manuscripts are pointed to from 429. Skipping the fragments 0166 and 0246, we have to pass ten more closely related descendants until we reach the first possible ancestor, 1448. 429 got its predominant stratum of readings from the state of text that, via 1852, connects HG with the supposed original. The second possible ancestor, 35, is separated from the first by eight positions, and it is particularly interesting that this and the third possible ancestor, 2080, are straightforward Byzantine states of text. Both lines of tradition meet in a state of text that seems to come closest to the archetype of HG.
[22] To see the outstanding quality of 429 one may also compare the next manuscript in the list, 522, that shows a majority of priority readings with only one state of text among the first 50 positions of the full list of descendants and ancestors.
[23] It comes as a surprise that the re-translated Harclensis (H) appears in a terminal position as a descendant of 1505, 1611, and 1448. The proportions of 13:11 with 1505, and 15:12 with 1611, are not very significant but the priority of 1448 with 17:10 is clear enough. A higher position would fit the early date of the Harclensis better. Although we have to be aware that far less evidence could be evaluated for the Harclensis than for the Greek manuscripts (only 495 passages where 1505 and H could be assigned to single unambiguous variants), we may conclude with due caution that the form of text preserved in the Greek witnesses is prior to that translated into Syriac in 616 A.D.
[24] However, Figure 4 shows only the three most similar possible ancestors of H, whether or not they are all necessary for explaining the text of H (cf. note 5). The final substemma for H would show 1505 and 1852 (five positions below 1448 in Table 5) as the only ancestors necessary for explaining the text of H. This means that the relations represented by the edges 1611-H and 1448-H do not contribute information beyond that represented by the edge 1505-H. However, four readings in H are not equal to or derived from those in 1505: 1:24:6 a; 2:3:12-18 a; 2:5:2-8 a; 4:14:22-26 a (variant passages given in the form used by ECM, Aland et al. 1997a, 17*). H shares these readings with 1852.
[25] A character is a location (either a single word or a group of words) in the text at which variation may occur. A state is a single reading at a character, or a group of readings that are not distinguished for stemmatological purposes (for example, we would usually count a set of readings differing only in punctuation as a single state). A binary character is a character for which only two different states are present among all the extant manuscripts, and a multistate character is one for which there are more than two different states among all extant manuscripts. A stemma is a diagram (usually a tree but sometimes some other kind of graph) showing hypothetical genealogical relationships among manuscripts. The stemma contains nodes which are real or hypothetical manuscripts (or sometimes groups of manuscripts), and edges which connect nodes along lines of descent. Like Quentin and Greg (Metzger 1992, 163-166), we make no attempt to identify original readings. Thus, we do not attempt to infer the direction of textual flow along edges. This is a problem that must be tackled separately, and does not affect the reconstruction of the topology of the stemma. The degree of a node is the number of edges emerging from it. Internal nodes are nodes within the stemma of degree greater than one (i.e. points where edges join). Terminal nodes, at the tips of branches, are nodes of degree one. The length of an edge can be represented as proportional to the number of changes in readings occurring between the nodes it connects. A maximally parsimonious stemma is a stemma requiring the smallest possible number of independent changes in order to generate the observed states in extant manuscripts. A parsimony-informative character has at least two states shared by at least two manuscripts each, and therefore contains useful information about the relationships between manuscripts. 
A parsimony-uninformative character has only one state shared by more than one manuscript, and therefore cannot provide information about the topology of the stemma (see Table 6 for examples). Homoplasy is the presence of shared readings as a result of convergent changes, contamination or state reversals in different lineages, rather than common ancestry in the dominant pathway of textual flow.
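The distinction between constant, parsimony-uninformative and parsimony-informative characters can be made concrete with a short Python sketch. It assumes that a character is given as a list of states, one per manuscript, with `None` for a lacuna; the function name and the example columns are ours. A character in which no state is shared by two manuscripts is also treated as uninformative here.

```python
from collections import Counter


def classify_character(column):
    """Classify one character by the distribution of its states."""
    counts = Counter(state for state in column if state is not None)
    if len(counts) <= 1:
        return "constant"
    # Count the states that are attested by at least two manuscripts.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    if shared_states >= 2:
        return "parsimony-informative"
    return "parsimony-uninformative"


examples = {
    "two shared states": ["a", "a", "b", "b"],
    "one singular reading": ["a", "a", "a", "b"],
    "no variation": ["a", "a", "a", None],
    "all different": ["a", "b", "c", "d"],
}
labels = {name: classify_character(col) for name, col in examples.items()}
```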
[26] In the complete Letter of James data set (Mink 2000), 60 characters out of 761 are constant, 266 are variable but parsimony-uninformative, and 435 are variable and parsimony-informative. The constant characters are listed in the database because there is patristic or versional evidence for variants at these points, although they are not attested in any extant Greek New Testament manuscript. We wrote MATLAB 6 code (The Mathworks, Inc., Natick, MA) to convert the data from the matrix format used for preparing the ECM James to a suitable format for the analyses described below.
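The conversion code itself is not reproduced here. As a rough illustration of what such a conversion involves, the following Python sketch writes a small character matrix as a NEXUS DATA block of the kind read by PAUP. The input layout (a dictionary of readings per witness, `None` for a lacuna), the witness labels and the function name are assumptions made for this example, and the coding of readings as digits limits it to at most ten states per character.

```python
def to_nexus(matrix):
    """Return a NEXUS DATA block for a dict: witness label -> list of readings."""
    labels = list(matrix)
    nchar = len(matrix[labels[0]])
    symbols = "0123456789"  # assumption: at most ten states per character
    rows = {label: [] for label in labels}
    for i in range(nchar):
        # Number the readings at this character in sorted order.
        states = sorted({matrix[l][i] for l in labels if matrix[l][i] is not None})
        code = {state: symbols[k] for k, state in enumerate(states)}
        for l in labels:
            reading = matrix[l][i]
            rows[l].append("?" if reading is None else code[reading])
    width = max(len(l) for l in labels)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(labels)} NCHAR={nchar};",
        f'  FORMAT DATATYPE=STANDARD MISSING=? SYMBOLS="{symbols}";',
        "  MATRIX",
    ]
    lines += [f"  {l.ljust(width)}  {''.join(rows[l])}" for l in labels]
    lines += ["  ;", "END;"]
    return "\n".join(lines)


# Hypothetical witnesses and readings.
toy = {
    "ms_A": ["a", "b", None],
    "ms_B": ["a", "a", "x"],
    "ms_C": ["b", "a", "x"],
}
nexus_text = to_nexus(toy)
```

The labels are given a letter prefix because purely numeric taxon names can be awkward for some NEXUS-reading programs.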
[27] Maximum parsimony ('MP') aims to find the tree requiring the smallest possible number of changes of reading consistent with the observed readings in extant manuscripts. This is the principle used by most traditional stemmatological approaches (Platnick and Cameron 1977; Cameron 1987; Lee 1989). We assume that all extant manuscripts fall on terminal nodes and that all internal nodes are hypothetical manuscripts with degree three. Cases in which an extant manuscript is a descendant of another extant manuscript (extant manuscripts are internal nodes on the true stemma), or in which several extant manuscripts are directly descended from the same exemplar (internal nodes have degree greater than three) can be accommodated by allowing edges of length zero. We describe the maximum parsimony method in two stages: evaluating the number of changes needed on a given tree, and finding the best possible tree.
[28] In order to find the number of changes required on a given tree, the readings at hypothetical manuscripts are reconstructed using the following algorithm (Page and Holmes 1998, 163-166):
The aim of these assignment rules is to have the smallest possible number of independent changes of reading on the given tree. Ambiguities in which internal nodes have more than one state will later be resolved as described below.
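The assignment rules themselves are not reproduced in this text. As an illustration only, the sketch below uses the set-based rule commonly attributed to Fitch for unordered characters: a node receives the intersection of its children's state sets if that intersection is non-empty, and otherwise their union, at the cost of one change. We assume, without confirmation from the source, that this corresponds to the procedure cited. Trees are written as nested pairs rooted at an arbitrary internal point, and the manuscript labels and readings are hypothetical.

```python
def fitch(tree, readings):
    """Return (set of possible states, number of changes) for one character.

    tree is either a manuscript label (str) or a pair (left, right) of subtrees.
    readings maps manuscript labels to their state at this character.
    """
    if isinstance(tree, str):
        return {readings[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, readings)
    right_states, right_changes = fitch(right, readings)
    common = left_states & right_states
    if common:
        # The children can agree: no extra change is needed at this node.
        return common, left_changes + right_changes
    # The children cannot agree: keep all options and count one change.
    return left_states | right_states, left_changes + right_changes + 1


def tree_length(tree, characters):
    """Total number of changes over all characters."""
    return sum(fitch(tree, readings)[1] for readings in characters)


# Hypothetical data: four manuscripts, two characters.
characters = [
    {"A": "a", "B": "a", "C": "b", "D": "b"},
    {"A": "x", "B": "y", "C": "y", "D": "y"},
]
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
lengths = (tree_length(tree_1, characters), tree_length(tree_2, characters))
```

In the second tree the root is left with the ambiguous set {a, b} for the first character, which is the kind of ambiguity that a later step has to resolve.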
[29] The result will be a tree on which all nodes have sets of states assigned for each character. Ambiguities may remain where some nodes have more than one possible state for some characters. These ambiguities can be resolved in several different ways (Swofford and Maddison 1987). The accelerated transformation (ACCTRAN) algorithm assumes that state changes occur as soon as possible as we move towards the terminal nodes of a tree, maximizing the proportion of homoplasy accounted for by reversals of earlier changes. The delayed transformation (DELTRAN) algorithm assumes that state changes occur as late as possible as we move towards the terminal nodes, maximizing the proportion of homoplasy accounted for by parallel changes in different branches. The minimum F-value (MINF) algorithm attempts to choose among ambiguous states so as to fit the lengths of reconstructed paths between extant manuscripts on the tree as closely as possible to the observed numbers of character differences. The ACCTRAN and DELTRAN methods are sensitive to the location of the root (archetype), while the MINF algorithm is not (Swofford and Maddison 1987). We will therefore use MINF reconstructions here, although the results from the other methods are similar. The set of maximally parsimonious trees and the lengths of all trees are the same whichever reconstruction method is used.
[30] This reconstruction algorithm is fundamentally different from other approaches such as the Profile Method (Wisse 1982, 40) because reconstructed readings are explicitly derived from a stemma rather than from the majority reading in a group of manuscripts. For example, the MP reconstruction of the ancestor of a group of manuscripts will not necessarily contain readings shared by the majority of those manuscripts. Instead, it will contain the readings that require the smallest number of changes over the whole stemma.
[31] Once the readings at hypothetical manuscripts have been reconstructed, the length of the tree is defined as the total number of changes of reading occurring on the tree. If changes of reading are relatively rare, the shortest trees are most likely to be correct. (In a large and complex text tradition, there may be many different equally short trees.) Figure 5 shows the three possible arrangements of a tree for four manuscripts, with readings at internal nodes reconstructed using the MINF algorithm. We can be certain in this case that tree (A) is the most parsimonious for these manuscripts. However, no method other than calculating the length of all possible trees is guaranteed to find the set of all shortest trees (Hwang et al. 1992, 14-16, 310-313). This is rarely practical because the number of possible trees increases very rapidly with the number of manuscripts (Flight 1990). Instead, programmes such as PAUP (Swofford 2001) use heuristic searches which, though not guaranteed to find the best tree, usually perform well in practice. We wrote MATLAB code to generate a NEXUS file (Maddison et al. 1997) readable by PAUP, then used version 4.0b10 of PAUP to search for maximally parsimonious stemmata, with an upper limit of 10000 stemmata due to memory constraints. Table 6 illustrates the format of a simple NEXUS file.
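How rapidly the number of possible trees grows can be shown with a short calculation. For n manuscripts at terminal nodes and internal nodes of degree three, the number of distinct unrooted trees is the product of the odd numbers from 3 to 2n - 5, a standard result. The function name below is ours.

```python
def unrooted_binary_trees(n):
    """Number of unrooted trees with n labelled terminal nodes, all internal nodes of degree three."""
    if n < 3:
        raise ValueError("at least three terminal nodes are required")
    count = 1
    for factor in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= factor
    return count


counts = {n: unrooted_binary_trees(n) for n in (4, 5, 10, 20)}
```

For four manuscripts the count is 3, matching the three arrangements in Figure 5; for ten it already exceeds two million, which is why an exhaustive search is out of the question for a tradition of this size.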
[32] If there are many maximally parsimonious stemmata, it would be misleading to present a single example and impractical to present them all. Consensus methods attempt to combine information from all the rival stemmata (Swofford 1991). A strict consensus contains only those groupings that are present in all of the maximally parsimonious stemmata. However, an Adams-2 consensus (Adams III 1972; Adams III 1986) is often more informative. The Adams-2 consensus contains any 'taxonomic statement' shared by all the rival trees. For example, if a pair of manuscripts (extant or hypothetical) appear as part of a group in every rival tree, they will be grouped together on the Adams-2 consensus even if the other members of the group may differ between rival trees (Swofford 1991). In all consensus methods, all areas of disagreement between rival stemmata (unresolved parts of the consensus) are represented as internal nodes of degree greater than three (e.g. Figure 7c in Salemans 1996).
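The idea of a strict consensus can be sketched in a few lines of Python. For simplicity the sketch treats trees as rooted and written as nested tuples, so that a grouping is the set of manuscripts below an internal node; for unrooted stemmata one would compare splits instead. It returns only the groupings common to all trees and does not build the consensus tree, and it does not attempt the Adams-2 consensus. The trees and labels are hypothetical.

```python
def groupings(tree, found=None):
    """Return (members of this subtree, set of all groupings found so far)."""
    if found is None:
        found = set()
    if isinstance(tree, str):
        return frozenset([tree]), found
    members = frozenset()
    for child in tree:
        child_members, _ = groupings(child, found)
        members |= child_members
    found.add(members)
    return members, found


def strict_consensus_groupings(trees):
    """Groupings present in every one of the rival trees."""
    per_tree = [groupings(tree)[1] for tree in trees]
    return set.intersection(*per_tree)


# Two hypothetical rival trees for five manuscripts.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "B"), "D"), ("C", "E"))
shared = strict_consensus_groupings([tree_1, tree_2])
```

Only the pair A, B and the trivial grouping of all five manuscripts survive, so the strict consensus would show A and B together and leave everything else unresolved.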
[33] Consistency indices measure the extent to which characters agree with a given stemma. If the stemma is correct, the consistency index measures the extent to which a character contains information about the transmission of the text. We use the ensemble rescaled consistency index ('RC') as a summary measure of overall fit to the stemma (Farris 1989; Swofford 1991). This ranges from 0 (all differences in readings are due to reversals or parallel changes) to 1 (no reversals or parallel changes are needed to account for the observed distribution of readings). However, the chance of discovering homoplasy increases as the number of manuscripts increases (Swofford 1991), so comparisons between traditions with different numbers of manuscripts (e.g. Salemans 2000, 46, note 30) are not valid.
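As a sketch of how such an index is computed, the following Python code uses the usual definitions for unordered characters: for each character, m is the minimum conceivable number of changes (number of states minus one), s is the number of changes required on the stemma, and g is the maximum conceivable number of changes (number of manuscripts minus the frequency of the commonest state). The ensemble consistency index is the sum of m divided by the sum of s, the ensemble retention index is (sum of g minus sum of s) divided by (sum of g minus sum of m), and RC is their product. The function names and the example figures are ours, and the values of s are simply assumed rather than derived from a tree.

```python
from collections import Counter


def min_and_max_steps(column):
    """Return (m, g) for one character given as a list of states."""
    counts = Counter(state for state in column if state is not None)
    m = len(counts) - 1
    g = sum(counts.values()) - max(counts.values())
    return m, g


def ensemble_rc(min_steps, observed_steps, max_steps):
    """Ensemble rescaled consistency index from per-character m, s and g."""
    m, s, g = sum(min_steps), sum(observed_steps), sum(max_steps)
    if s == 0 or g == m:
        raise ValueError("the index is undefined for these data")
    ci = m / s
    ri = (g - s) / (g - m)
    return ci * ri


# Two hypothetical binary characters.
col_1 = ["a", "a", "b", "b"]
col_2 = ["a", "a", "a", "b", "b", "b"]
(m1, g1), (m2, g2) = min_and_max_steps(col_1), min_and_max_steps(col_2)
rc_homoplasy = ensemble_rc([m1, m2], [1, 3], [g1, g2])   # assumed s = 1 and 3
rc_perfect = ensemble_rc([m1, m2], [1, 1], [g1, g2])     # assumed s = 1 and 1
```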
[34] We carried out some further analyses to check the assumptions of the maximum parsimony method. We used a compatibility matrix (Sneath et al. 1975; Jakobsen and Easteal 1996) to search for evidence of changes of manuscript affiliations within the text of James. We found no evidence for such changes. This is what we would expect, given that the Letter of James is such a short passage. However, this analysis suggested that there may have been many different pathways of textual flow. Elsewhere (M. Spencer et al., in preparation), we describe network methods designed to deal with stemmatic reconstruction in such cases.
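The notion of compatibility underlying such a matrix can be illustrated for the simplest case. Two binary characters are compatible, in the sense that both can be explained on one tree without homoplasy, unless all four combinations of their states occur among the manuscripts. The sketch below is restricted to binary characters and is not the procedure of the works cited; the function names and data are ours. In a matrix ordered by position in the text, a change of affiliation part-way through would be expected to show up as separate blocks of mutually compatible neighbouring characters.

```python
def compatible(char_1, char_2):
    """Pairwise compatibility of two binary characters (lists of states, None = lacuna)."""
    combinations = {(x, y) for x, y in zip(char_1, char_2)
                    if x is not None and y is not None}
    # With two states each, at most four combinations are possible;
    # all four occurring together means the characters conflict.
    return len(combinations) < 4


def compatibility_matrix(characters):
    """Square matrix of pairwise compatibility, in the order given."""
    n = len(characters)
    return [[compatible(characters[i], characters[j]) for j in range(n)]
            for i in range(n)]


# Three hypothetical binary characters over four manuscripts.
characters = [
    ["a", "a", "b", "b"],
    ["a", "b", "a", "b"],
    ["a", "a", "a", "b"],
]
matrix = compatibility_matrix(characters)
```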
[35] We found 10000 maximally parsimonious stemmata for the Letter of James data, and there are probably many more (memory constraints limited our search to 10000). Thus, although there may have been a single dominant pathway of textual flow from the archetype to each extant manuscript, there were probably many other less important lines of transmission. The Adams-2 consensus of these 10000 stemmata is shown in Figure 6. Only 37 out of 165 manuscripts were directly connected to nodes of degree more than three. This means that all the stemmata agreed on how most of the manuscripts should be arranged into small related groups. For example, all the Harclensis group manuscripts (labelled in red on Figure 6) other than 1831 and 1490 formed a single group containing no other manuscripts except the back-translation of the Syriac Harclensis into Greek. Another known group contains states of text that are thought to be important for the formation of the Byzantine text: 94, 307, 180, 424, 453, 468, 720, 918, 1678, 1840, 2186, 2197, and 2818 (G. Mink and K. Wachtel, unpublished analyses). All but one (424) of these (labelled in blue on Figure 6) form part of a single group in the MP stemma. However, there were many other internal nodes of degree greater than three, so we cannot be certain how these groups should be fitted together to form a complete stemma. Our inability to reconstruct these deeper relationships reflects the fact that only a small fraction of manuscripts contain text forms older than that of the late medieval majority.
[36] The ensemble RC was 0.19. Thus, most of the variation in character states among manuscripts was due to homoplasy. As a rule, agreement in readings does not necessarily reflect common ancestry along a dominant pathway of textual flow in this stemma. This may be because of extensive contamination.
[37] The initial consensus tree (Figure 6) had an unresolved branching structure at the base of the Harclensis group (1448, the Syriac Harclensis back-translation and the common ancestor of the rest of the group all emerge from the same node), which means we cannot reconstruct the readings at this point. However, 99.89% of the stemmata we found grouped 1852 with the Syriac Harclensis back-translation (the dominant topology: Figure 7). The other 0.11% grouped 1611 with the Syriac Harclensis back-translation. In order to reconstruct the ancestral state for the Harclensis group, we therefore carried out a second maximum parsimony analysis with the topology of this group constrained to be the dominant form shown in Figure 7. All other parts of the Harclensis group stemma were resolved in the original consensus tree, which implies that there was a single dominant pathway of textual flow for this group (although there may have been other, less important pathways as well). All the manuscripts in the Harclensis group date from no earlier than the 11th century (Aland et al. 1997b, B5). We therefore assume that the archetype of the whole tradition is not part of the Harclensis group. If this is true, the core members of the Harclensis group have the internal node labelled 'c' on Figure 7 as their most recent common ancestor.
[38] There are 34 changes of reading separating the most recent common ancestor of the core Harclensis group (node c on Figure 7) from the rest of the tradition. 23 changes (Table 7) occur between nodes a and b on Figure 7, where node a is the closest internal node to the Harclensis group from which no manuscript in this group is immediately descended, and node b is the ancestor of the core Harclensis group manuscripts and 1852. Another 11 changes (Table 8) occur between nodes b and c.
[39] There are some superficial similarities between the Genealogical Method (GM) and maximum parsimony (MP). Both work with single passages of variation rather than a summary matrix of the distances between each pair of manuscripts, and use the principle of Ockham's razor in choosing between alternative stemmata by minimizing the number of changes of reading.
[40] However, the two methods are fundamentally different in their aims and in the ways they achieve these aims. A GM stemma is a summary of philological assessments of individual readings, offering a visualization of a hypothesis about textual relationships and a check on the consistency of philological work. GM attempts to produce optimal substemmata without introducing hypothetical manuscripts. The substemmata are then combined into an overall stemma while minimizing the number of edges. Extant texts may be represented as the ancestors of other extant texts. Decisions about original readings and the direction of textual flow are required in the production of the stemma. The direction of textual flow is determined on the basis of all readings, including singular readings (those that are found in only one extant witness). As Figure 1 makes clear, the stemma does not attempt to represent the actual copying relationships among manuscripts. Instead, it represents the relationships among the readings found in the texts in a way that does not necessarily reflect the physical transmission of those readings. In cases where a text is the product of contamination, it may be represented as having several ancestors.
[41] MP attempts to minimize the number of changes of reading over the whole of the single branching tree which best describes the dominant pathway of copying. Because MP introduces hypothetical ancestral manuscripts, a maximum parsimony tree will usually require fewer (and will never require more) independent changes of reading than a stemma produced by GM (Hwang et al. 1992, 37-38, 216). Extant manuscripts are represented only as terminal nodes, although cases where one extant manuscript is an ancestor of another may be represented by edges of length zero. For example, the MP stemma for the Harclensis group in Figure 7 has an edge of length zero separating 1505 from the common ancestor of 1505 and 2495, suggesting that 2495 is in reality a descendant of 1505. The readings present in hypothetical manuscripts can be explicitly reconstructed, providing hypotheses about the states of ancestral texts. The changes occurring along edges that separate a group of manuscripts from the rest of the stemma (e.g. Tables 7 and 8) might sometimes identify readings characteristic of the group. However, some of these readings might change again elsewhere in the group, especially if they would have appeared unusual to medieval scribes (Wisse 1982, 35).
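The counting step at the heart of MP can be sketched with Fitch's algorithm, which finds the minimum number of changes of reading needed at one variant passage on a fixed tree, together with the set of candidate readings at the root of that tree. This is a minimal Python sketch under stated assumptions: readings are unordered and equally weighted, the tree is rooted and binary, and the witness labels W1 to W4 are hypothetical. It scores a single given tree only; a program such as PAUP must in addition search over many candidate trees.

```python
def fitch(tree, readings):
    """Return (candidate_readings, changes) for one variant passage.

    tree: a witness label (str) or a pair (left, right) of subtrees.
    readings: dict mapping each witness label to its reading.
    """
    if isinstance(tree, str):
        return {readings[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, readings)
    right_set, right_cost = fitch(right, readings)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree here, so no extra change is needed.
        return shared, left_cost + right_cost
    # No reading in common: one change must occur below this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, passages):
    """Total changes of reading over all passages on the given tree."""
    return sum(fitch(tree, p)[1] for p in passages)


# Hypothetical example: four witnesses, two variant passages.
tree = (("W1", "W2"), ("W3", "W4"))
passages = [
    {"W1": "a", "W2": "a", "W3": "b", "W4": "b"},  # fits the tree: 1 change
    {"W1": "a", "W2": "b", "W3": "a", "W4": "b"},  # conflicts: 2 changes
]
print(tree_length(tree, passages))  # 3
```

The second passage needs two changes on this tree but would need only one on a tree that grouped W1 with W3. Conflicts of this kind are the homoplasy that the RC measures.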
[42] MP makes no attempt to identify original readings and does not require genealogical information based on philological work, although it can use such information if available. We chose not to use this information in our comparative study, because the only available genealogical information was derived from GM and would therefore prevent us from making an independent comparison of the two methods. Singular readings can only tell us that a given manuscript is different from all others. Thus, singular readings do not affect the topology of an MP stemma, although they do affect the lengths of edges. Unlike GM, maximum parsimony assumes that each manuscript has only one immediate ancestor. Any shared readings that cannot be attributed to common ancestry on this tree are assumed to be the result of convergent change or contamination. In a heavily contaminated tradition, there may be many alternative stemmata that are equally parsimonious. Elsewhere (M. Spencer et al., in preparation), we describe the application of reduced median networks (a parsimony-based method that explicitly deals with contamination) to the Greek New Testament.
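The point about singular readings can be made concrete. Under unordered parsimony, a variant passage can favour one topology over another only if at least two of its readings are each attested by at least two witnesses; any other passage requires the same number of changes on every tree. A minimal Python sketch of this test follows. The function name and the example readings are hypothetical.

```python
from collections import Counter


def is_parsimony_informative(readings):
    """True if at least two readings each occur in two or more witnesses."""
    counts = Counter(readings)
    return sum(1 for n in counts.values() if n >= 2) >= 2


# Hypothetical passages, one reading per witness.
print(is_parsimony_informative(list("aaab")))  # False: 'b' is singular
print(is_parsimony_informative(list("aabb")))  # True: two shared readings
```

A passage that fails this test still adds to the length of the edges leading to the witnesses concerned, which is why singular readings affect edge lengths but not topology.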
[43] Given these differences, it is difficult to make meaningful detailed comparisons of stemmata from GM and MP. For example, our GM analysis of the Harclensis group placed H far from the base of the group, even though we know that it is earlier than most of the other manuscripts. This reflects the fact that the extant Greek witnesses in the Harclensis group may preserve a form of the text of James that differs less from the original form than the one used to produce H. However, our MP analysis placed H near the base of the group, suggesting that H and the extant Greek witnesses in the Harclensis group represent separate lines of descent from a common ancestor. This highlights the different purposes for which MP and GM were developed.
[44] However, there are some agreements between the stemmata for the Harclensis group produced by MP (Figure 7) and GM (Figure 2). In both cases, 1852 and 1448 are located at the base of the group. The pairs {1505, 2495}, {1890, 2138} and {614, 2412} appear in both stemmata, as part of the same subgroup of manuscripts. The pairs {206, 1799} and {630, 2200} also appear in both stemmata, as part of another subgroup which also contains 429 and 522. There are some differences (other than the location of H): 1292 is in a different subgroup in the two stemmata, and unlike the GM stemma, the MP stemma does not have a large number of manuscripts descended from 429 (or a close relative of it).
[45] In summary, despite some superficial similarities, the Genealogical Method and the cladistic method of maximum parsimony are fundamentally different approaches to the construction of a stemma. The stemmata they produce are intended to represent different aspects of the text tradition, and cannot directly be compared. The Genealogical Method will be most useful in situations where a large amount of philological work can be done on each reading, including assessments of priority. In contrast, maximum parsimony is suitable for cases in which it is difficult or impossible to determine the priority of readings in advance.
[46] This study is the result of a cooperation between the STEMMA project (Studies on Textual Evolution of Manuscripts by Mathematical Analysis) and INTF (Institut für neutestamentliche Textforschung), promoted by a travel grant from ARC and DAAD. The STEMMA project is supported by the Leverhulme Trust. We are very grateful to Gerd Mink for productive discussions and explanations, and also to Adrian Barbrook, Barbara Bordalejo, Andreas Dress, Peter Forster, Pete Lockhart, Linne Mooney, Bruce Morrill, David Penny and Peter Robinson. David Parker commented on an earlier draft of this manuscript. Hamish Symington helped with the figures.
The data used in the analyses are available for download.
This appendix was added shortly after publication of the original article. -Ed.
1 Table 2 shows that this is a relatively low value. Note that Codex Vaticanus agrees with the majority text at 87.9% of the variant passages; even A, the edited text, agrees at 91.2% of the passages.
2 Thus it would be consistent to never say "manuscript" when "state of text" would be more precise. Yet it would be awkward, as both text and manuscript are marked by the manuscript number. It may suffice to define "manuscript" to mean "manuscript text" in the context of the Genealogical Method.
3 For definitions and explanations of both modes of coherence see Mink (2000, 2003a).
4 The qualification "potential" is used because we will not, as a rule, happen on the true vorlage of a manuscript. Yet a state of text regarded as a potential ancestor contains a layer of variants that are necessary for a partial reconstruction of the vorlage.
5 In this article we restricted ourselves to assigning to each member of the Harclensis group three possible ancestors that are most similar to it, regardless of whether they are all necessary for explaining the text of the descendant as being equal to or derived from ancestral states. Thus an important stemmatological feature of Mink's method will not be dealt with here: a final stemma will contain only those edges that are necessary to explain the state of the text, reading by reading, in every manuscript represented by a node. Among other principal points of the Genealogical Method, this will be discussed in Mink (2003b).
Adams III, E. N. 1972. Consensus techniques and the comparison of taxonomic trees. Systematic Zoology 21:390-7.
Adams III, E. N. 1986. N-trees as nestings: Complexity, similarity, and consensus. Journal of Classification 3:299-317.
Aland, B., K. Aland, G. Mink and K. Wachtel. 1997a. Novum Testamentum Graecum Editio Critica Maior. IV. Catholic Letters. Part 1. Text. Installment 1. James. Stuttgart: Deutsche Bibelgesellschaft.
Aland, B., K. Aland, G. Mink and K. Wachtel. 1997b. Novum Testamentum Graecum Editio Critica Maior. IV. Catholic Letters. Part 2. Supplementary Material. Installment 1. James. Stuttgart: Deutsche Bibelgesellschaft.
Aland, B., K. Aland, G. Mink and K. Wachtel. 2000. Novum Testamentum Graecum Editio Critica Maior. IV. Catholic Letters. Installment 2. The Letters of Peter. Stuttgart: Deutsche Bibelgesellschaft.
Aland, B., and A. Juckel. 1986. Das Neue Testament in syrischer Überlieferung. Bd. I: Die großen Katholischen Briefe. Berlin / New York: de Gruyter.
Aland, K., A. Benduhn-Mertz, G. Mink and H. Bachmann. 1987. Text und Textwert der griechischen Handschriften des Neuen Testaments. Bd. I: Die Katholischen Briefe. Berlin / New York: de Gruyter.
Brock, S. 1981. The resolution of the Philoxenian / Harclean problem. In New Testament textual criticism: Essays in honour of Bruce M. Metzger, ed. E. J. Epp and G. D. Fee, 325-43. Oxford: Clarendon.
Cameron, H. D. 1987. The upside-down cladogram: Problems in manuscript affiliation. In Biological metaphor and cladistic classification: An interdisciplinary perspective, ed. H. M. Hoenigswald and L. F. Wiener, 227-42. London: Frances Pinter.
Farris, J. S. 1989. The retention index and the rescaled consistency index. Cladistics 5:417-9.
Flight, C. 1990. How many stemmata? Manuscripta 34:122-8.
Howe, C. J., A. C. Barbrook, M. Spencer, P. Robinson, B. Bordalejo and L. R. Mooney. 2001. Manuscript evolution. Trends in Genetics 17:147-52.
Hwang, F. K., D. S. Richards and P. Winter. 1992. The Steiner tree problem. Amsterdam: North-Holland.
Jakobsen, I. B., and S. Easteal. 1996. A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. Computer Applications in the Biosciences 12:291-5.
Juckel, A. 1999. Die Bedeutung des Ms Vat. syr. 268 für die Evangelien-Überlieferung der Harklensis. Oriens Christianus 83:22-45.
Lee, A. R. 1989. Numerical taxonomy revisited: John Griffith, cladistic analysis and St. Augustine's Quaestiones in Heptateuchem. Studia Patristica 20:24-32.
Lee, A. R. 1990. BLUDGEON: A blunt instrument for the analysis of contamination in textual traditions. In Computers in Literary and Linguistic Research, ed. Y. Choueka, 261-92. Paris: Champion-Slatkine.
Maddison, D. R., D. L. Swofford and W. P. Maddison. 1997. NEXUS: An extensible file format for systematic information. Systematic Biology 46:590-621.
Metzger, B. M. 1992. The text of the New Testament: Its transmission, corruption, and restoration. New York: Oxford University Press.
Mink, G. 1993. Eine umfassende Genealogie der neutestamentlichen Überlieferung. New Testament Studies 39:481-99.
Mink, G. 2000. Editing and genealogical studies: The New Testament. Literary and Linguistic Computing 15:51-6.
Mink, G. 2003a. Was verändert sich in der Textkritik durch die Beachtung genealogischer Kohärenz? In New Developments in Textual Criticism: New Testament, Early-Christian and Jewish Literature, ed. W. Weren and D.-A. Koch (forthcoming). Assen: Royal Van Gorcum.
Mink, G. 2003b. Problems of a highly contaminated tradition: The New Testament. In Studies in Stemmatology II, ed. P. van Reenen et al. (forthcoming).
O'Hara, R., and P. Robinson. 1993. Computer-assisted methods of stemmatic analysis. In The Canterbury Tales Project Occasional Papers Volume I, ed. N. Blake and P. Robinson, 53-74. London: Office for Humanities Communication Publications.
Page, R. D. M., and E. C. Holmes. 1998. Molecular evolution: A phylogenetic approach. Oxford: Blackwell Science.
Platnick, N. I., and H. D. Cameron. 1977. Cladistic methods in textual, linguistic, and phylogenetic analysis. Systematic Zoology 26:380-5.
Robinson, P. 1997. A stemmatic analysis of the fifteenth-century witnesses to The Wife of Bath's Prologue. In The Canterbury Tales Project Occasional Papers Volume II, ed. N. Blake and P. Robinson, 69-132. London: Office for Humanities Communication Publications.
Robinson, P. M. W., and R. J. O'Hara. 1996. Cladistic analysis of an Old Norse manuscript tradition. In Research in Humanities Computing 4, ed. S. Hockey and N. Ide, 115-37. Oxford: Oxford University Press.
Salemans, B. J. P. 1996. Cladistics or the resurrection of the method of Lachmann: On building the stemma of Yvain. In Studies in Stemmatology, ed. P. van Reenen and M. van Mulken, 3-70. Amsterdam: John Benjamins Publishing Company.
Salemans, B. J. P. 2000. Building stemmas with the computer in a cladistic, neo-Lachmannian, way. Ph.D. diss., Katholieke Universiteit Nijmegen.
Sneath, P. H. A., M. J. Sackin and R. P. Ambler. 1975. Detecting evolutionary incompatibilities from protein sequences. Systematic Zoology 24:311-32.
Swofford, D. L. 1991. When are phylogeny estimates from molecular and morphological data incongruent? In Phylogenetic Analysis of DNA Sequences, ed. M. M. Miyamoto and J. Cracraft, 295-333. Oxford: Oxford University Press.
Swofford, D. L. 2001. PAUP*. Phylogenetic Analysis Using Parsimony (*and other methods). Version 4.0b10. Sunderland, MA: Sinauer Associates.
Swofford, D. L., and W. P. Maddison. 1987. Reconstructing ancestral character states under Wagner parsimony. Mathematical Biosciences 87:199-229.
Thomas, J. D. 1980. The gospel colophon of the Harclean Syriac version. Theological Review, The Near East School of Theology (Beirut) 3:16-26.
Wachtel, K. 1995. Der Byzantinische Text der Katholischen Briefe: Eine Untersuchung zur Entstehung der Koine des Neuen Testaments. Berlin / New York: de Gruyter.
Wisse, F. 1982. The profile method for the classification and evaluation of manuscript evidence as applied to the continuous Greek text of the Gospel of Luke. Grand Rapids, MI: William B. Eerdmans Publishing Company.
Zuntz, G. 1945. The ancestry of the Harklean New Testament. London: Oxford University Press.
Zuntz, G. 1951. Die Subscriptionen der Syra Harclensis. Zeitschrift der Morgenländischen Gesellschaft 101:174-96.