Wed May 1 12:56:38 1996

From majordom  Wed May  1 12:56:38 1996
Return-Path: 
Received: by scholar.cc.emory.edu (5.0/SMI-SVR4)
	id AA16824; Wed, 1 May 1996 12:56:38 +0500
Date: Wed, 1 May 96 09:53:00 PDT
From: broman@Np.nosc.mil (Vincent Broman)
Message-Id: <9605011653.AA29311@Np.nosc.mil>
To: tc-list@scholar.cc.emory.edu
In-Reply-To:  (waltzmn@skypoint.com)
Subject: Re: Mathematical methods (Was: Re: Sampling and Vulgate)
Content-Length: 4424
Sender: owner-tc-list@scholar.cc.emory.edu
Precedence: bulk
Reply-To: tc-list@scholar.cc.emory.edu

-----BEGIN PGP SIGNED MESSAGE-----

waltzmn@skypoint.com replied:
>                                            ...First, increasing
> the sample size DOES help, at least somewhat (to argue by analogy --
> if you take a public opinion poll, and ask 90% of the people, you
> don't have to use an unbiased sample; you will still be close. Not
> that I am claiming to sample anything like 90% of the variants).

Elementary counterexample:
Try to estimate the number of homeless by taking a telephone poll,
contacting 95% of the phone subscribers in your town.  Your sample
is huge, but it excludes nearly all the homeless, making the bias
catastrophic.
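
A small arithmetic sketch of that counterexample follows. The town size,
the number of homeless, and the handful of homeless people reachable by
phone are invented figures for illustration only, and the helper name
phone_poll_estimate is hypothetical. It assumes every housed resident is
a phone subscriber and that the poll reaches subscribers uniformly.

```python
def phone_poll_estimate(population, homeless, homeless_with_phone, coverage):
    """Expected homeless fraction seen by a poll of `coverage` of subscribers."""
    housed = population - homeless
    # Assumption: every housed resident subscribes; few homeless people do.
    subscribers = housed + homeless_with_phone
    polled = coverage * subscribers
    # A uniform poll reaches homeless subscribers in proportion to coverage.
    polled_homeless = coverage * homeless_with_phone
    return polled_homeless / polled


true_rate = 500 / 100_000                      # invented "true" rate
half = phone_poll_estimate(100_000, 500, 5, 0.50)
huge = phone_poll_estimate(100_000, 500, 5, 0.95)
print(true_rate, half, huge)
```

The coverage factor cancels out of the ratio, so polling 95% of the
subscribers gives the same badly biased answer as polling 50% of them.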

> Also, we should not mistake the meaning of variance. This is REALLY
> important, folks. I am a mathematician; I KNOW.

Perhaps.

> Everything you ever hear about variance, standard deviation, and
> other measures of "precision" is based on the so-called "Normal
> Distribution" or "Bell Curve."

A variance is defined for any random variable X with known distribution
and finite mean (not just Normal distributions) as

$  Var(X) = E( |X - E(X)|^2 )  $

if this quantity is finite.  I think there is some confusion here
between the $ \sigma^2 $ parameter of Normal distributions
and the Variance defined for general distributions.
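
To make the definition concrete, here is a minimal sketch that applies
the formula above to two distributions that are plainly not Normal: a
fair die, and a 0/1 agreement indicator. The function name and the value
p = 0.58 are chosen only for illustration.

```python
from fractions import Fraction


def variance(outcomes, probs):
    """Var(X) = E(|X - E(X)|^2) for a finite discrete distribution."""
    mean = sum(x * p for x, p in zip(outcomes, probs))
    return sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))


# A fair six-sided die: uniform, not bell-shaped.
die = variance(range(1, 7), [Fraction(1, 6)] * 6)

# A 0/1 variable ("two spikes") that equals 1 with probability p.
p = Fraction(58, 100)
bern = variance([0, 1], [1 - p, p])

print(die, bern)
```

For the 0/1 case the result reduces to p(1-p), which is finite and
perfectly well defined even though the distribution is just two spikes.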

> In any case, let's remember what variance and standard deviation
> are meant to measure. They measure the SPREAD of a curve around the
> mean. Thus, once the shape of a curve is established, increasing
> the sample size does NOT significantly reduce the variance.

The whole point of the Weak Law of Large Numbers and of the Central
Limit Theorem (for general distributions) is to show that, under
certain conditions, the variance of a sample average shrinks toward zero
at a known rate (in proportion to 1/n for n samples) as you take more
and more samples.  Now these conditions require repeated independent
samples of a random variable, and the New Testament textual tradition
is not a random variable, but we treat it as one so long as we observe
only subsets of the data out there and not all of it.

If larger samples don't decrease the uncertainty or variance,
why would we ever take more than one sample?
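
The effect of sample size can be checked by simulation. The sketch below
assumes independent 0/1 trials with an invented agreement probability of
0.58; the function name, seed, and sample sizes are arbitrary choices
for illustration. It compares the observed variance of the sample
agreement rate with the theoretical value p(1-p)/n.

```python
import random
from statistics import pvariance


def variance_of_sample_mean(p, n, trials, rng):
    """Empirical variance of the agreement rate over repeated samples of size n."""
    means = []
    for _ in range(trials):
        # One sample: n independent 0/1 trials, each 1 with probability p.
        agreements = sum(rng.random() < p for _ in range(n))
        means.append(agreements / n)
    return pvariance(means)


rng = random.Random(1996)   # fixed seed so the run is repeatable
p = 0.58
results = {n: variance_of_sample_mean(p, n, 1000, rng) for n in (10, 100, 400)}
for n, v in results.items():
    print(n, round(v, 5), "theory:", round(p * (1 - p) / n, 5))
```

The variance of a single trial stays at p(1-p) whatever the sample size;
what shrinks is the variance of the estimate computed from the sample.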

> In any case, we are not measuring a distribution here. In any
> given reading, two manuscripts can do only two things. They can
> agree, or they can disagree. They cannot have "63% agreement," or
> some such. There is NO curve for the variance to measure. Just
> two spikes, with "truth values" 0 and 1.
> 
> Please, folks, let's get our mathematics right.

Yes, please.

Whenever historical data is discussed in terms of probabilities,
we are not saying the past events themselves are still random;
we are describing our uncertainty about the future event
of our encountering the rest of the data.  For example, we examine
the MS in the even-numbered chapters and then predict the
trends we expect to see in the odd-numbered chapters, not yet examined.

When we talk about two MSS agreeing in, say, 58% of the units of variation
listed in Tischendorf, then we mean one of two things, generally.
   1. We have collated all of both MSS and counted the 58% agreement.
or 2. We have compared samples from the MSS and _estimate_ that 58%
      of the sampled and unsampled units of variation would be agreements.
Case 2 involves a model (perhaps unconscious) of the text being a
sequence of Bernoulli trials, each having a probability, p, of showing
an agreement and a probability 1-p of disagreement.  Under certain
conditions, the "best" estimator of this parameter, p, is the ratio
of the number of sample agreements to the number of samples.
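
As a minimal sketch of case 2, the estimator is just the observed ratio.
The standard error shown with it is the usual binomial formula,
sqrt(p(1-p)/n) with the estimate substituted for p, and it assumes the
sampled units behave as independent trials. The function name and the
counts are invented for illustration.

```python
import math


def estimate_agreement(agreements, sample_size):
    """Return (estimate of p, approximate standard error of that estimate)."""
    p_hat = agreements / sample_size
    # Binomial standard error, assuming independent Bernoulli trials.
    std_err = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, std_err


small = estimate_agreement(58, 100)    # 58 agreements in 100 sampled units
large = estimate_agreement(232, 400)   # same ratio, four times the sample
print(small, large)
```

Both samples give the same estimate of p, but quadrupling the sample
size halves the standard error of that estimate.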

> "Totally" independent? No such thing.

Pardon the absolutistic language.
"Independent enough for the present purpose" was meant.
The one who judges the "enough" being the reader, of course.


Vincent Broman,  code 783 Bayside                        Email: broman@nosc.mil
Naval Command Control and Ocean Surveillance Center
Research Development Test and Evaluation Division
San Diego, CA  92152-6147,  USA                          Phone: +1 619 553 1641
=== PGP protected mail preferred.  For public key finger broman@np.nosc.mil ===

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBMYeWa2CU4mTNq7IdAQE7TAP/cj7MTsFE3RLOgXjnqCvWJoQVxHBt6JG4
cqBXBPhv89lZZihEuWc4LaJPrDnnUkYqTP/+Z2L3giCq412ff15UKo33ZcNC5hzn
IG/cQ1lDxRf90NpgfVkpRweWJOJ5kUppHosfKjRhN31nQ3QQ1ObpyrLZANkzPx4R
fX1AOqskH1c=
=lV7c
-----END PGP SIGNATURE-----
