Wed May 1 08:10:53 1996

From majordom  Wed May  1 08:10:53 1996
Return-Path: 
Received: by scholar.cc.emory.edu (5.0/SMI-SVR4)
	id AA15675; Wed, 1 May 1996 08:10:53 +0500
Date: Wed, 1 May 96 07:07 CDT
X-Sender: waltzmn@popmail.skypoint.com
Message-Id: 
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: tc-list@scholar.cc.emory.edu
From: waltzmn@skypoint.com (Robert B. Waltz)
Subject: Mathematical methods (Was: Re: Sampling and Vulgate)
Content-Length: 2744
Sender: owner-tc-list@scholar.cc.emory.edu
Precedence: bulk
Reply-To: tc-list@scholar.cc.emory.edu

On Tue, 30 Apr 96, broman@Np.nosc.mil (Vincent Broman) wrote:

>
>If your sampling method is biased, increasing the sample size won't help,
>the larger sample will still be biased, it just has a smaller variance.

I must disagree with both halves of this statement. First, increasing
the sample size DOES help, at least somewhat. To argue by analogy:
if you take a public opinion poll and ask 90% of the people, you
don't have to use an unbiased sample; you will still be close. (Not
that I am claiming to sample anything like 90% of the variants.)
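
A rough sketch of the poll analogy, with made-up numbers (the function
name and the population figures are illustrative assumptions, not data
from any real poll). It computes the lowest and highest "yes" share that
even the most lopsided choice of respondents could produce:

```python
def worst_case_sample_shares(population, yes_count, sample_size):
    """Return the (lowest, highest) 'yes' share any sample of this size can show."""
    no_count = population - yes_count
    # Most biased against "yes": take every "no" first, then fill up with "yes".
    fewest_yes = max(0, sample_size - no_count)
    # Most biased toward "yes": take every "yes" first.
    most_yes = min(sample_size, yes_count)
    return fewest_yes / sample_size, most_yes / sample_size


# Hypothetical population: 1000 people, 500 of whom would answer "yes".
low_90, high_90 = worst_case_sample_shares(1000, 500, 900)
low_10, high_10 = worst_case_sample_shares(1000, 500, 100)
print(f"asking 90%: between {low_90:.3f} and {high_90:.3f}")
print(f"asking 10%: between {low_10:.3f} and {high_10:.3f}")
```

With 90% of the population asked, the answer cannot stray far from the
true 50%, however the respondents are picked; with 10% asked, a
sufficiently biased choice can produce anything from 0% to 100%.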

Also, we should not mistake the meaning of variance. This is REALLY
important, folks. I am a mathematician; I KNOW.

Everything you ever hear about variance, standard deviation, and
other measures of "precision" is based on the so-called "Normal
Distribution" or "Bell Curve."

But not everything follows normal distributions. I could cite
quite a few examples if I went to my math library (stellar
radiation patterns spring quickly to mind).

There is no reason to believe that textual relationships follow
a normal distribution. It may be so, but the fact needs to be
proved. (And until it is, you cannot draw ANY conclusions from
variance.) There is, in fact, reason to believe that the distribution
is NOT normal, but instead has a bunch of spikes. We call the spikes
"Text-types."

In any case, let's remember what variance and standard deviation
are meant to measure. They measure the SPREAD of a curve around the
mean. Thus, once the shape of a curve is established, increasing
the sample size does NOT significantly reduce the variance.
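
A small sketch of the two quantities that can both get called
"variance" in this kind of discussion. The data are an assumption
(uniform random numbers standing in for any fixed distribution), and
the function name is only illustrative. The spread of the data settles
near a fixed value as the sample grows, while the standard error of the
sample mean keeps shrinking:

```python
import math
import random
import statistics


def spread_and_standard_error(sample_size, seed=0):
    """Return (variance of the data, standard error of the sample mean)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample = [rng.random() for _ in range(sample_size)]
    variance = statistics.pvariance(sample)  # spread of the data around its mean
    standard_error = math.sqrt(variance / sample_size)  # uncertainty of the mean
    return variance, standard_error


small_var, small_se = spread_and_standard_error(400)
large_var, large_se = spread_and_standard_error(40000)
print(f"n=400:   variance {small_var:.4f}, standard error {small_se:.4f}")
print(f"n=40000: variance {large_var:.4f}, standard error {large_se:.4f}")
```

For uniform numbers on [0, 1) the variance should sit near 1/12 at both
sample sizes; only the standard error changes much.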

In any case, we are not measuring a distribution here. In any
given reading, two manuscripts can do only two things. They can
agree, or they can disagree. They cannot have "63% agreement," or
some such. There is NO curve for the variance to measure. Just
two spikes, with "truth values" 0 and 1.
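
To make the 0-and-1 point concrete, here is a minimal sketch using
invented data (the manuscripts "ms_x" and "ms_y", the unit names, and
the readings are all hypothetical). Each shared reading yields exactly
one truth value; a percentage only appears when those values are
counted up over many readings:

```python
def agreement_indicators(readings_a, readings_b):
    """Return 1 (agree) or 0 (disagree) for each unit both manuscripts attest."""
    shared_units = sorted(set(readings_a) & set(readings_b))
    return [1 if readings_a[unit] == readings_b[unit] else 0
            for unit in shared_units]


def agreement_rate(indicators):
    """Fraction of compared readings in which the two manuscripts agree."""
    return sum(indicators) / len(indicators)


# Hypothetical readings at four variation units.
ms_x = {"unit1": "a", "unit2": "b", "unit3": "a", "unit4": "c"}
ms_y = {"unit1": "a", "unit2": "a", "unit3": "a", "unit4": "c"}

indicators = agreement_indicators(ms_x, ms_y)
rate = agreement_rate(indicators)
print(indicators, rate)
```

No single entry in the list is anything other than 0 or 1; the overall
rate is a property of the whole set of readings compared, not of any one
of them.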

Please, folks, let's get our mathematics right.

>The RELATIVE statistics are =exactly= what gets botched up
>by bias in your sampling.  What you need is a sampling method
>that you can convince your readers is totally independent of
>what you're trying to measure.

"Totally" independent? No such thing.

I'm not saying my sample is perfect. I would prefer one that
is at once larger and less biased. But I have yet to see something
better offered. Certainly the statistical basis used by the Alands
is worse; the sample is smaller AND conforms to their peculiar
biases.

>And you would have a hard time convincing me that the selection
>of variants in the UBS3 apparatus is =independent= of the attestation
>by texttypes.

That last statement, at least, I agree with. That's the exact reason
why I increased the sample.

Bob Waltz
waltzmn@skypoint.com


