In those days there was no king in Israel; all the people did what was right in their own eyes.
No scientist trusts a number without knowing what kind of number it is (whether a unit of distance, time, energy, etc.) and its uncertainty (i.e. the error bounds). Experimental science took a leap forwards once it had worked out how to deal with uncertainties and how they combine together when there is a chain of measurements, each with its own uncertainty.
Humanities computing has yet to come up with a consistent and robust system for dealing with uncertainties, to its detriment. As a result, novices believe that everything in an edited text is beyond reasonable doubt, and editors wrestle with how to inform readers that there is doubt associated with some aspects of their editions. We can expect to make a great advance once we work out how to deal with the myriad uncertainties that we encounter in preparing texts for electronic publication.
The biblical text provides an example. In some places it is subject to uncertainty, being ultimately based on manuscript copies which are subject to textual and orthographic variations. (The same is true of every ancient text that was popular enough to be copied on a large scale.) There is uncertainty in what the author wrote (as when variations occur between manuscripts), what words occur in a particular copy (as when the ravages of time make it hard to read a manuscript), how the words are spelled, how punctuated, how sections are divided, which scribe wrote what, which corrector changed it, and so on.
Without a consistent approach to dealing with uncertainty, it is difficult to know whether an edited text is a true representation of the evidence. Without a consistent approach to combining uncertainties, it will be difficult to deal with anything that is based on a combination of uncertain items. Any solution to the problem of uncertainty needs to be robust, but also practical so that encoders will have a reasonable chance of getting the markup right.
Experimental science has benefited a great deal from having a robust system for expressing uncertainty. An experimental scientist asks three questions about any quantity:
What is the magnitude?
What is the unit?
What is the uncertainty?
The first question is undoubtedly important. Nevertheless, its answer is meaningless unless the other two are answered as well. Without an answer to the second question, you don't know whether the quantity is of apples, oranges, parsecs, or microseconds. Without an answer to the third, you don't know whether the thing is precisely known or whether the uncertainty is so great that the reported magnitude is basically a stab in the dark. There is a vast difference between, say, “the LD50 is 10.3 micrograms (+/- 1%)” and “the LD50 is 10 micrograms (+/- 50%).”
Typically, uncertainty is expressed statistically. A set of measurements is examined to determine the distribution of quantities. A confidence level is set and a confidence interval (i.e. range of values) is computed.
Example 1. Confidence interval
Ten measurements of the mains voltage produce the following results: 245, 252, 261, 238, 239, 242, 248, 235, 255, 241. The mean value of these measurements is 245.6 and the standard deviation estimated from the sample is 8.30. The standard error of the mean is the standard deviation divided by the square root of the sample size: 8.30 / √10 = 2.62. Given a normal distribution and a sample size of ten, one can be 95% confident that the true mean value is within 2.26 standard errors of the estimate. (That is, within 2.26 x 2.62 = 5.9 volts. Student's t distribution is used to obtain the multiplying value of 2.26. For larger sample sizes, it approaches the corresponding value for the normal distribution, which is 1.96.) Therefore, the 95% confidence interval is the range 239.7 to 251.5 volts; this result can also be expressed as 245.6 +/- 5.9 volts. That is, if many trials involving groups of ten measurements were performed, we would expect the interval computed in this way to contain the true mean voltage in 95% of them. We would also expect it to miss the true mean in 5% of the trials.
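The figures in Example 1 can be checked with a short sketch. It assumes only Python's standard library; the multiplier 2.262 is copied from a table of Student's t distribution (two-sided 95%, nine degrees of freedom) rather than computed, and the function and key names are illustrative. The sketch reports the mean and sample standard deviation, then two half-widths: the multiplier times the standard deviation, which roughly describes the spread of individual readings, and the multiplier times the standard error of the mean, which is the usual half-width of a confidence interval for the mean.

```python
import math
import statistics

# The ten mains voltage readings from Example 1.
readings = [245, 252, 261, 238, 239, 242, 248, 235, 255, 241]

# Assumption: two-sided 95% value of Student's t for 9 degrees of freedom,
# taken from a printed table rather than computed here.
T_95_DF9 = 2.262


def summarize(values, t_multiplier):
    """Return the mean, sample standard deviation and two half-widths."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample standard deviation (n - 1)
    sem = sd / math.sqrt(n)            # standard error of the mean
    return {
        "mean": mean,
        "standard deviation": sd,
        "t x standard deviation": t_multiplier * sd,
        "standard error of mean": sem,
        "t x standard error": t_multiplier * sem,
    }


summary = summarize(readings, T_95_DF9)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The two half-widths differ by a factor of √10, which is why it matters to say which one is being quoted.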
In common use, the words “accuracy” and “precision” are synonymous. Nevertheless, they can be given special meanings by definition: accuracy, in the context of data analysis, relates to how well a measurement corresponds to the actual value; precision, in the same context, relates to how repeatable the measurement is. An accurate measurement produces a result that is close to the actual value of what is being measured. A precise measurement has a relatively small confidence interval. Besides being precise and accurate or imprecise and inaccurate, a measurement might be accurate but imprecise (e.g. 240 +/- 25% volts, where the actual voltage is 240 volts) or precise but inaccurate (e.g. 265 +/- 1% volts, where the actual voltage is 240 volts). Accuracy, in this context, requires an absolute standard of measurement in order to be assessed. (Absolute standards do exist in the physical sciences.) Assessment of precision requires a series of measurements and some statistical analysis.
Some quantities are derived from measurements of other quantities, each with its own uncertainty. For example, a speed measurement is based on distance and time measurements. How do the uncertainties of constituent measurements affect the uncertainty of the result?
This question is well understood in experimental science. The details are not important; suffice it to say that an elegant theory of propagation of errors has been worked out. Consequently, a recipe can be followed to obtain the overall uncertainty of a quantity derived from a series of uncertain elements.
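For readers who would like to see the recipe at work, here is a minimal sketch for the speed example. It assumes that the distance and time errors are independent and small, in which case the relative uncertainties of a quotient combine in quadrature. The function name and the figures (100.0 +/- 0.5 metres covered in 9.8 +/- 0.1 seconds) are invented for illustration.

```python
import math


def speed_with_uncertainty(distance, distance_err, time, time_err):
    """Speed = distance / time, with its propagated uncertainty.

    Assumes independent errors, so that relative uncertainties
    add in quadrature for a quotient.
    """
    speed = distance / time
    relative_err = math.hypot(distance_err / distance, time_err / time)
    return speed, speed * relative_err


# Hypothetical measurements, chosen only to illustrate the calculation.
speed, speed_err = speed_with_uncertainty(100.0, 0.5, 9.8, 0.1)
print(f"{speed:.2f} +/- {speed_err:.2f} m/s")  # prints "10.20 +/- 0.12 m/s"
```

Here the time measurement dominates: its relative uncertainty (about 1%) is twice that of the distance (0.5%), and the result inherits an uncertainty of a little over 1%.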
All kinds of uncertainties are encountered when marking up a humanities text; a few are listed here:
Firstly, if the text is fragmentary, garbled or otherwise hard to decipher, there is uncertainty as to what the text actually is. Letters may be missing, which may or may not make uncertain the words that contain them.
How to mark up a particular aspect of the text may also be subject to doubt. Is “Paris” a place or a person?
Sometimes we are not sure which agent to blame for a particular feature. Say that a manuscript was produced by more than one scribe and that more than one corrector had subsequently worked upon it. (Codex Sinaiticus is an example.) Which scribe or corrector did what is not always easy to discern.
Descriptions of uncertainty in the humanities are often judgments. They are more like forensic decisions than physical measurements. Thus, language such as "beyond reasonable doubt," "more often than not," "unable to say" is more natural than pseudo-scientific language such as "95% confident."
For markup to be meaningful it must be reproducible. Independent encoders who are equally skilled in the art should be able to consistently arrive at the same description of the uncertainty of a given feature. This requirement immediately rules out a numerical description such as "cert=0.46." What does this description mean when the same encoder would choose a different number after lunch? Nothing! It is meaningless and a waste of space.
Are there viable alternatives? By all means. Here are some possibilities:
a set of categories
a set of (ranked) options
The simplest set is (certain, not certain). There are many things of which an encoder can be certain, beyond reasonable doubt. The letters on a clearly printed page, for example, admit of no doubt. “Certain” is the default certainty value of all encoding.
This binary approach has great advantages with respect to simplicity and repeatability. The encoder need only decide whether or not there is an element of doubt attached to a particular feature. If there is reasonable doubt in one person's mind, then it is likely that another will feel similar doubt.
A binary scheme is robust with respect to repeatability: fewer categories means less potential for misclassification. Nevertheless, a blanket category for everything that is not certain seems procrustean. There are circumstances where it is valid to express a degree of doubt, and potentially useful information is lost when an encoder collapses all levels of doubt into one. There is a difference between, say, two possibilities and twenty.
A set of categories can be designed to retain the advantages of simplicity and repeatability while allowing useful information on the degree of doubt to be recorded as well. When constructing such a scheme, it is important to clearly define the meaning of each degree of certainty so that encoders and users will know what is meant. Here are two possibilities:
Three degrees of certainty
High: Beyond reasonable doubt. A dead certainty. (More than 95% certain.) This is the default certainty value.
Medium: Subject to doubt. Reasonably good odds. (Between 5% and 95% certain.)
Low: Very doubtful. A long shot. (Less than 5% certain.)
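The three-degree scheme can be sketched as an ordered enumeration. The names HIGH, MEDIUM and LOW follow the labels “High,” “medium” and “Low” that the text uses when it refers back to these categories; the function name and the treatment of values at exactly 5% and 95% are assumptions. An encoder would normally choose a category by judgment, not compute it from a number, so the function serves only to make the stated bounds explicit.

```python
from enum import IntEnum


class Certainty(IntEnum):
    """Three degrees of certainty, ordered from least to most certain."""
    LOW = 1      # very doubtful: less than 5% certain
    MEDIUM = 2   # subject to doubt: between 5% and 95% certain
    HIGH = 3     # beyond reasonable doubt: more than 95% certain


# "High" is the default: anything not flagged is taken as certain.
DEFAULT_CERTAINTY = Certainty.HIGH


def category_for(probability):
    """Map a probability (0.0 to 1.0) onto the three categories.

    Assumption: values of exactly 0.05 and 0.95 fall in MEDIUM.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability > 0.95:
        return Certainty.HIGH
    if probability < 0.05:
        return Certainty.LOW
    return Certainty.MEDIUM


print(category_for(0.99).name, category_for(0.5).name, category_for(0.01).name)
```

Making the categories ordered means that two judgments can be compared directly, which is useful whenever the more doubtful of two readings has to be identified.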
Example 2. Leiden conventions
Papyrologists use the Leiden Conventions when preparing printed editions of an ancient text. According to these conventions, letters in the manuscript that are beyond reasonable doubt are printed without embellishment. Any letter that is doubtful, through damage or being otherwise difficult to read, is marked with a sublinear dot (“dotted text”). Text that is so poorly preserved that it cannot be read is rendered as bare sublinear dots, with the number of dots approximating the number of unreadable letters. The three-level scheme given above suits this convention well.
Four degrees of certainty
The three-degree scheme can be extended to four degrees by splitting the “medium” category.
As for “High,” above.
The best of two alternatives. Short odds. (Between 50% and 95% probable.)
The best of more than two but fewer than twenty alternatives. Long odds. (Between 5% and 50% probable.)
As with “Low,” above.
It is liberating to have a default certainty of less than 100%. Transcription is less onerous when the encoder knows that uncertainty need only be reported when there is a reasonable doubt. If, say, a letter is not perfectly clear then there is a chance, however slim, that it could be another. Nevertheless, reporting such unlikely possibilities is a waste of effort for the encoder and the user; in most circumstances, nothing is gained by stating that the encoding may be wrong. When all is said and done, anyone who has marked up a document knows that there can be an element of doubt associated with almost every aspect of encoding. In order to make progress in an encoding project, there must be a threshold of certainty. Anything more certain than the threshold level is included without further comment. Everything less certain than the threshold should be flagged in some manner.
Another approach is to list alternatives. This suits many situations, including when textual features must be assigned to individual scribes and correctors of a manuscript. If the manuscript has been corrected then there is what the relevant scribe wrote and what the relevant corrector changed it to. If there is uncertainty concerning which scribe wrote the primary text or which corrector changed it, then listing the possibilities both indicates that there is doubt and identifies the likely suspects.
List members can be ranked, thus indicating which is thought most likely, followed by the next most likely, and so on. Not all possible members need be listed every time, only those regarded as likely contenders given the circumstances. In fact, as much is said by leaving non-contenders out of the list as by including the front-runners.
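A ranked list of alternatives needs very little machinery. The following is a sketch only: the class name, its fields and the scribe and corrector labels are all hypothetical, and no particular markup vocabulary is implied. The position of a candidate in the list carries the ranking, and the presence of more than one candidate is what signals doubt.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attribution:
    """Who is thought responsible for a feature, most likely candidate first."""
    feature: str
    candidates: List[str] = field(default_factory=list)

    def is_doubtful(self) -> bool:
        # More than one listed candidate signals doubt about the agent.
        return len(self.candidates) > 1

    def most_likely(self) -> Optional[str]:
        # The list is ranked, so the first entry is the front-runner.
        return self.candidates[0] if self.candidates else None


# Hypothetical labels, not taken from any real manuscript description.
primary = Attribution("primary text", ["scribe-1"])
correction = Attribution("correction", ["corrector-2", "corrector-1"])

print(primary.most_likely(), primary.is_doubtful())
print(correction.most_likely(), correction.is_doubtful())
```

Candidates left out of the list are, by implication, not regarded as contenders, so the structure records exclusions without having to name them.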
According to the definitions given before, accuracy relates to how well a measurement meets its mark while precision describes the measurement's repeatability. By analogy, accuracy and precision in the humanities may be defined as follows:
Tentative definitions
Accuracy: The degree to which an assertion concerning some aspect of an item coincides with its actual state.
Precision: The degree to which individual assertions concerning the same aspect of an item coincide.
Note: I use “individual” rather than “independent” in the second definition as there is probably no such thing as “independent” when it comes to assertions made in the realm of humanities.
The first definition has a fundamental flaw: it cannot be verified. How can we say, in an absolute sense, whether someone's description of something is right? All we can do is make judgments which, by their nature, are not absolute. The physical sciences do not have this problem because they have physical standards that provide absolute references for units such as the metre, kilogram, second and ampere from which flow references for derivative units. Perhaps the closest approximation of an absolute standard that we have in the humanities is consensus among peers. Imagine, if you will, a room-full of experts, skilled in the relevant art, asked to describe some aspect of an item. In this context, the most accurate description could be defined to be the most popular one. (Mind you, they could all be wrong.)
The suggested definition of precision is less problematic. One can imagine an experiment in which an expert with short-term memory loss is asked to pass judgment on the same thing on a number of different occasions. Provided that the intervals between trials are long enough for the expert to totally forget about the previous encounters, the precision of a particular assertion is directly related to the number of times the expert makes the same assertion. Alternatively, using the “room-full of experts” image, precision could be characterised as the inverse of disarray.
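The “room-full of experts” image can be sketched in a few lines. The most popular assertion stands in for the most accurate description, and the share of experts who made it stands in for precision. Using the share of the most popular answer as the measure of agreement is my assumption here, one simple choice among several; the function name and the sample judgments are invented.

```python
from collections import Counter


def consensus_and_agreement(assertions):
    """Return the most popular assertion and the share of votes it received.

    Assumption: agreement is measured as the fraction of assertions that
    match the most popular one. Ties go to the assertion seen first.
    """
    if not assertions:
        raise ValueError("at least one assertion is required")
    counts = Counter(assertions)
    description, votes = counts.most_common(1)[0]
    return description, votes / len(assertions)


# Hypothetical judgments by five experts on how to mark up "Paris".
judgments = ["place", "place", "person", "place", "place"]
description, agreement = consensus_and_agreement(judgments)
print(description, agreement)  # prints "place 0.8"
```

The same function serves the forgetful-expert experiment: pass in the expert's assertions from successive trials instead of the assertions of different experts.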
Seeking consensus among scholars has been compared to herding cats. Nevertheless, a reasonable degree of precision is attainable provided that we employ useful schemes for describing our subject matter. Without doubt, the fewer the categories, the better the chance of achieving substantial agreement.
How should uncertainties be combined for a feature that depends on a number of uncertain elements? To give a concrete example, how uncertain is a word composed of uncertain letters? Say that you find the following text with one letter obliterated:
The parallelogram has two pairs of p_rallel sides.
Even though the letter is missing, the context leaves no doubt as to its identity.
Say that the text was like this instead:
The _______log___ has two p___s of _ara__el _ide_.
Now you could be forgiven for thinking that it was about an archaeologist with two pints of caramel cider. So it seems that the uncertainty of the compound feature increases rapidly with the uncertainty of its constituents.
The relationship between the constituent and compound uncertainties is not simple. Halving the uncertainty of the constituents does not halve the uncertainty of the thing they comprise. Perhaps the best place to start in developing a theory of error propagation for humanities text markup would be in the area of Bayesian statistics. Hopefully, a mathematician will one day work out how to combine categorical measures of uncertainty. (Who knows? Maybe the theory has already been done?) Until then, a reasonably safe approach is to assign to the compound feature the lowest level of certainty found among its constituents. This method is conservative, nearly always overestimating the compound feature's uncertainty.
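The conservative rule is easy to state in code. This sketch assumes a three-level ordered scale with the hypothetical names LOW, MEDIUM and HIGH, and assumes that a feature with no constituents takes the default value of HIGH; the function name is likewise illustrative.

```python
from enum import IntEnum


class Certainty(IntEnum):
    """An ordered scale: a larger value means greater certainty."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def combine(constituents):
    """Assign to a compound feature the lowest certainty of its parts.

    Assumption: with no constituents, the default of HIGH applies.
    """
    return min(constituents, default=Certainty.HIGH)


# A word of five letters, one of them doubtful.
letters = [Certainty.HIGH, Certainty.HIGH, Certainty.MEDIUM,
           Certainty.HIGH, Certainty.HIGH]
print(combine(letters).name)  # prints "MEDIUM"
```

Applied to the parallelogram example, the rule would mark “p_rallel” as doubtful even though context settles the missing letter, which illustrates how the method overestimates the uncertainty of the compound feature.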
All kinds of techniques can be used to indicate levels of uncertainty:
At the moment, all the people do what seems right in their own eyes. Here is an area in need of standardization!