Page:EB1911 - Volume 22.djvu/412

398
PROBABILITY
[LAWS OF ERROR


numerous set of observations, say x₁, x₂, . . . xₙ (taken as a sample from an indefinitely large group obeying any one and the same law of frequency) varies from set to set approximately according to the following law (to be established later)

, say;

where c²/2 is the mean square of deviation and j, the mean cube of deviation divided by c³, is small. Then, by an abstraction analogous to that which has just been attributed to the method of least squares, we may regard the datum as a single observation, the arithmetic mean (of a sample batch of observations), subject to the law of error z = f(x). The most probable value of the quaesitum is therefore given by the equation f′(x − x′) = 0, where x′ is the arithmetic mean of the given observations. From the resulting quadratic equation, putting x = x′ + ε and recollecting that ε is small, we have ε = jc. That is the correction due to the utilization of the mean cube of error. The most advantageous solution cannot now be determined,[1] f(x) being unsymmetrical, without assuming a particular form for the function of detriment. This method of least squares plus cubes may easily be extended to the case of several batches.

137. This application of probabilities not to the actual data but to a selected part thereof, this economy of the inverse method, is widely practised in miscellaneous statistics, where the object is to determine whether the discrepancy between two sets of observations is accidental or significant of a real difference.[2] For instance, let the data be ages at death of individuals of two classes (e.g. temperate or not so, urban or rural, &c.) who have been under observation since the age of, say, 20. Granted that the ages at death conform to Gompertz's law, the determination of the modal age at death, that age at which the proportion of the total observed dying (per unit of time) is a maximum for each class, would most perfectly be effected by the genuine inverse method. That method will also enable us to determine the probability that the two modes should have differed to the observed extent by mere accident.[3] According to the abridged method it suffices to proceed as if our data consisted of two observations x′ and y′, the average ages at death of the two classes, each average obeying the normal law of error, with respective moduli $\sqrt{2\sum (x' - x_r)^2}/n$ and $\sqrt{2\sum (y' - y_s)^2}/n'$, where x₁, x₂, &c., y₁, y₂, &c., are the respective sets of observed ages at death; as follows from the law of error, whatever the law of distribution of the given observations. According to a well-known property of the normal law, the difference between the averages of n and n′ observations respectively will range under a probability-curve with modulus $\sqrt{2\sum (x' - x_r)^2/n^2 + 2\sum (y' - y_s)^2/n'^2}$, say c. Whence for the probability that a difference as great as the observed one, say e, should have occurred by chance we have ½[1 − θ(τ)], where τ = e/c, and θ(x) is the integral $\frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$, given in many treatises.
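The abridged comparison of two averages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the modulus of each average is taken as the square root of twice the sum of squared deviations, divided by the number of observations; θ is identified with the error function `math.erf`; and the two lists of ages are invented for the example.

```python
import math

def modulus_of_mean(obs):
    """Modulus of the arithmetic mean: sqrt(2 * sum of squared deviations) / n."""
    n = len(obs)
    mean = sum(obs) / n
    return math.sqrt(2.0 * sum((v - mean) ** 2 for v in obs)) / n

def chance_probability(xs, ys):
    """Return tau = e/c and the probability 1/2 * [1 - theta(tau)]."""
    # e: observed difference between the two averages, taken positively.
    e = abs(sum(xs) / len(xs) - sum(ys) / len(ys))
    # c: modulus of the difference, the root of the sum of squared moduli.
    c = math.hypot(modulus_of_mean(xs), modulus_of_mean(ys))
    tau = e / c
    return tau, 0.5 * (1.0 - math.erf(tau))

# Hypothetical ages at death for two classes (illustrative numbers only).
class_a = [61, 67, 70, 72, 58, 75, 66, 69]
class_b = [55, 62, 59, 64, 57, 66, 60, 61]
tau, p = chance_probability(class_a, class_b)
print(f"tau = {tau:.3f}, probability = {p:.4f}")
```

A small probability suggests that the observed difference is significant of a real difference rather than accidental; with samples this small the normal law for the averages is itself only a rough approximation.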

138. (Abridged Methods.) This sort of abridgment may be extended to other kinds of average besides the arithmetic, in particular the median (that point which has as many of the given observations above as below it). By simple induction we know that the median of a large sample of observations is a probable value for the true median; how probable is determined as follows from a selection of our data. First suppose that all the observations are of the same weight. If x′ were the true median, the probability that as many as ½n + r of the observations should fall on either side of that point is given by the normal law for which the exponent is −2r²/n.[4] This probability that the observed median will differ from the true one by a certain number of observations is connected with the probability that they will differ by a certain extent of the abscissa, by the proposition that the number of observations contained between the true and apparent median is equal to the small difference between them multiplied by the density of observations at the median (in the case of normal and generally symmetrical curves, the greatest ordinate). This is the second datum we require to select. In the case of the normal curve it may be calculated from the modulus itself, determined by induction from a selection of data. If the observations are not all of the same worth, weight may be assigned by counting one observation as if it occurred oftener than another. This is the essence of Laplace's Method of Situation.[5]
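The two data selected above (the number of observations and the density at the median) suffice to compute the modulus of the observed median. The sketch below assumes that the density is expressed per unit of total frequency, so that n times the density is the number of observations per unit of abscissa; the function names and the numerical values are illustrative only.

```python
import math

def median_modulus(n, density_at_median):
    """Modulus of the observed median for n equally weighted observations.

    The count r of observations between the true and apparent median follows
    a normal law with exponent -2 r^2 / n, i.e. with modulus sqrt(n / 2).
    Dividing by the number of observations per unit of abscissa,
    n * density, converts that count into a distance.
    """
    return math.sqrt(n / 2.0) / (n * density_at_median)

def normal_density_at_centre(c):
    """Greatest ordinate of the normal curve with modulus c."""
    return 1.0 / (math.sqrt(math.pi) * c)

n, c = 400, 10.0  # hypothetical sample size and modulus
m = median_modulus(n, normal_density_at_centre(c))
print(f"modulus of the median: {m:.4f}")
print(f"modulus of the mean:   {c / math.sqrt(n):.4f}")
```

For the normal curve the ratio of the two moduli reduces to the square root of π/2, whatever the values of n and c.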

139. In its simplest form, where all the given observations are of equal weight, this method is of wide applicability. Compared with the genuine inverse method, it is always more convenient, seldom much less accurate, sometimes even more accurate. If the given observations obey the normal law, the precision of the median is less than the precision of the arithmetic mean by only some 25%—a discrepancy not very serious where only a rough estimate of the worth of an average is required. If the observations do not obey the normal law—especially if the extremities are abnormally divergent—the precision of the median may be greater than that of the arithmetic mean.[6]
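The comparison of the two averages may be checked by a rough simulation. The sketch below is an illustration, not part of the original argument: the sample size, the number of trials, the seed, and the "discordant" mixture (one observation in ten drawn with ten times the spread) are all assumptions chosen for the example.

```python
import random
import statistics

def spread_ratio(draw, n=101, trials=2000, seed=1):
    """Ratio of the scatter of sample medians to that of sample means.

    `draw(rng)` returns one observation; a ratio above 1 means the median
    is the less precise average for that law of frequency.
    """
    rng = random.Random(seed)
    medians, means = [], []
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        medians.append(statistics.median(sample))
        means.append(statistics.fmean(sample))
    return statistics.pstdev(medians) / statistics.pstdev(means)

def normal(rng):
    # Observations obeying the normal law.
    return rng.gauss(0.0, 1.0)

def discordant(rng):
    # Hypothetical mixture: one observation in ten has ten times the spread.
    return rng.gauss(0.0, 10.0 if rng.random() < 0.1 else 1.0)

normal_ratio = spread_ratio(normal)
discordant_ratio = spread_ratio(discordant)
print(f"normal law:       {normal_ratio:.2f}")
print(f"discordant group: {discordant_ratio:.2f}")
```

Under the normal law the ratio should come out near 1.25, in agreement with the figure of some 25% quoted above, while for the mixture with abnormally divergent extremities it should fall below 1.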

140. (Determination of Frequency-Constants.) Yet another instance of the contrast between genuine and abridged inversion is afforded by the problem to determine the modulus as well as the mean for a set of observations known to obey the normal law; what the first problem[7] becomes when the coefficient of dispersion is not given. By inverse probability we ought in that case, in addition to the equation dP/dx = 0, to put dP/dc = 0. Whence c² = 2[(x′ − x₁)² + (x′ − x₂)² + &c. + (x′ − xₙ)²]/n, and x′ = (x₁ + x₂ + &c. + xₙ)/n. This solution differs from that which is often given in the textbooks[8] in that there, in the expression for c², (n − 1) occurs in the denominator instead of n. The difference is explained by the fact that the authorities referred to determine c, not by genuine inversion, but by ordinary induction, by a condition which certainly would be fulfilled in the long run, but does not express the whole of our data; a condition in this respect like the equation of c to $\sqrt{\pi}\,\sum e/n$, where e is the difference (taken positively, without regard to its sign) between any observation and the arithmetic mean of all the observations.[9]
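The rival determinations of the modulus can be set side by side numerically. In the sketch below the first two estimates follow the expressions for c² with denominators n and (n − 1). The third is an assumption: it uses the standard property of the normal law that the mean absolute error equals c/√π, and the readings are invented.

```python
import math

def frequency_constants(obs):
    """Return the mean and three estimates of the modulus c."""
    n = len(obs)
    mean = sum(obs) / n
    sum_sq = sum((v - mean) ** 2 for v in obs)
    sum_abs = sum(abs(v - mean) for v in obs)
    return {
        "mean": mean,
        # Inverse probability: denominator n.
        "c_inverse": math.sqrt(2.0 * sum_sq / n),
        # Textbook form: denominator n - 1.
        "c_textbook": math.sqrt(2.0 * sum_sq / (n - 1)),
        # From the mean absolute error (assumed relation c = sqrt(pi) * mean |e|).
        "c_mean_error": math.sqrt(math.pi) * sum_abs / n,
    }

obs = [9.8, 10.4, 10.1, 9.6, 10.3, 9.9, 10.2, 9.7]  # hypothetical readings
for name, value in frequency_constants(obs).items():
    print(f"{name:>13}: {value:.4f}")
```

The two squared-deviation estimates differ only by the factor √(n/(n − 1)), which is negligible for a numerous set of observations.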

141. Of course the determination of the most probable value is subject to the speculative difficulties proper to a priori probability: which are particularly striking in this case, as it appears equally natural to take as that constant, of which the values are a priori equally probable, k (= c²/2), or even[10] h (= 1/c²), the measure of weight, as in fact Laplace has done;[11] yet no two of these assumptions can be exactly true.[12]

142. A more convenient determination is obtained from simple induction by equating the modulus to some datum of the observed group to which it would be equal if the group were complete—in particular to the distance from the median of some percentile (or point which marks off a certain percentage, e.g. 25 of the given observations) multiplied by a factor corresponding to the percentile obtainable from a familiar table. Mr Sheppard has given an interesting proof[13] that we cannot by way of percentiles obtain such good[14] results for the frequency-constants as by the use of “the average and average square” [the method prescribed by inverse probability].
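As an illustration of the percentile method, the sketch below estimates the modulus from the quartiles. It assumes the normal law, takes half the interquartile range as the distance of the quartile from the median, and obtains the factor from Python's `statistics.NormalDist` instead of a printed table. The test data are an artificial, evenly spaced normal group with modulus 10.

```python
import math
from statistics import NormalDist, quantiles

def modulus_from_quartiles(obs):
    """Estimate the modulus c from the distance of the quartiles from the median."""
    q1, _, q3 = quantiles(obs, n=4)
    # Half the interquartile range serves as the quartile's distance from
    # the median (the two distances agree for a symmetrical group).
    quartile_distance = (q3 - q1) / 2.0
    # Under the normal law the upper quartile lies about 0.6745 standard
    # deviations above the median, and c = sqrt(2) * standard deviation.
    factor = math.sqrt(2.0) / NormalDist().inv_cdf(0.75)
    return factor * quartile_distance

# Artificial group: 1000 evenly spaced quantiles of a normal law, modulus 10.
true_c = 10.0
group = NormalDist(mu=50.0, sigma=true_c / math.sqrt(2.0))
obs = [group.inv_cdf((i + 0.5) / 1000) for i in range(1000)]
estimate = modulus_from_quartiles(obs)
print(f"estimated modulus: {estimate:.3f}")
```

On real samples this estimate fluctuates more than the one based on the mean square, which is the point of Mr Sheppard's proof.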

143. (Entangled Measurements.) The same philosophical subtleties, with greater mathematical complications, meet us when we pass on to the case of two or more quaesita. The problem under this head which mainly exercised the older writers was to determine a number of unknown quantities, given a larger number, n, of equations involving them.

144. Supposing the true values approximately known, by substituting the approximate values in the given equations and expanding according to Taylor's theorem, there will be obtained for the corrections, say x, y . . ., n linear equations of the form

a₁x + b₁y + . . . = f₁

a₂x + b₂y + . . . = f₂,

where each a and b is a known coefficient, and each f is a fallible observation. Suppose that the error to which each is liable obeys the normal law, and that the modulus pertaining to each observation is the same (which latter condition can be secured by multiplying each equation by a proper factor); then if x′ and y′ are the true values of the quaesita, the frequency with which (a₁x′ + b₁y′ − f₁) assumes different values is given by the equation $z = \frac{1}{\sqrt{\pi}\,c_1}\exp[-(a_1x' + b_1y' - f_1)^2/c_1^2]$, where c₁ is a constant which,


  1. The use of the cubes is also contrasted with that of the squares (only) in this respect: that it is no longer a matter of indifference how many of the original observations we assign to the batch of which the mean constitutes the single (compound) observation.
  2. The object of the writer's paper on “Methods of Statistics” in the Jubilee number of the Journ. Stat. Soc. (1885).
  3. See on the use of the inverse method to determine the mode of a group, the present writer's paper on “Probable Errors” in the Journ. Stat. Soc. (Sept. 1908).
  4. Above, par. 103.
  5. Théorie analytique, 2nd supp. p. 164. Mécanique céleste, bk. iii. art. 40; on which see the note in Bowditch's translation. The method may be extended to other percentiles. See Czuber, Beobachtungsfehler, § 58. Cf. Phil. Mag. (1886), p. 375; and Sheppard, Trans. Roy. Soc. (1899), 192, p. 135, ante, where the error incident to this kind of determination is ascertained with much precision.
  6. Cf. Phil. Mag. (1887), xxiv. 269 seq., where the median is prescribed in case of “discordant” (heterogeneous) observations. If the more drastic remedy of rejecting part of the data is resorted to, Sheppard's method of performing that operation may be recommended (Proc. Lond. Math. Soc. vol. 31). He prescribes for cases to which the median may not be appropriate, namely, the determination of other frequency-constants besides the mean of the observations.
  7. Above, par. 134.
  8. E.g. Airy, Theory of Errors, art. 60.
  9. It is a nice point that the expression for c², which has (n − 1) instead of n for denominator, though not the more probable, may yet be the more advantageous (supposing that there were any sensible difference between the two). Cf. Camb. Phil. Trans. (1885), vol. xiv. pt. ii. p. 165; and “Probable Errors,” Journ. Stat. Soc. (June 1908).
  10. Above, par. 96, note.
  11. Théorie analytique, 2nd supp. ed. 1847, p. 578.
  12. See the matter discussed in Camb. Phil. Trans., loc. cit.
  13. Trans. Roy. Soc. (1899), A, cxcii. 135.
  14. Good as tested by a comparison of the mean squares of errors in the frequency-constant determined by the compared methods.