7.6. Goodness-of-Fit Criteria

In this section we consider one of the questions related to testing the plausibility of hypotheses, namely the question of whether a theoretical distribution agrees with a statistical one.

Assume that a given statistical distribution has been fitted by some theoretical curve $f(x)$ (Fig. 7.6.1). However well the theoretical curve is chosen, some discrepancies between it and the statistical distribution are inevitable. The question naturally arises: are these discrepancies explained only by random circumstances associated with the limited number of observations, or are they significant and due to the fact that the chosen curve fits this statistical distribution poorly? The so-called "goodness-of-fit criteria" serve to answer this question.

The idea behind applying goodness-of-fit criteria is as follows.

On the basis of the given statistical material, we have to test the hypothesis $H$ that the random variable $X$ obeys some definite distribution law. This law can be specified in one form or another: for example, as a distribution function $F(x)$, as a probability density $f(x)$, or as a set of probabilities $p_i$, where $p_i$ is the probability that the variable $X$ falls within the $i$-th interval.

[Fig. 7.6.1: a statistical distribution fitted by a theoretical curve $f(x)$]

Since the distribution function $F(x)$ is the most general of these forms and determines any of the others, we will formulate the hypothesis $H$ as the statement that the variable $X$ has the distribution function $F(x)$.

In order to accept or reject the hypothesis $H$, consider some quantity $U$ characterizing the degree of discrepancy between the theoretical and statistical distributions. The quantity $U$ can be chosen in various ways: for example, as $U$ one can take the sum of the squared deviations of the theoretical probabilities $p_i$ from the corresponding frequencies $p_i^*$, or the sum of the same squares with some coefficients ("weights"), or the maximum deviation of the statistical distribution function $F^*(x)$ from the theoretical one $F(x)$, and so on. Let us assume that the quantity $U$ has been chosen in one way or another. Obviously, it is a random variable. Its distribution law depends on the distribution law of the random variable $X$ on which the experiments were performed and on the number of experiments $n$. If the hypothesis $H$ is true, then the distribution law of $U$ is determined by the distribution law of $X$ (the function $F(x)$) and the number $n$.

Suppose that this distribution law is known to us. As a result of the given series of experiments, the chosen measure of discrepancy $U$ took some value $u$. The question is whether this can be explained by random causes, or whether the discrepancy is too large and indicates a significant difference between the theoretical and statistical distributions and, consequently, the unsuitability of the hypothesis $H$. To answer this question, suppose the hypothesis $H$ is correct and calculate, under this assumption, the probability that, owing to random causes associated with the limited amount of experimental material, the measure of discrepancy $U$ is no less than the value $u$ observed in the experiment; that is, we calculate the probability of the event

$$U \ge u.$$

If this probability is very small, the hypothesis $H$ should be rejected as implausible; if this probability is appreciable, it should be recognized that the experimental data do not contradict the hypothesis $H$.
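The reasoning above can be illustrated numerically. The following is a minimal sketch, not part of the original lecture: it assumes a hypothetical hypothesis (a fair six-sided die), hypothetical observed counts, and the unweighted sum of squared deviations as the measure of discrepancy, and it estimates the probability that the discrepancy is no less than the observed one by simulating many series of experiments under the hypothesis. All function names are illustrative.

```python
import random

def discrepancy(counts, probs):
    """U = sum of squared deviations of the observed frequencies from the theoretical probabilities."""
    n = sum(counts)
    return sum((m / n - p) ** 2 for m, p in zip(counts, probs))

def simulate_counts(rng, probs, n):
    """Draw n observations under the hypothesis and count the hits in each interval."""
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[idx] += 1
    return counts

def tail_probability(u_observed, probs, n, trials=2000, seed=0):
    """Estimate P(U >= u) under the hypothesis by repeated simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if discrepancy(simulate_counts(rng, probs, n), probs) >= u_observed:
            hits += 1
    return hits / trials

probs = [1 / 6] * 6                    # hypothetical hypothesis: a fair die
observed = [18, 22, 20, 25, 15, 20]    # hypothetical observed counts, n = 120
u = discrepancy(observed, probs)
p_value = tail_probability(u, probs, n=sum(observed))
print(f"u = {u:.5f}, estimated P(U >= u) = {p_value:.3f}")
```

A small estimated probability would speak against the hypothesis; an appreciable one means only that the data do not contradict it.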

The question arises of how to choose the measure of discrepancy $U$. It turns out that, for some ways of choosing it, the distribution law of $U$ has very simple properties and, for sufficiently large $n$, is practically independent of the function $F(x)$. It is precisely such measures of discrepancy that are used in mathematical statistics as goodness-of-fit criteria.

Consider one of the most commonly used goodness-of-fit criteria, the so-called Pearson "$\chi^2$ criterion".

Suppose that $n$ independent experiments have been performed, in each of which the random variable $X$ took a certain value. The results of the experiments are grouped into $k$ intervals and presented in the form of a statistical series:

| $I_i$ | $x_1; x_2$ | $x_2; x_3$ | ... | $x_k; x_{k+1}$ |
|---|---|---|---|---|
| $p_i^*$ | $p_1^*$ | $p_2^*$ | ... | $p_k^*$ |

It is required to check whether the experimental data agree with the hypothesis that the random variable $X$ has a given distribution law (specified by the distribution function $F(x)$ or the density $f(x)$). Let us call this distribution law "theoretical".

Knowing the theoretical distribution law, one can find the theoretical probabilities that the random variable falls into each of the intervals:

$$p_1, p_2, \ldots, p_k.$$

In checking the agreement of the theoretical and statistical distributions, we will proceed from the discrepancies between the theoretical probabilities $p_i$ and the observed frequencies $p_i^*$. It is natural to choose as the measure of discrepancy between the theoretical and statistical distributions the sum of the squared deviations $(p_i^* - p_i)$ taken with some "weights" $c_i$:

$$U = \sum_{i=1}^{k} c_i (p_i^* - p_i)^2.$$ (7.6.1)

The coefficients $c_i$ (the "weights" of the intervals) are introduced because, in general, deviations belonging to different intervals cannot be considered equally significant. Indeed, a deviation $p_i^* - p_i$ of the same absolute value may be of little significance if the probability $p_i$ itself is large, and very noticeable if it is small. It is therefore natural to take the "weights" $c_i$ inversely proportional to the interval probabilities $p_i$.

The next question is how to choose the coefficient of proportionality.

K. Pearson showed that if we put

$$c_i = \frac{n}{p_i},$$ (7.6.2)

then for large $n$ the distribution law of $U$ has very simple properties: it is practically independent of the distribution function $F(x)$ and of the number of experiments $n$; namely, as $n$ increases, this law approaches the so-called "$\chi^2$ distribution".

With this choice of the coefficients $c_i$, the measure of discrepancy is usually denoted $\chi^2$:

$$\chi^2 = n \sum_{i=1}^{k} \frac{(p_i^* - p_i)^2}{p_i}.$$ (7.6.3)

For convenience of calculation (to avoid dealing with fractional values with many zeros), one can bring $n$ under the summation sign and, noting that $p_i^* = m_i / n$, where $m_i$ is the number of values in the $i$-th interval, reduce formula (7.6.3) to the form

$$\chi^2 = \sum_{i=1}^{k} \frac{(m_i - n p_i)^2}{n p_i}.$$ (7.6.4)
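As a quick check that formulas (7.6.3) and (7.6.4) give the same number, here is a minimal sketch. The counts and theoretical probabilities are hypothetical illustration data, not taken from the lecture, and the function names are illustrative.

```python
def chi_square_from_frequencies(freqs, probs, n):
    """Formula (7.6.3): n * sum((p*_i - p_i)^2 / p_i)."""
    return n * sum((f - p) ** 2 / p for f, p in zip(freqs, probs))

def chi_square_from_counts(counts, probs):
    """Formula (7.6.4): sum((m_i - n * p_i)^2 / (n * p_i))."""
    n = sum(counts)
    return sum((m - n * p) ** 2 / (n * p) for m, p in zip(counts, probs))

counts = [12, 18, 45, 17, 8]          # hypothetical numbers of hits m_i, n = 100
probs = [0.1, 0.2, 0.4, 0.2, 0.1]     # hypothetical theoretical probabilities p_i
n = sum(counts)
freqs = [m / n for m in counts]       # observed frequencies p*_i = m_i / n

chi2_a = chi_square_from_frequencies(freqs, probs, n)
chi2_b = chi_square_from_counts(counts, probs)
print(chi2_a, chi2_b)
```

Working with the counts $m_i$ and the expected counts $n p_i$ keeps the intermediate numbers of convenient size, which is the practical reason for preferring form (7.6.4).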

The $\chi^2$ distribution depends on a parameter $r$, called the number of "degrees of freedom" of the distribution. The number of degrees of freedom $r$ equals the number of intervals $k$ minus the number of independent conditions ("constraints") imposed on the frequencies $p_i^*$. Examples of such conditions are

$$\sum_{i=1}^{k} p_i^* = 1,$$

if we require only that the sum of the frequencies be equal to one (this requirement is imposed in all cases);

$$\sum_{i=1}^{k} \tilde{x}_i p_i^* = m_x,$$

if we fit the theoretical distribution under the condition that the theoretical and statistical means coincide;

$$\sum_{i=1}^{k} (\tilde{x}_i - m_x^*)^2 p_i^* = D_x,$$

if we also require that the theoretical and statistical variances coincide, and so on. Here $\tilde{x}_i$ denotes a representative value of the $i$-th interval, $m_x$ and $D_x$ are the theoretical mean and variance, and $m_x^*$ is the statistical mean.

Tables have been compiled for the $\chi^2$ distribution (see Table 4 of the appendix). Using these tables, one can find, for each value of $\chi^2$ and number of degrees of freedom $r$, the probability $p$ that a quantity distributed according to the $\chi^2$ law exceeds this value. In Table 4 the inputs are the probability $p$ and the number of degrees of freedom $r$; the numbers in the table are the corresponding values of $\chi^2$.
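When the printed table is not at hand, the same quantities can be computed directly. The sketch below is an illustration rather than the lecture's own method: it uses the known closed-form expressions for the upper tail of the $\chi^2$ distribution with an integer number of degrees of freedom, and a simple bisection to recover a table entry from $p$ and $r$. The function names `chi2_tail` and `chi2_table_value` are hypothetical.

```python
import math

def chi2_tail(x, r):
    """Probability that a chi-square variable with r degrees of freedom exceeds x (integer r >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if r % 2 == 0:
        # Even r: exp(-x/2) * sum_{j=0}^{r/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, r // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd r: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(r-1)/2} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    total = sum(half ** (j - 0.5) / math.gamma(j + 0.5)
                for j in range(1, (r - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

def chi2_table_value(p, r, hi=1000.0):
    """Table entry: the value x at which chi2_tail(x, r) equals p, found by bisection."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_tail(mid, r) > p:
            lo = mid      # tail still too heavy: the entry lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(chi2_tail(3.0, 5), 3))
print(round(chi2_table_value(0.50, 5), 2))
```

The first call plays the role of reading $p$ from the table for given $\chi^2$ and $r$; the second reproduces a table entry from its inputs $p$ and $r$.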

The $\chi^2$ distribution makes it possible to assess the degree of agreement between the theoretical and statistical distributions. We proceed from the assumption that the variable $X$ really is distributed according to the law $F(x)$. Then the probability $p$ determined from the table is the probability that, owing to purely random causes, the measure of discrepancy between the theoretical and statistical distributions (7.6.4) is no less than the value of $\chi^2$ actually observed in the given series of experiments. If this probability $p$ is very small (so small that an event with such a probability can be considered practically impossible), the result of the experiment should be regarded as contradicting the hypothesis $H$ that the distribution law of the variable $X$ is $F(x)$. This hypothesis should be rejected as implausible. On the contrary, if the probability $p$ is relatively large, the discrepancies between the theoretical and statistical distributions can be recognized as insignificant and attributed to random causes. The hypothesis $H$ that the variable $X$ is distributed according to the law $F(x)$ can then be considered plausible or, at least, not contradicting the experimental data.

Thus, the scheme for applying the $\chi^2$ criterion to assess the agreement of the theoretical and statistical distributions reduces to the following:

1) The measure of discrepancy $\chi^2$ is determined by formula (7.6.4).

2) The number of degrees of freedom $r$ is determined as the number of intervals $k$ minus the number of imposed constraints $s$:

$$r = k - s.$$

3) From $r$ and $\chi^2$, using Table 4, the probability is determined that a quantity having the $\chi^2$ distribution with $r$ degrees of freedom exceeds the given value of $\chi^2$. If this probability is very small, the hypothesis is rejected as implausible. If this probability is relatively large, the hypothesis can be considered as not contradicting the experimental data.

How small the probability $p$ must be for the hypothesis to be rejected or revised is an indeterminate question; it cannot be settled on mathematical grounds, just like the question of how small the probability of an event must be for the event to be considered practically impossible. In practice, if $p$ turns out to be less than 0.1, it is recommended to check the experiment, to repeat it if possible, and, if noticeable discrepancies appear again, to try to find a distribution law that describes the statistical data more suitably.

It should be noted that with the help of the $\chi^2$ criterion (or any other goodness-of-fit criterion) one can only, in some cases, refute the chosen hypothesis $H$ and reject it as clearly disagreeing with the experimental data. If the probability $p$ is large, this fact alone can by no means be considered proof that the hypothesis $H$ is valid; it indicates only that the hypothesis does not contradict the experimental data.

At first glance it may seem that the greater the probability $p$, the better the agreement of the theoretical and statistical distributions and the more justified the choice of the function $F(x)$ as the distribution law of the random variable. In fact, this is not so. Suppose, for example, that in assessing the agreement of the theoretical and statistical distributions by the $\chi^2$ criterion we obtained $p = 0.99$. This means that with probability 0.99, owing to purely random causes, the discrepancies for the given number of experiments should have been larger than those observed. We, however, obtained relatively very small discrepancies, too small to be regarded as plausible. It is more reasonable to conclude that such a close coincidence of the theoretical and statistical distributions is not accidental and can be explained by certain causes connected with the recording and processing of the experimental data (in particular, with the "cleaning" of experimental data that is very common in practice, when some results are arbitrarily discarded or slightly altered).

Of course, all these considerations apply only when the number of experiments $n$ is sufficiently large (of the order of several hundred) and when it makes sense to apply the criterion itself, which is based on the limiting distribution of the measure of discrepancy as $n \to \infty$. Note that in using the $\chi^2$ criterion not only the total number of experiments $n$ but also the numbers of observations $m_i$ in the individual intervals must be sufficiently large. In practice it is recommended to have at least 5 to 10 observations in each interval. If the numbers of observations in some intervals are very small (of the order of 1 or 2), it makes sense to merge some of the intervals.
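One possible way to merge sparse intervals is sketched below. It is only an illustration under stated assumptions: the boundaries and counts are hypothetical, the threshold of 5 observations is the lower end of the recommendation above, and the greedy left-to-right merging rule is one choice among several.

```python
def merge_sparse_intervals(edges, counts, min_count=5):
    """Merge adjacent intervals until each holds at least min_count observations.

    edges  -- k + 1 interval boundaries in increasing order
    counts -- k observed counts m_i
    """
    new_edges, new_counts = [edges[0]], []
    running = 0
    for right_edge, m in zip(edges[1:], counts):
        running += m
        if running >= min_count:
            # Enough observations accumulated: close the merged interval here.
            new_edges.append(right_edge)
            new_counts.append(running)
            running = 0
    if running > 0:
        # A sparse tail is left over: fold it into the last interval that was kept.
        if new_counts:
            new_counts[-1] += running
            new_edges[-1] = edges[-1]
        else:
            new_counts.append(running)
            new_edges.append(edges[-1])
    return new_edges, new_counts

edges = [0, 1, 2, 3, 4, 5, 6, 7]       # hypothetical interval boundaries
counts = [1, 2, 14, 30, 21, 6, 1]      # hypothetical counts m_i
new_edges, new_counts = merge_sparse_intervals(edges, counts)
print(new_edges, new_counts)
```

After merging, the theoretical probabilities must be recomputed for the new intervals, and the number of intervals $k$ used for the degrees of freedom is the number of merged intervals.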

Example 1. Check the agreement of the theoretical and statistical distributions for Example 1 of Section 7.5.

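Solution. Using the theoretical normal distribution law with the parameters $m$ and $\sigma$ determined in that example, we find the probabilities of falling into the intervals by the formula

$$p_i = \Phi^*\left(\frac{x_{i+1} - m}{\sigma}\right) - \Phi^*\left(\frac{x_i - m}{\sigma}\right),$$

where $\Phi^*$ is the standard normal distribution function and $x_i$, $x_{i+1}$ are the boundaries of the $i$-th interval.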
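The interval probabilities for a normal law are differences of the distribution function at consecutive interval boundaries. A minimal sketch follows; the interval boundaries are those of this example, while the parameters `m`, `sigma` and the number of experiments `n` are hypothetical placeholders, not the values of the example.

```python
import math

def normal_cdf(x, m, sigma):
    """Distribution function of the normal law with mean m and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))

def interval_probabilities(edges, m, sigma):
    """p_i = F(x_{i+1}) - F(x_i) for consecutive interval boundaries."""
    return [normal_cdf(b, m, sigma) - normal_cdf(a, m, sigma)
            for a, b in zip(edges, edges[1:])]

edges = [-4, -3, -2, -1, 0, 1, 2, 3, 4]   # interval boundaries
m, sigma = 0.0, 1.0                       # hypothetical parameters, for illustration only
n = 500                                   # hypothetical number of experiments

probs = interval_probabilities(edges, m, sigma)
expected = [n * p for p in probs]         # expected counts n * p_i
for (a, b), e in zip(zip(edges, edges[1:]), expected):
    print(f"({a}; {b}): n*p_i = {e:.1f}")
```

The probabilities do not sum exactly to one because the normal law also assigns a small probability to values outside the outermost boundaries.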

Then we compile a comparative table of the numbers of hits $m_i$ in the intervals and the corresponding values $n p_i$.

| $I_i$ | -4; -3 | -3; -2 | -2; -1 | -1; 0 | 0; 1 | 1; 2 | 2; 3 | 3; 4 |
|---|---|---|---|---|---|---|---|---|
| $m_i$ | 6 | 25 | 72 | 133 | 120 | 88 | 46 | 10 |
| $n p_i$ | 6.2 | 26.2 | 71.2 | 122 | 131.8 | 90.5 | 38.5 | 10.5 |

By formula (7.6.4) we determine the value of the measure of discrepancy:

$$\chi^2 = \sum_{i=1}^{8} \frac{(m_i - n p_i)^2}{n p_i}.$$

We determine the number of degrees of freedom as the number of intervals minus the number of imposed constraints $s$ (in this case $s = 3$):

$$r = 8 - 3 = 5.$$

From Table 4 of the appendix we find, for $r = 5$:

at $\chi^2 = 3.00$, $p = 0.70$;

at $\chi^2 = 4.35$, $p = 0.50$.

Therefore, the desired probability $p$ for the computed value of $\chi^2$ is approximately 0.56. This probability is not small; therefore, the hypothesis that the variable $X$ is distributed according to the normal law can be considered plausible.

Example 2. Check the agreement of the theoretical and statistical distributions for the conditions of Example 2 of Section 7.5.

Solution. The values $p_i$ are calculated as the probabilities of falling into the intervals (20; 30), (30; 40), etc. for a random variable distributed according to the law of uniform density on the interval (23.6; 96.6). We compile a comparative table of the values $m_i$ and $n p_i$:

[Table: the intervals $I_i$ from (20; 30) to (90; 100) with the observed counts $m_i$ and the expected counts $n p_i$; the numerical entries are not preserved.]

By formula (7.6.4) we find $\chi^2$:

$$\chi^2 = \sum_{i=1}^{8} \frac{(m_i - n p_i)^2}{n p_i}.$$

The number of degrees of freedom:

$$r = 8 - 3 = 5.$$

From Table 4 of the appendix we find the probability $p$ corresponding to this number of degrees of freedom $r$ and the computed value of $\chi^2$.

Consequently, the discrepancy we observed between the theoretical and statistical distributions could arise from purely random causes only with the probability $p$ found from the table. Since this probability is very small, it should be recognized that the experimental data contradict the hypothesis that the variable $X$ is distributed according to the law of uniform density.

In addition to the $\chi^2$ criterion, a number of other criteria are used in practice to assess the degree of agreement of the theoretical and statistical distributions. Of these, we briefly discuss the criterion of A. N. Kolmogorov.

As the measure of discrepancy between the theoretical and statistical distributions, A. N. Kolmogorov takes the maximum absolute value of the difference between the statistical distribution function $F^*(x)$ and the corresponding theoretical distribution function:

$$D = \max |F^*(x) - F(x)|.$$

The reason for choosing the quantity $D$ as the measure of discrepancy is the simplicity of its calculation. At the same time, it has a fairly simple distribution law. A. N. Kolmogorov proved that, whatever the distribution function $F(x)$ of a continuous random variable $X$, as the number of independent observations $n$ increases without limit, the probability of the inequality

$$D \sqrt{n} \ge \lambda$$

tends to the limit

$$P(\lambda) = 1 - \sum_{k=-\infty}^{\infty} (-1)^k e^{-2 k^2 \lambda^2}.$$ (7.6.5)

The values of the probability $P(\lambda)$ calculated by formula (7.6.5) are given in Table 7.6.1.

[Table 7.6.1: values of the probability $P(\lambda)$ as a function of $\lambda$]
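The limiting probability $P(\lambda)$ is easy to evaluate numerically, since the series converges very quickly for all but the smallest $\lambda$. The following is a minimal sketch, assuming the standard form of Kolmogorov's limiting law, $P(\lambda) = 1 - \sum_{k=-\infty}^{\infty} (-1)^k e^{-2k^2\lambda^2}$, rewritten as a one-sided alternating series; the function name and the cut-off for very small $\lambda$ are illustrative choices.

```python
import math

def kolmogorov_p(lam, tol=1e-12):
    """Limiting probability P(lambda) that D * sqrt(n) is no less than lambda.

    Uses the equivalent one-sided form
    P(lambda) = 2 * sum_{k=1}^{inf} (-1)^(k-1) * exp(-2 * k^2 * lambda^2).
    """
    if lam < 0.05:
        # Here P(lambda) differs from 1 by far less than double precision,
        # and the alternating series would converge very slowly.
        return 1.0
    total = 0.0
    k = 1
    while True:
        term = math.exp(-2.0 * k * k * lam * lam)
        total += term if k % 2 == 1 else -term
        if term < tol:
            break
        k += 1
    return min(1.0, max(0.0, 2.0 * total))

# Print a few values of P(lambda) on an illustrative grid of lambda.
for lam in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"lambda = {lam:.1f}   P(lambda) = {kolmogorov_p(lam):.4f}")
```

The printed values can be compared with a published table of $P(\lambda)$ as a sanity check of the implementation.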

The scheme for applying A. N. Kolmogorov's criterion is as follows: the statistical distribution function $F^*(x)$ and the hypothesized theoretical distribution function $F(x)$ are constructed, and the maximum $D$ of the absolute value of the difference between them is determined (Fig. 7.6.2).

Next, the quantity

$$\lambda = D \sqrt{n}$$

is determined, and the probability $P(\lambda)$ is found from Table 7.6.1. This is the probability that (if the variable $X$ really is distributed according to the law $F(x)$), owing to purely random causes, the maximum discrepancy between $F^*(x)$ and $F(x)$ is no less than that actually observed. If the probability $P(\lambda)$ is very small, the hypothesis should be rejected as implausible; if $P(\lambda)$ is relatively large, the hypothesis can be considered compatible with the experimental data.
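The first steps of this scheme, finding the maximum discrepancy between the two distribution functions and scaling it by $\sqrt{n}$, can be sketched as follows. The sample and the fully specified theoretical law (uniform on (0, 1)) are hypothetical; a sample of ten values is far too small for the limiting law to apply and serves only to make the arithmetic easy to follow. Because the statistical distribution function is a step function, the discrepancy has to be checked on both sides of each jump.

```python
import math

def kolmogorov_distance(sample, cdf):
    """Maximum of |F*(x) - F(x)|, checked on both sides of every jump of F*(x)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # At the i-th ordered observation F*(x) jumps from i/n to (i + 1)/n.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

def uniform_cdf(x):
    """Hypothetical fully specified theoretical law: uniform density on (0, 1)."""
    return min(1.0, max(0.0, x))

sample = [0.05, 0.12, 0.31, 0.38, 0.52, 0.55, 0.71, 0.80, 0.86, 0.97]  # hypothetical data
d = kolmogorov_distance(sample, uniform_cdf)
lam = d * math.sqrt(len(sample))
print(f"D = {d:.3f}, lambda = {lam:.3f}")
```

The resulting value of $\lambda$ is then looked up in the table of $P(\lambda)$ or substituted into formula (7.6.5).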

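[Fig. 7.6.2: the statistical distribution function $F^*(x)$, the theoretical distribution function $F(x)$, and the maximum discrepancy $D$ between them]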

A. N. Kolmogorov's criterion compares favorably in its simplicity with the $\chi^2$ criterion described earlier; it is therefore readily applied in practice. It should be stipulated, however, that this criterion can be applied only when the hypothetical distribution $F(x)$ is completely known in advance from some theoretical considerations, that is, when not only the form of the distribution function $F(x)$ is known but also all the parameters entering into it. Such a case is comparatively rare in practice. Usually only the general form of the function $F(x)$ is known from theoretical considerations, and the numerical parameters entering into it are determined from the given statistical material. In applying the $\chi^2$ criterion, this circumstance is taken into account by a corresponding reduction in the number of degrees of freedom of the $\chi^2$ distribution. A. N. Kolmogorov's criterion provides for no such adjustment. If this criterion is nevertheless applied in cases where the parameters of the theoretical distribution are chosen from the statistical data, it gives knowingly overestimated values of the probability $P(\lambda)$; in some cases we therefore risk accepting as plausible a hypothesis that in reality agrees poorly with the experimental data.

