Lecture
Criteria of feature informativeness
When solving recognition problems, the main criterion (including for evaluating the informativeness of features) is the risk of loss. It will be discussed in more detail in the second section of this course of lectures. Here we only note that it is based on an estimate of the probabilities of recognition errors and of their cost. One can speak of estimating probabilities only within a statistical approach, so in this section it is better to use a criterion of the following type: the share of the control (examination) sample that is recognized incorrectly. We have already mentioned that objects of the training sample must not be included in the control sample.

When the total sample is small, dividing it into two parts is highly undesirable: both the quality of training and the confidence in the control results deteriorate. To compensate for this, some researchers apply the so-called sliding control method, known today as leave-one-out cross-validation (a sketch is given below). It consists of the following. All objects except one are used as the training sample, and the single object that did not participate in training is presented for control. Then another object is withdrawn from the total sample for control, and training is repeated on the rest. The procedure is repeated as many times as there are objects in the total sample. As a result, the entire sample participates in both training and control, yet no control object ever takes part in the training that precedes its recognition.

This positive effect comes at a price: training is performed not once, as it would be with two sufficiently large separate samples (training and control), but as many times as there are objects in the total sample. The disadvantage is significant, since the training procedure is usually quite complicated, and repeating it many times is undesirable. If sliding control is used to select informative features, the number of trainings must additionally be multiplied by the number of feature sets to be compared.

Therefore, to assess the informativeness of features (and to solve other problems), it is often not the relative number of recognition errors that is used but other criteria associated with it. In any case, such criteria express the degree of distinguishability of objects of different classes. For example, as already noted when considering taxonomy algorithms, the ratio of the average distance between objects of different classes to the average distance between objects of the same class is in some cases very effective. The reader is invited to write out the corresponding computational formulas, introducing the necessary notation (one possible version is sketched below). When such criteria are used, no control sample is needed, but the one-to-one connection with the number of recognition errors is lost.

Clearly, the average distance between objects of different classes is obtained by averaging the distances over all possible pairs of objects belonging to different classes. If the number of classes is large and each is represented by many objects, this averaging is cumbersome. In that case one can average the distances between the prototypes (standards) of different classes and, within each class, the distances from the objects to the prototype of that class. Such a simplification is not always permissible: everything depends on the shape and relative position of the regions of the feature space in which the objects of different classes are concentrated.
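A minimal sketch of the sliding control (leave-one-out) procedure described above. The nearest-class-mean classifier used as the "training" step and the toy data are illustrative assumptions, not prescribed by the lecture:

```python
import numpy as np

def nearest_mean_classify(X_train, y_train, x):
    """'Training' = computing class means; x is assigned to the nearest mean."""
    labels = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in labels])
    return labels[np.argmin(np.linalg.norm(means - x, axis=1))]

def leave_one_out_error(X, y):
    """Sliding control: each object is held out once, all the rest train."""
    errors = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i            # all objects except the i-th
        errors += nearest_mean_classify(X[mask], y[mask], X[i]) != y[i]
    return errors / len(X)                       # share recognized incorrectly

# Toy data: two classes in a two-dimensional feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(leave_one_out_error(X, y))
```

Note that the classifier is retrained inside the loop on every iteration, which is exactly the computational cost the text warns about.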
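And one possible version of the "computational formulas" the lecture invites the reader to write out: the full pairwise distance-ratio criterion and the cheaper prototype-based simplification. The function names and the Euclidean metric are assumptions:

```python
import numpy as np
from itertools import combinations

def separability(X, y):
    """Ratio of the mean between-class distance to the mean
    within-class distance, averaged over all object pairs."""
    between, within = [], []
    for i, j in combinations(range(len(X)), 2):
        d = np.linalg.norm(X[i] - X[j])
        (within if y[i] == y[j] else between).append(d)
    return np.mean(between) / np.mean(within)

def separability_via_prototypes(X, y):
    """Cheaper variant: average the distances between class prototypes
    (here, class means) and, within each class, the distances from
    the objects to the prototype of their own class."""
    labels = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in labels}
    between = [np.linalg.norm(means[a] - means[b])
               for a, b in combinations(labels, 2)]
    within = [np.linalg.norm(x - means[c]) for x, c in zip(X, y)]
    return np.mean(between) / np.mean(within)
```

The first function touches every pair of objects and so scales quadratically with the sample size; the second trades that cost for the caveat stated above about the shape and position of the class regions.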
Selection of informative features
We assume that the set of initial features is given. In practice it is determined by the teacher (the designer of the system). It is important that it include the features that actually carry distinctive information; whether this condition is met depends decisively on the experience and intuition of the teacher and on a good knowledge of the subject area for which the recognition system is being created. Once the initial feature space is specified, the selection of a smaller number of the most informative features (the formation of a feature space of lower dimension) can be formalized. Let the initial features be $x_1, x_2, \dots, x_N$. Fig. 16 shows a linear coordinate transformation.
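In one standard notation (introduced here because it is needed below; the lecture's own formula is not preserved in this text), such a linear transformation takes the $N$ initial features into $n$ new features:

$$y_i = \sum_{j=1}^{N} a_{ij} x_j, \qquad i = 1, \dots, n, \qquad \text{or in matrix form } Y = AX,$$

where $A = (a_{ij})$ is an $n \times N$ matrix; with $n < N$ the dimension of the feature space is reduced.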
After such a transformation, some of the new features may carry little distinctive information and can be discarded. Fig. 17 illustrates the transition from the Cartesian to the polar coordinate system, which makes it expedient to discard one of the new features. Such transformations simplify the decision rules, since the rules are built in a space of lower dimension. However, recognition then requires that the transformation itself be computed for every object presented to the system.

Fig. 16. Linear coordinate transformation
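For reference, the Cartesian-to-polar transition illustrated by Fig. 17 has the standard form

$$y_1 = \sqrt{x_1^2 + x_2^2}, \qquad y_2 = \arctan\frac{x_2}{x_1}.$$

If, for example, the classes occupy concentric rings, the radius $y_1$ alone separates them and the angle $y_2$ can be dropped; which feature is discarded in the lecture's own example depends on the class layout in the figure, which is not reproduced here.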
Let us single out the following type of linear transformation: $Y = AX$, where each row of the $n \times N$ matrix $A$ ($n < N$) contains a single unit element, the rest being zeros. This means that part of the original feature system is discarded and the remaining features are kept unchanged. Of course, the remaining features should form the most informative subsystem. Thus, the selection procedure must be organized reasonably, according to one of the previously considered criteria of informativeness. Consider some approaches.

The optimal solution of the problem is given by exhaustive search. If the source system contains $N$ features, of which $n$ are to be selected, the number of variants to compare is $C_N^n = \frac{N!}{n!\,(N-n)!}$, which grows so quickly that exhaustive search is feasible only for small $N$ and $n$.

Fig. 17. Transition to the polar coordinate system

Consider some of the procedures used in practice.

1. The informativeness of each of the source features, taken separately, is evaluated. The features are then ranked by decreasing informativeness, and the first $n$ are selected. Such a subsystem is the best one only if the features are statistically independent.

2. It is assumed that the features are statistically dependent. First, the individually most informative feature is selected ($N$ variants are viewed). Then it is paired with each of the remaining features, and the most informative pair is kept ($N-1$ variants); a third feature is added to the pair in the same way, and so on, until $n$ features have been accumulated (a sketch of this procedure is given after the list).

3. Sequential rejection of features. This approach is similar to the previous one but acts in the opposite direction: from the full set of $N$ features, the feature whose removal degrades the informativeness criterion the least is discarded, and the step is repeated until $n$ features remain.

4. Random search. Sets of $n$ feature numbers are drawn at random, the informativeness of each such subsystem is evaluated, and after a fixed number of trials the best subsystem found is retained.

5. Random search with adaptation. This is a sequential directional procedure based on random search that takes the results of previous selections into account. At the beginning of the procedure the chances of all initial features to enter a subsystem of $n$ features are equal; after each trial the selection probabilities of the features of successful subsystems are increased and those of unsuccessful ones are decreased, so that the search gradually concentrates on the most promising combinations (a sketch is given after the list).
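A minimal sketch of procedure 2 (sequential forward selection). The criterion is passed in as a function; the `separability` ratio sketched earlier would do, but any of the informativeness criteria discussed above can be substituted:

```python
import numpy as np

def forward_selection(X, y, n, criterion):
    """Procedure 2: grow the feature subset one feature at a time,
    at each step keeping the feature whose addition maximizes the
    informativeness criterion criterion(X_subset, y)."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n:
        # Try every remaining feature in combination with those selected.
        best = max(remaining, key=lambda j: criterion(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

# Usage (with the separability criterion sketched earlier):
#   subset = forward_selection(X, y, n=3, criterion=separability)
```

Each pass evaluates at most $N$ candidate subsystems, so the total work is on the order of $nN$ criterion evaluations instead of $C_N^n$.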
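And a sketch of procedure 5 (random search with adaptation). The lecture does not fix the reward/penalty rule or the stopping condition, so the running-average comparison, the step size, and the trial count below are assumptions:

```python
import numpy as np

def random_search_with_adaptation(X, y, n, criterion, trials=200, step=0.02):
    """Procedure 5: per-feature selection probabilities start out equal
    and are adapted from the scores of randomly drawn n-feature subsets."""
    rng = np.random.default_rng(0)
    N = X.shape[1]
    p = np.full(N, 1.0 / N)                      # equal chances at the start
    running_avg = None
    best_score, best_subset = -np.inf, None
    for _ in range(trials):
        subset = rng.choice(N, size=n, replace=False, p=p)
        score = criterion(X[:, subset], y)
        if best_subset is None or score > best_score:
            best_score, best_subset = score, subset
        # Reward the drawn features if the subset beat the running average
        # of earlier scores, penalize them otherwise (assumed update rule).
        if running_avg is not None:
            bonus = step if score > running_avg else -step
            p[subset] = np.maximum(p[subset] + bonus, 1e-6)
            p = p / p.sum()                      # keep a valid probability vector
        running_avg = score if running_avg is None else 0.9 * running_avg + 0.1 * score
    return sorted(best_subset.tolist())

# Usage: random_search_with_adaptation(X, y, n=5, criterion=separability)
```

As the probabilities concentrate, the procedure behaves less like blind random search (procedure 4) and more like a directed search around the best combinations found so far.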