Lecture
The course of lectures on pattern recognition naturally forms part of the training of specialists in computer science, computer systems, and networks. Without developing the capabilities of artificial intelligence (including recognition methods), one can hardly count on the steady improvement of information technologies and on the expansion of the range of tasks they solve.
Automatic translation from one language to another and automatic transcription are impossible without recognizing printed and handwritten texts and characters, as well as spoken language.
Recognition methods are needed in automated systems used in forensic science, medicine, and military affairs. Applications of recognition theory such as cluster analysis (taxonomy), the discovery of regularities in diverse experimental data, and the prediction of various processes and phenomena are widely used in scientific research. Recognition (classification) methods also play an important role in the rapidly developing field of geographic information systems.
An excerpt from the monograph “Geoiconics” by A.M. Beranton: “... the use of maps, the interpretation of images, and the analysis of on-screen video images always amount to the recognition and analysis of graphic images, their measurement, transformation, comparison, etc. It follows that the recognition of graphic images, that is, the creation of a system of decision rules for their identification, classification, and interpretation, is one of the main tasks of geoiconics.”
Historically, pattern recognition theory has developed in two directions, deterministic and statistical, although the two usually cannot be strictly separated. The deterministic approach covers a variety of methods: empirical and heuristic ones, based on common sense and on more or less successful modeling of the actions performed by the human brain, as well as mathematically formalized ones, for example, model-based generation of objects (implementations) of a particular image. These methods draw on diverse mathematical apparatus (mathematical logic, graph theory, topology, mathematical linguistics, mathematical programming, etc.).
The statistical approach is based on the fundamental results of mathematical statistics (estimation theory, sequential analysis, stochastic approximation, information theory).
Many recognition methods that originated as deterministic later received a statistical justification. Examples of this kind are considered in this course of lectures.
In the development of recognition theory, the various approaches and the mathematical apparatus they use are intertwined so intricately that any classification of algorithms by the methods used is conditional and ambiguous. Nevertheless, this course is divided into two sections: deterministic methods and statistical methods. This is done mainly for pedagogical reasons: deterministic methods (especially empirical ones) are intuitive and easier to grasp than statistical ones, so it is methodologically sensible to begin the presentation with them.
Recognition is the assignment of a specific object (implementation), represented by the values of its properties (attributes), to one of a fixed list of images (classes) according to a certain decision rule in accordance with the goal.
It follows that recognition can be carried out by any system (living or non-living) that performs two functions: measuring feature values and carrying out the calculations that implement the decision rule. The list of images, the informative features, and the decision rules are either given to the recognition system from outside or formed by the system itself. An auxiliary but important function of recognition systems is assessing the risk of loss. Without it one cannot, for example, build optimal decision rules or choose the most informative system of features to be used in recognition.
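The two functions named above, measuring features and evaluating a decision rule, can be sketched as follows. This is a minimal illustration, not a method from the lecture: the class `RecognitionSystem`, its parameters, and the parcel-weighing example are all hypothetical. The constructor also rejects alphabets with fewer than two classes, since a one-class problem is degenerate.

```python
from typing import Callable, Sequence, Tuple

FeatureVector = Tuple[float, ...]


class RecognitionSystem:
    """Hypothetical sketch: a fixed alphabet, a measurement function, a decision rule."""

    def __init__(self, alphabet: Sequence[str],
                 measure: Callable[[object], FeatureVector],
                 rule: Callable[[FeatureVector], str]):
        if len(alphabet) < 2:
            raise ValueError("an alphabet needs at least two classes")
        self.alphabet = tuple(alphabet)
        self.measure = measure
        self.rule = rule

    def recognize(self, obj: object) -> str:
        x = self.measure(obj)           # function 1: measure feature values
        label = self.rule(x)            # function 2: compute the decision rule
        if label not in self.alphabet:  # the answer must come from the fixed list of images
            raise ValueError(f"rule returned unknown class {label!r}")
        return label


# Illustrative use: one feature (mass) and a threshold rule over two classes.
system = RecognitionSystem(
    alphabet=("light", "heavy"),
    measure=lambda parcel: (parcel["mass_kg"],),
    rule=lambda x: "heavy" if x[0] > 10.0 else "light",
)
print(system.recognize({"mass_kg": 12.5}))
```

Here the alphabet and decision rule are supplied from outside; a self-learning system would instead construct them from data.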
We introduce the following notation:
- $\Omega = \{\omega_1, \omega_2, \dots, \omega_m\}$ is the set of recognizable images (classes), sometimes called the alphabet;
- $X$ is the feature (sample) space;
- $N$ is the dimensionality of the feature space (the number of features characterizing the recognizable objects);
- $D$ is the set of decision rules by which a recognizable object (implementation) is assigned to a particular image;
- $R$ is the risk of loss in recognition.
The number $m$ of recognizable images is always finite and cannot be less than two. Hypothetically, one can consider the case $m = 1$, but it is degenerate, since all implementations then belong to the same image: no feature values need to be measured, the decision rule is trivial, and such a recognition problem has hardly any practical meaning.
As already mentioned, the list of images can be given to the recognition system from outside (by a teacher). For example, if the system is designed for automatic transcription, the recognizable images are phonemes, the elements of spoken language.
In many cases the recognition system itself generates the list of recognizable images. In the literature this process is called unsupervised learning, self-learning, or cluster analysis (taxonomy). This function is most often used in research: natural-science classification, data analysis, the discovery of regularities, etc.
The dimensionality $N$ of the feature space is usually kept as small as possible, since this reduces the number of measurements required, simplifies the calculations that form and implement the decision rules, and increases the statistical stability of the recognition results. However, reducing $N$ generally increases the risk of loss. The formation of a feature space is therefore a compromise that splits into two parts: forming the initial feature space and minimizing its dimensionality. For minimizing the dimensionality there are formal methods, algorithms, and programs. The initial space, by contrast, is still formed on the basis of experience, intuition, and even luck; the literature offers no theoretically grounded approaches to this part of the problem.
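One common formal method for minimizing the dimensionality is greedy sequential forward selection; it is shown here only as an example of the trade-off described above, not as the method this course presents. The sketch assumes a caller-supplied function `risk(subset)` that estimates the risk of loss for a subset of feature indices; the function name `forward_selection`, the threshold `max_risk`, and the risk table in the usage lines are hypothetical.

```python
from typing import Callable, FrozenSet


def forward_selection(n_features: int,
                      risk: Callable[[FrozenSet[int]], float],
                      max_risk: float) -> FrozenSet[int]:
    """Greedy sequential forward selection (a sketch).

    Starting from the empty set, repeatedly add the single feature that lowers
    the estimated risk the most; stop once the risk is acceptable or no
    features remain. Fewer features means fewer measurements, at the price
    of a (usually) higher risk of loss.
    """
    selected: FrozenSet[int] = frozenset()
    remaining = set(range(n_features))
    while remaining and risk(selected) > max_risk:
        best = min(remaining, key=lambda j: risk(selected | {j}))
        selected = selected | {best}
        remaining.remove(best)
    return selected


# Hypothetical risk estimates for every subset of three features 0, 1, 2.
table = {
    frozenset(): 0.50,
    frozenset({0}): 0.30, frozenset({1}): 0.20, frozenset({2}): 0.40,
    frozenset({0, 1}): 0.08, frozenset({0, 2}): 0.25, frozenset({1, 2}): 0.15,
    frozenset({0, 1, 2}): 0.07,
}
chosen = forward_selection(3, table.__getitem__, max_risk=0.10)
print(sorted(chosen))
```

With these made-up numbers the search keeps features 0 and 1 and never measures feature 2, because the pair already meets the risk threshold. The greedy search is not guaranteed to find the smallest adequate subset; exhaustive search over all subsets is exact but grows as $2^N$.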
Building decision rules is perhaps the component of recognition problems richest in developed approaches and methods. The main goal pursued is to minimize the risk of loss.
The risk of loss is, in fact, the criterion by which the most informative feature space and the most effective decision rules are formed: the alphabet, the features, and the decision rules must all be chosen so as to minimize it. This criterion (a characteristic of the recognition system) is composite. In general it includes losses (fines) for recognition errors and the cost of measuring the features of recognizable objects. In the most widely used special case, the risk of loss is taken to be the average probability of a recognition error or the maximum component of the error probability matrix. In practice, of course, one works not with the probabilities themselves but with their sample estimates.
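The sample estimates mentioned above can be computed directly from a labelled sample. This is a minimal sketch under stated assumptions: `P[i][j]` is estimated as the fraction of class-$i$ objects assigned to class $j$; the average error weights classes by priors, taken as equal when none are given (an assumption, since the lecture does not fix them); and the composite risk is modelled as the expected fine from a loss matrix plus a fixed measurement cost. All function names and the toy sample are hypothetical.

```python
from collections import Counter


def error_matrix(true_labels, predicted, classes):
    """Sample estimate of the error probability matrix.

    P[i][j] is the fraction of objects of class classes[i] that the decision
    rule assigned to class classes[j]; off-diagonal entries are errors.
    """
    pairs = Counter(zip(true_labels, predicted))
    totals = Counter(true_labels)
    for c in classes:
        if totals[c] == 0:
            raise ValueError(f"no sample objects of class {c!r}")
    return [[pairs[(ci, cj)] / totals[ci] for cj in classes] for ci in classes]


def average_error(P, priors=None):
    """Average error probability sum_i p_i * (1 - P[i][i]); equal priors by default."""
    m = len(P)
    if priors is None:
        priors = [1.0 / m] * m
    return sum(p * (1.0 - P[i][i]) for i, p in enumerate(priors))


def max_error_component(P):
    """Largest off-diagonal component of the error probability matrix."""
    m = len(P)
    return max(P[i][j] for i in range(m) for j in range(m) if i != j)


def composite_risk(P, loss, priors, measurement_cost):
    """Expected fine for recognition errors plus the cost of measuring the features."""
    m = len(P)
    expected_fine = sum(priors[i] * loss[i][j] * P[i][j]
                        for i in range(m) for j in range(m))
    return expected_fine + measurement_cost


# Toy sample: true classes of eight objects and the classes a rule assigned.
true = ["A", "A", "A", "A", "B", "B", "B", "B"]
pred = ["A", "A", "A", "B", "B", "B", "A", "A"]
P = error_matrix(true, pred, ["A", "B"])
print(P)
print(average_error(P), max_error_component(P))
```

With a 0/1 loss matrix and zero measurement cost, `composite_risk` reduces to `average_error`; an asymmetric loss matrix lets one error type be fined more heavily than the other.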
Fig. 1. A set of rectangles and its representation in the feature space
Thus, the feature space $X$ can be represented as an $N$-dimensional space with a metric defined on it. Each object (implementation) is represented as a point (vector) in this space, and the projection of this point onto the $j$-th axis is the value of the $j$-th feature. For example, a set of rectangles with sides parallel to the coordinate axes can be represented by a set of points in a two-dimensional feature space with the Euclidean metric (see Fig. 1), where $x_1$ is the length of the horizontal side and $x_2$ is the length of the vertical side. If we need to recognize two images, vertically elongated and horizontally elongated rectangles, then a decision rule in the form of the bisector $x_2 = x_1$ of the angle at the origin performs this task. All points (objects) lying above and to the left of the bisector belong to the image "vertically elongated rectangles", and those lying below and to the right belong to "horizontally elongated rectangles".
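The bisector rule from the rectangle example amounts to comparing the two features. In this minimal sketch, `classify_rectangle` is a hypothetical name, and squares (points lying exactly on the bisector) are left unassigned, returning `None`; the lecture does not say how they should be treated, so that choice is an assumption.

```python
from typing import Optional


def classify_rectangle(width: float, height: float) -> Optional[str]:
    """Decision rule from Fig. 1: the bisector x2 = x1 of the first quadrant.

    width  -- x1, the length of the horizontal side
    height -- x2, the length of the vertical side
    """
    if width <= 0 or height <= 0:
        raise ValueError("side lengths must be positive")
    if height > width:   # point above and to the left of the bisector
        return "vertically elongated"
    if height < width:   # point below and to the right of the bisector
        return "horizontally elongated"
    return None          # assumption: squares lie on the bisector and are not assigned


for w, h in [(1.0, 3.0), (4.0, 2.0), (2.0, 2.0)]:
    print((w, h), classify_rectangle(w, h))
```

Only two measurements per object are needed, and the rule is a single comparison, which illustrates how a well-chosen feature space can make the decision rule trivial.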
As already noted, methods for solving recognition problems can be divided into deterministic and statistical. Let's start with deterministic methods.
To be continued...