Lecture
The search task consists of finding a given value, called the search key, in a given set (or multiset). There is a huge number of search algorithms. They range from simple algorithms based on sequential comparison, to the extremely efficient but restricted binary search, to algorithms that re-represent the underlying set in another form better suited to searching. Algorithms of the last kind are of particular practical importance, since they are used in real-life applications that select and store arrays of information in huge databases.
There is no single search algorithm best suited for all cases. Some algorithms run faster than others but require additional RAM; others are very fast but applicable only to pre-sorted arrays, and so on. Unlike sorting algorithms, search algorithms have no stability problem, but other difficulties can arise when using them. In particular, in applications where the data being processed may change, and the number of changes is comparable to the number of search operations, searching must be considered together with two other operations: inserting an item into the data set and deleting an item from it. In such situations, the data structures and algorithms must be modified so that a balance is achieved among the requirements of each operation. In addition, organizing very large data sets for efficient search (as well as insertion and deletion) is an extremely complex task, whose solution is especially important from the point of view of practical application.
Each algorithm has its advantages and disadvantages. Therefore, it is important to choose the algorithm that is best suited to solve a specific problem.
The search problem can be formulated as follows: find one or more elements of a set that have a certain property. The property can be absolute or relative. A relative property characterizes an element with respect to other elements: for example, the minimal element in a set of numbers. An example of searching for an element with an absolute property: in a finite set of numbered elements, find the element with number 13, if it exists.
Thus, the search task has the following steps [2]:
1) computing the property of an element; often this is simply reading the "value" of the element, the key of the element, etc.;
2) comparing the property of an element with a reference property (for absolute properties) or comparing the properties of two elements (for relative properties);
3) enumeration of the elements of the set, i.e., passing through the elements of the set.
The first two steps are relatively simple. The essence of the various search methods is concentrated in the third step, the search strategy, where a number of questions arise [2]. The answers to these questions depend on the data structure in which the set of items is stored. By imposing minor restrictions on the structure of the source data, one can obtain many different search strategies of varying degrees of efficiency.
Sequential search
Searching for a desired record in an unsorted list reduces to scanning the entire list until the record is found. "Start at the beginning and continue until the key is found, then stop" [1] is the simplest of the search algorithms. It is not very efficient, but it works on an arbitrary list.
The search algorithm must determine the location of the key, so it returns the index of the record containing the desired key. If the search fails (no matching key value is found), the algorithm usually returns an index value outside the bounds of the array.
Hereinafter, it is assumed that array A consists of records and is declared and filled outside the procedure (the array is global with respect to the procedure). The only thing we need to know about the record type is that it includes the field key, the key by which the search is performed. The records are stored consecutively in the array with no gaps between them, numbered from 1 to n, the total number of records. This allows us to return 0 if the target element is not in the list. For simplicity, we assume that key values are not repeated.
The SequentialSearch procedure performs a sequential search for the element z in the array A[1..n].
SequentialSearch(A, z, n)
(1) for i ← 1 to n
(2)     do if z = A[i].key
(3)         then return i
(4) return 0
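The procedure above can be sketched in Python (an illustration, not part of the original lecture; it uses 0-based indexing, plain values instead of records with a key field, and returns -1 instead of 0 on failure):

```python
def sequential_search(a, z):
    """Scan the list a from the start; return the index of the first
    element equal to z, or -1 if z does not occur in a."""
    for i, item in enumerate(a):
        if item == z:
            return i
    return -1
```

For example, sequential_search([5, 3, 8], 8) returns 2, and sequential_search([5, 3, 8], 7) returns -1.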
Worst Case Analysis. The sequential search algorithm has two worst cases. In the first case, the target element is the last in the list. In the second, it is not on the list at all. In both cases, the algorithm will perform n comparisons, where n is the number of elements in the list.
Average case analysis. The target value may occupy any of n possible positions. We assume that all these positions are equally probable, i.e., the probability of each of them is 1/n. Finding the element at position i, A[i], takes i comparisons. As a result, for the average-case complexity we obtain the equality
A(n) = (1/n) ∑_{i=1}^{n} i = (n + 1)/2.
If the target value may be absent from the list, the number of possibilities increases to n + 1: when the element is not in the list, its search requires n comparisons. If we assume that all n + 1 possibilities are equally probable, then we obtain
A(n) = (1/(n + 1)) (∑_{i=1}^{n} i + n) = n(n + 3)/(2(n + 1)) ≈ n/2 + 1.
It turns out that the possibility of an element being absent from the list increases the average-case complexity by (n − 1)/(2(n + 1)) ≈ 1/2, but compared with the length of the list, which can be very long, this value is negligible.
Consider the general case, when the probability of meeting the i-th item of the list is p_i, where p_i ≥ 0 and ∑_{i=1}^{n} p_i = 1. In this case the average complexity (mathematical expectation) of the element search is
A(n) = ∑_{i=1}^{n} i · p_i.
A good approximation of real frequency distributions is Zipf's law: p_i = c/i for i = 1, 2, ..., n (the n-th most commonly used word in a natural-language text occurs with a frequency approximately inversely proportional to n). The normalizing constant c is chosen so that ∑_{i=1}^{n} p_i = 1, i.e., c = 1/H_n, where H_n = ∑_{i=1}^{n} 1/i is the n-th harmonic number. If the elements of array A are ordered according to these frequencies, then the average time of a successful search is
A(n) = ∑_{i=1}^{n} i · (c/i) = c · n = n/H_n ≈ n/ln n,
which is much less than n/2. This suggests that even simple sequential search benefits from a reasonable choice of data structure for the set, which increases the efficiency of the algorithm.
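The effect of ordering by Zipf frequencies can be checked numerically. A minimal sketch (the function names are illustrative, not from the lecture):

```python
def avg_comparisons_uniform(n):
    # All n positions equally likely: the average of 1..n comparisons.
    return (n + 1) / 2

def avg_comparisons_zipf(n):
    # Zipf frequencies p_i = c/i with c = 1/H_n, array ordered by
    # decreasing frequency: sum over i of i * p_i = c * n = n / H_n.
    h_n = sum(1 / i for i in range(1, n + 1))
    return n / h_n
```

For n = 1000 the uniform average is 500.5 comparisons, while the Zipf-ordered average is about 134, roughly n/ln n.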
The sequential search algorithm is equally efficient whether the set a_1, a_2, ..., a_n is stored in contiguous or linked memory. The complexity of the algorithm is linear, O(n).
Logarithmic search
Logarithmic (binary, or halving) search is applicable to a sorted set of elements a_1 < a_2 < ... < a_n stored in contiguous memory. The essence of this method is as follows: the search begins with the middle element. When comparing the target value with the middle element of the sorted list, one of three results is possible: the values are equal, the target value is less than the list item, or the target value is greater than the list item. In the first and best case, the search is complete. In the remaining two cases, we can drop half the list. Indeed, when the target value is less than the middle element, we know that if it is in the list, it is before this middle element. If the target value is greater than the middle element, we know that if it is in the list, it is after this middle element. This is enough for us to drop half the list with one comparison.
So, the comparison result allows you to determine in which half of the list to continue the search, applying the same procedure to it.
The BinarySearch procedure performs a binary search for the element z in a sorted array A[1..n].
BinarySearch(A, z, n)
(1) p ← 1
(2) r ← n
(3) while p ≤ r do
(4)     q ← ⌊(p + r)/2⌋
(5)     if A[q].key = z
(6)         then return q
(7)     else if A[q].key < z
(8)         then p ← q + 1
(9)     else r ← q − 1
(10) return 0
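A Python rendering of the procedure (an illustrative sketch with 0-based indexes, plain values instead of records, and -1 for a failed search):

```python
def binary_search(a, z):
    """Binary search for z in the sorted list a.
    Returns an index q with a[q] == z, or -1 if z is absent."""
    p, r = 0, len(a) - 1
    while p <= r:
        q = (p + r) // 2          # middle element, rounding down
        if a[q] == z:
            return q
        elif a[q] < z:
            p = q + 1             # z can only be in the right half
        else:
            r = q - 1             # z can only be in the left half
    return -1
```

For example, binary_search([1, 3, 5, 7, 9, 11, 13], 7) returns 3.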
Worst case analysis. Since the algorithm halves the list each time, we will assume in the analysis that n = 2^k − 1 for some k. On each pass the loop deals with a list of 2^j − 1 items for some j. The last iteration of the loop is performed when the size of the remaining part becomes 1, and this happens when j = 1 (since 2^1 − 1 = 1). This means that for n = 2^k − 1 the number of passes does not exceed k. Therefore, in the worst case, the number of passes is k = log2(n + 1).
Average case analysis. We consider two cases: in the first, the target value is certainly in the list; in the second, it may be absent. In the first case, the target value may occupy any of n possible positions, all assumed equally probable. Represent the data a_1 < a_2 < ... < a_n as the binary comparison tree corresponding to binary search. A binary tree is called a comparison tree if the following condition is satisfied for each of its vertices:
{keys of the left subtree} < {key of the root} < {keys of the right subtree}
Finding an element located at level i of the comparison tree requires i comparisons. Since level i contains 2^(i−1) nodes, and for n = 2^k − 1 the tree has k levels, the total number of comparisons over all possible cases is obtained by summing, over the levels, the product of the number of nodes at each level by the number of comparisons at that level:
A(n) = (1/n) ∑_{i=1}^{k} i · 2^(i−1) = ((k − 1) · 2^k + 1)/n.
Substituting 2^k = n + 1, we obtain
A(n) = ((k − 1)(n + 1) + 1)/n ≈ k − 1 = log2(n + 1) − 1.
In the second case, the number of possible positions of an element in the list is still n, but this time there are also n + 1 possibilities for a target value not in the list: the target value can be less than the first element, greater than the first but less than the second, greater than the second but less than the third, and so on, up to being greater than the n-th element. In each of these cases, the absence of the element in the list is detected after k comparisons. In total, 2n + 1 possibilities are involved in the calculation:
A(n) = (1/(2n + 1)) ( ∑_{i=1}^{k} i · 2^(i−1) + (n + 1) · k ) = ((2k − 1) · 2^k + 1)/(2n + 1).
Substituting 2^k = n + 1, we similarly obtain
A(n) = ((2k − 1)(n + 1) + 1)/(2n + 1) ≈ k − 1/2 = log2(n + 1) − 1/2.
Hence, the complexity of the search is logarithmic, O(log2 n).
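The average-case estimate of about k − 1 comparisons for a successful search can be checked empirically by counting loop iterations, one three-way comparison per iteration, as in the analysis above. A sketch (the function name is illustrative):

```python
def comparisons_used(a, z):
    # Binary search over the sorted list a that returns the number of
    # loop iterations (three-way comparisons) performed to find z.
    p, r, count = 0, len(a) - 1, 0
    while p <= r:
        q = (p + r) // 2
        count += 1
        if a[q] == z:
            return count
        elif a[q] < z:
            p = q + 1
        else:
            r = q - 1
    return count

k = 10
n = 2 ** k - 1                      # n = 1023
a = list(range(n))
avg = sum(comparisons_used(a, x) for x in a) / n
# avg equals ((k - 1) * 2**k + 1) / n, about 9.01, close to k - 1
```

For n = 2^k − 1 the comparison tree is perfect, so the measured average matches the formula exactly.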
The binary search method considered here is mainly intended for sorted elements a_1 < a_2 < ... < a_n in contiguous memory of fixed size. If the size of the array changes dynamically, the savings from binary search will not cover the cost of maintaining the ordered arrangement a_1 < a_2 < ... < a_n.