Sorting is the arrangement of data in memory in a regular order according to a selected parameter (the key). Regularity means an increase (or decrease) of the key value from the beginning to the end of the data array.

When processing data, it is important to know both the information field of the data and its location in machine memory.

Sorting is divided into internal and external:

Internal sorting - sorting in RAM;

External sorting - sorting in external memory.

If the records being sorted occupy a large amount of memory, moving them is expensive. To reduce the cost, sorting is performed on a table of key addresses: the pointers are rearranged, while the array itself does not move. This is called the address-table sorting method.
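To make the idea concrete, here is a minimal Python sketch (not part of the source text; the record layout is invented for illustration): the records themselves never move, only a table of indices ordered by key.

```python
# Address-table sorting sketch: sort a table of indices by key,
# leaving the (possibly large) records themselves in place.
records = [
    {"key": 30, "payload": "large record A"},
    {"key": 10, "payload": "large record B"},
    {"key": 20, "payload": "large record C"},
]

# Rearrange only the "addresses" (indices), not the records.
address_table = sorted(range(len(records)), key=lambda i: records[i]["key"])

# Traverse the records in key order without moving any of them.
ordered_keys = [records[i]["key"] for i in address_table]
```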

When sorting, equal keys may occur. In this case it is desirable that, after sorting, equal keys appear in the same order as in the source file. This is called stable sorting.
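A short Python illustration of stability (the data is invented for the example; Python's built-in sort is documented to be stable):

```python
# Records are (name, key) pairs; "a" and "d" share key 1, "b" and "c" share key 2.
records = [("b", 2), ("a", 1), ("c", 2), ("d", 1)]

# sorted() is stable: among equal keys the original order is preserved,
# so "a" stays before "d" and "b" stays before "c".
stable = sorted(records, key=lambda r: r[1])
```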

We will consider only sorts that do not use additional RAM. Such sorts are called "in-place" sorts.

The efficiency of sorting can be considered according to several criteria:

Time spent on sorting;

The amount of RAM required for sorting;

The time spent by the programmer writing the program.

Let us focus on the first criterion. The equivalent of the time spent on sorting can be taken to be the number of comparisons and the number of moves performed during sorting.

The number of comparisons and moves during sorting ranges

from O(n log n) to O(n²);

O(n) is an ideal, unattainable case.

The following sorting methods are distinguished:

Strict (direct) methods;

Improved methods.

Strict methods:

Direct inclusion method;

Direct selection method;

Direct exchange method.

The effectiveness of strict methods is approximately the same.

Direct Inclusion Sorting

The elements are conceptually divided into a ready (already sorted) sequence a(1), ..., a(i-1) and a source sequence a(i), ..., a(n).

At each step, starting from i = 2 and increasing i by one each time, the i-th element is extracted from the source sequence and transferred to the ready sequence, where it is inserted into the appropriate place.

The essence of the algorithm is this:

for i = 2 to n
    x = a(i)
    find the place among a(1) ... a(i-1) to insert x
next i


There are two direct inclusion sorting algorithms. The first one is without a barrier.

Direct inclusion sorting algorithm without barrier

for i = 2 to n
    x = a(i)
    for j = i - 1 downto 1
        if x < a(j)
            then a(j + 1) = a(j)
            else go to L
        endif
    next j
L:  a(j + 1) = x
next i
return
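The same algorithm can be sketched in Python (the function name is ours); here a `break` plays the role of the unconditional jump to L:

```python
def insertion_sort_no_barrier(a):
    """Direct-inclusion sort without a barrier; `break` replaces `go to L`."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0:                 # explicit bounds check: no sentinel
            if x < a[j]:
                a[j + 1] = a[j]       # shift the larger element right
                j -= 1
            else:
                break                 # the place for x is found (label L)
        a[j + 1] = x
    return a
```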

The disadvantage of the above algorithm is that it violates structured-programming practice, in which unconditional jumps are undesirable. If the inner loop is organized as a while loop, then a "barrier" (sentinel) must be set up; without it, when the key being inserted is smaller than all the keys already placed, the index j runs off the left end of the array and the program fails.

Direct inclusion sorting algorithm with barrier

for i = 2 to n
    x = a(i)
    a(0) = x        (a(0) is the barrier)
    j = i - 1
    while x < a(j) do
        a(j + 1) = a(j)
        j = j - 1
    endwhile
    a(j + 1) = x
next i
return
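A Python sketch of the barrier variant (index 0 of a working array is reserved for the sentinel, as in the pseudocode above; the function name is ours):

```python
def insertion_sort_with_barrier(keys):
    """Direct-inclusion sort with a barrier: a[0] is reserved as the
    sentinel, so the inner loop needs no bounds check on j."""
    a = [None] + list(keys)           # a[1..n] holds the data, a[0] the barrier
    for i in range(2, len(a)):
        x = a[i]
        a[0] = x                      # the barrier guarantees the loop stops
        j = i - 1
        while x < a[j]:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a[1:]
```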

Efficiency of the direct inclusion algorithm

The number of key comparisons C(i) during the i-th sifting is at most i - 1 and at least 1; if we assume that all permutations of n keys are equally likely, the average number of comparisons is i/2. The number of transfers M(i) = C(i) + 3 (including the barrier). The minimum estimates occur for an already ordered initial sequence of elements, while the worst estimates occur when the elements are initially in reverse order. In some ways, inclusion sort exhibits truly natural behavior. The above algorithm describes a stable sort: the order of elements with equal keys remains unchanged.

In the worst case, when the array is sorted in reverse order, the number of comparisons C max = n(n - 1)/2, i.e. O(n²). The number of moves M max = C max + 3(n - 1), i.e. O(n²). If the array is already sorted, the numbers of comparisons and moves are minimal: C min = n - 1; M min = 3(n - 1).

Sorting using direct exchange (bubble sort)

In this section a method is described in which the exchange of places of two elements is the characteristic feature of the process. The direct exchange algorithm outlined below is based on comparing a pair of neighboring elements and exchanging them, and on continuing this process until all elements are ordered.

We repeat passes through the array, each time moving the smallest element of the remaining sequence to the left end of the array. If we think of arrays as vertical rather than horizontal constructions, then the elements can be interpreted as bubbles in a vat of water, with the weight of each corresponding to its key. In this case, with each pass, one bubble rises to a level corresponding to its weight (see illustration in the figure below).

For an already ordered array, C min = n - 1, order O(n), and there are no moves at all.

A comparative analysis of direct sorting methods shows that exchange “sorting” in its classical form is a cross between sorting using inclusions and using selection. If the above improvements are introduced into it, then for sufficiently ordered arrays, bubble sort even has an advantage.

This method is commonly known as bubble sorting.


Direct exchange method algorithm

for i = 2 to n
    for j = n to i step -1
        if a(j) < a(j - 1) then
            x = a(j): a(j) = a(j - 1): a(j - 1) = x
        endif
    next j
next i
return

In our case one of the passes turned out to be empty. To avoid scanning the elements again, and therefore making comparisons and spending time on them, a flag fl can be introduced, which keeps the value false if no exchange is made during the next pass. In the algorithm below, the additions relative to the basic algorithm are the lines involving fl.

for i = 2 to n
    fl = false
    for j = n to i step -1
        if a(j) < a(j - 1) then
            x = a(j): a(j) = a(j - 1): a(j - 1) = x
            fl = true
        endif
    next j
    if fl = false then return
next i
return

An improvement on the bubble method is shaker sorting, where after each pass the direction in the inner loop is changed.
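A Python sketch of the flag variant (the function name is ours; fl ends the sort as soon as a pass makes no exchange):

```python
def bubble_sort_with_flag(a):
    """Direct-exchange (bubble) sort with the early-exit flag fl."""
    n = len(a)
    for i in range(1, n):
        fl = False
        for j in range(n - 1, i - 1, -1):     # j = n..i (1-based) downward
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]
                fl = True
        if not fl:                            # a full pass with no exchange
            return a                          # means the array is sorted
    return a
```

Shaker (cocktail) sort differs only in that the direction of the inner loop alternates between passes.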

Efficiency of the direct exchange sorting algorithm

Number of comparisons C max = n(n - 1)/2, order O(n²).

Number of moves M max = 3·C max = 3n(n - 1)/2, order O(n²).

If the array is already sorted and the algorithm with the flag is used, then a single pass suffices, giving the minimum number of comparisons C min = n - 1 and no exchanges at all.

This method is widely used when playing cards. The elements (cards) are conceptually divided into the already "ready" sequence a(1) ... a(i-1) and the source sequence a(i) ... a(n). At each step, starting from i = 2 and increasing i by one each time, the i-th element is extracted from the source sequence and transferred to the ready sequence, where it is inserted into the proper place.

The process of sorting eight randomly selected numbers by inclusion is shown below as an example. The algorithm for this sort is as follows:

FOR i := 2 TO n DO
    x := a[i];
    include x in the corresponding place among a[1] ... a[i-1]
END

In the actual process of searching for a suitable place it is convenient, alternating comparisons and moves through the sequence, to "sift" x: x is compared with the next element a[j], and then either x is inserted into the free place, or a[j] is shifted to the right and the process moves to the left. Note that the sifting process can end when either of two different conditions is met:

1. An element aj with a key less than the key of X is found.

2. The left end of the finished sequence has been reached.

This typical case of a repeating process with two termination conditions allows us to use the well-known barrier (sentinel) technique. It can easily be applied here by setting a barrier a[0] with the value x. (Note that for this it is necessary to expand the index range in the declaration of the array a to 0 .. n.)

Analysis of the direct inclusion method. The number of key comparisons C(i) during the i-th sifting is at most i - 1 and at least 1; if we assume that all permutations of n keys are equally probable, the average number of comparisons is i/2. The number of transfers (assignments of elements) M(i) equals C(i) + 2 (including the barrier). Therefore the total numbers of comparisons and transfers are as follows:

C min = n - 1,

C ave = (n² + n - 2)/4,

C max = (n² + n - 2)/2,

M min = 3(n - 1),

M ave = (n² + 9n - 10)/4,

M max = (n² + 3n - 4)/2.


The direct inclusion algorithm can easily be improved by noting that the ready sequence a(1) ... a(i-1), into which the new element must be inserted, is itself already ordered. It is natural to use binary search, which probes the middle of the ready sequence and continues halving until the insertion point is found. This modified sorting algorithm is called the binary insertion method.

Direct insertion sort works on a list of unordered positive integers (usually called keys), sorting them in ascending order. This is done in much the same way that most players organize the cards they are dealt, picking up one card at a time. Let us demonstrate how the general procedure works using the following unsorted list of eight integers as an example:

27 412 71 81 59 14 273 87.

The sorted list is built up anew; at first it is empty. At each iteration, the first number in the unsorted list is removed from it and placed in its corresponding place in the sorted list. To do this, the sorted list is traversed, starting with the smallest number, until the corresponding place for the new number is found, i.e. until all sorted numbers with smaller values are in front of it and all numbers with larger values are after it. The following sequence of lists shows how this is done:

Iteration 0   Unsorted before: 27 412 71 81 59 14 273 87   Sorted after: 27
Iteration 1   Unsorted before: 412 71 81 59 14 273 87      Sorted after: 27 412
Iteration 2   Unsorted before: 71 81 59 14 273 87          Sorted after: 27 71 412
Iteration 3   Unsorted before: 81 59 14 273 87             Sorted after: 27 71 81 412
Iteration 4   Unsorted before: 59 14 273 87                Sorted after: 27 59 71 81 412
Iteration 5   Unsorted before: 14 273 87                   Sorted after: 14 27 59 71 81 412
Iteration 6   Unsorted before: 273 87                      Sorted after: 14 27 59 71 81 273 412
Iteration 7   Unsorted before: 87                          Sorted after: 14 27 59 71 81 87 273 412

In the following algorithm, only one list is created, and the reorganization of numbers is done in the old list.

Algorithm SIS (Direct Inclusion Sorting). Sort the sequence of integers I(1), I(2), ..., I(N) in ascending order.

Step 1. [Main iteration] For J ← 2 to N do through step 4 od; and STOP.

Step 2. [Select next integer] Set K ← I(J); and L ← J − 1.

Step 3. [Comparison with sorted integers] While K < I(L) and L ≥ 1 do: set I(L+1) ← I(L); and L ← L − 1 od.

Step 4. [Insert] Set I(L+1) ← K.
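The steps above translate almost line for line into Python (0-based indices replace the 1-based ones of the algorithm; the function name is ours):

```python
def sis(I):
    """Algorithm SIS translated to Python (0-based indices).  Sorts the
    list I in place and returns it."""
    N = len(I)
    for J in range(1, N):                 # Step 1: main iteration
        K = I[J]                          # Step 2: select next integer
        L = J - 1
        while L >= 0 and K < I[L]:        # Step 3: shift larger keys right
            I[L + 1] = I[L]
            L -= 1
        I[L + 1] = K                      # Step 4: insert
    return I
```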

QUICKSORT: a sorting algorithm with average running time O(N ln N)

The main reason for the slow operation of the SIS algorithm is that all comparisons and exchanges between keys in the sequence a(1), a(2), ..., a(N) occur for pairs of neighboring elements. This method therefore requires a relatively large amount of time to put a misplaced key into the correct position in the sequence being sorted. It is natural to try to speed the process up by comparing pairs of elements that lie far apart in the sequence.

Fig. 15 (trace of the first QUICKSORT partitioning pass over positions 1-16; each numbered line shows the pair of keys being compared and the action taken):

Initial sequence (line 1): 38 08 16 06 79 76 57 24 56 02 58 48 04 70 45 47

Line  1: 38 vs 47 - decrease j
Line  5: 04 vs 38 - exchange
Line  6: 08 vs 38 - increase i
Line 10: 38 vs 79 - exchange
Line 14: 02 vs 38 - exchange
Line 15: 76 vs 38 (76 > 38) - increase i
Line 16: 38 vs 76 - exchange
Line 17: 38 vs 56 - decrease j
Line 19: 24 vs 38 - exchange
Line 20: 57 vs 38 (57 > 38)
Line 21: 38 vs 57 - exchange, decrease j
Line 22: 04 08 16 06 02 24 38 57 56 76 58 48 79 70 45 47
         (1 2 3 4 5 6) 7 (8 9 10 11 12 13 14 15 16)

C. A. R. Hoare invented and very effectively applied this idea (the QUICKSORT algorithm), reducing the average running time of the SIS algorithm from the order of O(N²) to the order of O(N ln N). Let us explain this algorithm with the following example.

Suppose we want to sort the sequence of numbers from the first row in Fig. 15. We start from the assumption that the first key in this sequence (38) serves as a good approximation of the key that will eventually appear in the middle of the sorted sequence. We use this value as a leading (pivot) element, relative to which keys can be swapped, and proceed as follows. We set two pointers I and J, of which I starts counting from the left (I = 1) and J from the right (J = N), and compare a(I) and a(J). If a(I) ≤ a(J), we set J ← J − 1 and carry out the next comparison. We continue decreasing J until a(I) > a(J). Then we exchange a(I) ↔ a(J) (Fig. 15, line 5, exchange of keys 38 and 04), set I ← I + 1, and continue increasing I until a(I) > a(J). After the next exchange (line 10, 79 ↔ 38) we decrease J again. Alternating the decreasing of J with the increasing of I, we continue this process from both ends of the sequence toward the "middle" until I = J.



Two facts now hold. First, the key (38) that initially occupied position one now occupies its proper place in the sequence being sorted. Second, all the keys to the left of this element are smaller, and all the keys to the right are larger.

The same procedure can be applied to the left and right subsequences to finally sort the entire sequence. The last line (numbered 22) of Fig. 15 shows that when I=J is obtained, then I=7. The procedure is then applied again to the subsequences (1,6) and (8,16).

The recursive nature of the algorithm suggests that the indices of the outermost elements of the larger of the two unsorted subsequences (8,16) should be placed on the stack and then proceed to sort the smaller subsequence (1,6).

In line 4 in Fig. 15, the number 04 has moved to position 2 and the subsequences (1,1) and (3,6) are subject to sorting. Since (1,1) is already sorted (number 02), we sort (3,6), which in turn leads to line 6, in which (3,4) and (6,6) are to be sorted. In line 7, the subsequence (1,6) is sorted. Now we pop (8,16) from the stack and start sorting this subsequence. Line 13 contains the subsequences (8,11) and (13,16) that need to be sorted. We put (13,16) on the stack, sort (8,11), etc. On line 20, the entire sequence is sorted.

Before describing the QUICKSORT algorithm formally, we need to show exactly how it works. We use the stack [ LEFT (K), RIGHT (K) ] to remember the indices of the leftmost and rightmost elements of subsequences that have not yet been sorted. Because short subsequences are sorted faster by the conventional algorithm, the QUICKSORT algorithm has an input parameter M that determines how short a subsequence must be in order to be sorted in the conventional way. For this purpose, we use simple inclusion sort (SIS).
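Under the assumptions just described, QUICKSORT with an explicit stack and a cutoff parameter M might be sketched in Python as follows (the partitioning follows the pointer scheme of Fig. 15; the names are ours, and this is an illustrative sketch rather than the formal algorithm):

```python
def partition(a, left, right):
    """Partition a[left..right] around its leading key, moving pointers
    i and j inward from both ends (the scheme of Fig. 15)."""
    i, j = left, right
    while i < j:
        while i < j and a[i] <= a[j]:
            j -= 1                       # decrease j
        if i < j:
            a[i], a[j] = a[j], a[i]      # exchange
        while i < j and a[i] <= a[j]:
            i += 1                       # increase i
        if i < j:
            a[i], a[j] = a[j], a[i]      # exchange
    return i                             # pivot's final position

def quicksort(a, M=3):
    """QUICKSORT with an explicit stack of (left, right) bounds; pieces of
    length <= M are finished by simple insertion sort (SIS)."""
    stack = [(0, len(a) - 1)]
    while stack:
        left, right = stack.pop()
        if right - left + 1 <= M:
            for k in range(left + 1, right + 1):   # simple insertion sort
                x, m = a[k], k - 1
                while m >= left and x < a[m]:
                    a[m + 1] = a[m]
                    m -= 1
                a[m + 1] = x
        else:
            p = partition(a, left, right)
            stack.append((left, p - 1))            # sort both halves later
            stack.append((p + 1, right))
    return a
```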

Search

Let us now turn to an examination of some basic problems related to information retrieval from data structures. As in the previous section on sorting, we assume that all information is stored in records that can be identified by key values, i.e. record R(i) corresponds to the key value denoted by K(i).

Let's assume that there are N records randomly arranged in a file in the form of a linear array. The obvious method for finding a given entry would be to look through the keys sequentially. If the required key is found, the search ends successfully; otherwise, all keys will be searched and the search will fail. If all possible key orders are equally likely, then such an algorithm requires O(N) basic operations in both the worst and average cases. Search time can be significantly reduced if you pre-order the file by keys. This preliminary work makes sense if the file is large enough and accessed frequently.

Suppose we go to the middle of the file and find the key K(i) there. Compare K with K(i). If K = K(i), the required record is found. If K < K(i), then the key K must lie in the part of the file preceding K(i) (if a record with key K exists at all). Similarly, if K(i) < K, then the further search should be carried out in the part of the file following K(i). If this procedure of testing the key K(i) taken from the middle of the not-yet-viewed part of the file is repeated, each unsuccessful comparison of K with K(i) excludes approximately half of the unviewed part from consideration.

The flowchart of this procedure, known as binary search, is shown in Fig. 16.

Algorithm BSEARCH (Binary search). Search for a record with key K in a file with N ≥ 2 records whose keys are ordered in ascending order: K(1) < K(2) < ... < K(N).

Step 0. [Initialization] Set FIRST ← 1; LAST ← N. (FIRST and LAST are pointers to the first and last keys in the part of the file that has not yet been viewed.)

Step 1. [Main loop] While LAST ≥ FIRST do through step 4 od.

Step 2. [Obtaining the central key] Set I ← ⌊(FIRST + LAST)/2⌋. (K(I) is the key located in the middle, or to the left of the middle, of the part of the file that has not yet been viewed.)

Step 3. [Check for success] If K = K(I) then PRINT "Successful completion, key equal to K(I)"; and STOP fi.

Step 4. [Comparison] If K < K(I) then set LAST ← I − 1 else set FIRST ← I + 1 fi.

Step 5. [Failed search] PRINT "unsuccessful search"; and STOP.
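A direct Python rendering of BSEARCH (0-based indices; the function returns the index on success and None on failure, instead of printing):

```python
def bsearch(keys, K):
    """Algorithm BSEARCH in Python (0-based).  `keys` must be in ascending
    order; returns the index of K, or None if the search fails."""
    first, last = 0, len(keys) - 1        # Step 0: initialization
    while last >= first:                  # Step 1: main loop
        i = (first + last) // 2           # Step 2: central key
        if K == keys[i]:
            return i                      # Step 3: successful completion
        if K < keys[i]:                   # Step 4: discard half of the file
            last = i - 1
        else:
            first = i + 1
    return None                           # Step 5: unsuccessful search
```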

The BSEARCH algorithm is used to find K=42 in Fig. 17.

The binary search method can also be used to represent an ordered file as a binary tree. The key value found in the first execution of step 2 (K(8)=53) is the root of the tree. The key intervals to the left (1,7) and right (9,16) of this value are pushed onto the stack. The top interval is removed from the stack and using step 2, the middle element (or the element to the left of the middle) is found in it. This key (K(4)=33) becomes the element next to the left after the root if its value is less than the value of the root, and the next element to the right otherwise. The subintervals of this interval to the right and left of the newly added key [(1,3) , (5,7)] are now placed on the stack. This procedure is repeated until the stack is empty. Figure 18 shows the binary tree that would be built for the 16 ordered keys from Figure 17.

Binary search can now be interpreted as traversing this tree from the root to the searched record. If the final vertex is reached and the specified key is not found, the searched entry does not exist in the given file. Note that the number of vertices on a single path from the root to a given key K is equal to the number of comparisons performed by the BSEARCH algorithm when trying to find K.


The direct insertion method can be improved by finding the place for the inserted record in the ordered subtable by means of binary (dichotomous, logarithmic) search. This modification of the insertion method is called insertion with binary inclusion.

Consider the j-th sorting step (j = 2, 3, ..., n). If K[j] >= K[j-1], then the order has not been violated and we move on to the record R[j+1]. If K[j] < K[j-1], then R[j] is stored in a working variable (Rab = R[j]) and a place is sought for it in the ordered part of the table, the subtable. Denote the lower bound of this subtable's index range by ng and the upper bound by vg (initially ng = 1, vg = j - 1).

In binary search, the key K[j] of the record R[j] under consideration must first be compared with the key K[i] of the record R[i] located in the middle of the ordered subtable (i = (ng + vg) div 2). If K[j] > K[i], then the left part of the subtable, the records with smaller keys, is discarded (that is, no longer considered): ng = i + 1. If K[j] < K[i], then the right part of the subtable, the records with larger keys, is discarded: vg = i - 1. The search continues in the remaining part of the subtable. The process of halving the subtable continues until one of the following situations occurs:

1) K[j] = K[i]; hence position i+1 is the place for the record under consideration. We shift the records R[i+1], R[i+2], ..., R[j-1] one position to the right, freeing up space for the insertion (R[i+1] = Rab).

2) K[j] <> K[i] and ng > vg: the keys do not match, and the length of the last subtable is 1. In this case the insertion place is position ng, so the records R[ng], R[ng+1], ..., R[j-1] must be shifted one position to the right (R[ng] = Rab).

The binary search algorithm is described in detail in the section "Dichotomous Match Search".

Let us look at an example of the j-th sorting step (the place of the record with key equal to 9 is determined; j = 7, K[j] = 9):

The average number of comparisons for this method is of order n·log₂ n.
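A Python sketch of insertion with binary inclusion, following the ng/vg bounds described above (the function name is ours; using <= in the comparison keeps the sort stable):

```python
def binary_insertion_sort(a):
    """Insertion sort where the insertion point is found by binary search
    over the ordered subtable a[0..j-1] (the ng/vg bounds of the text)."""
    for j in range(1, len(a)):
        rab = a[j]                    # working variable Rab
        ng, vg = 0, j - 1             # lower and upper bounds of the subtable
        while ng <= vg:
            i = (ng + vg) // 2
            if a[i] <= rab:           # <= keeps equal keys in source order
                ng = i + 1            # discard the left part
            else:
                vg = i - 1            # discard the right part
        a[ng + 1:j + 1] = a[ng:j]     # shift right to free position ng
        a[ng] = rab
    return a
```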

Two-path insertion method

The two-path insertion method is a modification of the direct insertion method; it improves sorting performance.

To implement this method, an additional amount of memory equal to that occupied by the table being sorted is required (call it the output zone T). At the first sorting step, the first record of the table R is placed in the middle of the output zone (position m = (n div 2) + 1). The other positions of T are empty for now. At subsequent sorting steps, the key of the next record R[j] (j = 2, 3, ..., n) is compared with the key of record T[m] and, depending on the result of the comparison, a place for R[j] is found in T to the left or to the right of T[m] by the insertion method. The algorithm keeps track of the numbers of the leftmost (l) and rightmost (r) occupied elements of the output zone. The final values of l and r are 1 and n respectively.

The algorithm must also take into account the following situations:

    the key of record R[j] is less than the key of T[m], but l = 1;

    the key of record R[j] is greater than the key of T[m], but r = n.

In these cases, to insert the record R[j] it is necessary to shift the records of the subtable together with the record T[m] to the right or to the left (the direct insertion method is used).

Let's look at an example of sorting using this method.

Let the initial sequence of table keys look like:

24, 1, 28, 7, 25, 3, 6, 18, 8 (n=9, m=(n div 2)+ 1=5)
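As a loose Python sketch of the two-way idea (the output zone is modeled with a growable list, so the left/right shifting of a fixed array is only indicated in a comment; the names are ours):

```python
import bisect

def two_way_insertion_sort(records):
    """Sketch only: the output zone T is modeled as a growable list, so
    shifting is implicit in list.insert.  In a fixed-size array one would
    shift only the shorter side (left or right of the middle m), which is
    what makes the method faster than plain direct insertion."""
    t = []
    for x in records:
        pos = bisect.bisect_right(t, x)   # binary position relative to T
        # side = "left" if pos <= len(t) // 2 else "right"  (which half shifts)
        t.insert(pos, x)
    return t
```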
