
Basic Sorting Algorithms


In many situations, we want to sort some 'things' into a certain order. We might want to put books or CDs into alphabetical order, or we may be using a packing algorithm[1] that requires the data to be pre-sorted[2]. With a small amount of data, this is easy to do by sight. For larger amounts of data we need the help of clearly defined algorithms.

Ordering

Whatever data we have, there will always be a definite order that it can be put in. As an example, let us consider the problem of putting the numbers 7, 5, 2, 4, 10, 1 into ascending numerical order.

The Interchange Sort Algorithm

In this method, we find the smallest number in the set and interchange it with the number in the first position. Then we find the second smallest number, and swap it with the number in the second position, and so on.
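
For readers who prefer code to prose, here is a minimal sketch of the idea in Python (the function name interchange_sort and the comparison counter are our own illustrative choices, not something taken from any standard library):

def interchange_sort(items):
    # Sort the list in place by swapping the smallest remaining
    # number into the next unsorted position on each pass.
    comparisons = 0
    for i in range(len(items) - 1):           # m - 1 passes for m numbers
        smallest = i
        for j in range(i + 1, len(items)):    # search the unsorted part
            comparisons += 1
            if items[j] < items[smallest]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return comparisons

print(interchange_sort([7, 5, 2, 4, 10, 1]))  # prints 15 for our example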

In our example, we get the following (numbers marked with an asterisk have been verified to be in the correct position by the algorithm):

Pass   Order
0      7    5    2    4    10   1
1      1*   5    2    4    10   7
2      1*   2*   5    4    10   7
3      1*   2*   4*   5    10   7
4      1*   2*   4*   5*   10   7
5      1*   2*   4*   5*   7*   10*

This example requires 15 comparisons to sort six numbers.

There are a few things to notice here. Firstly, the fourth pass did not change the order of the data - it merely checked that 5 was in the correct position. Secondly, the final pass verifies the position of two numbers - if the second largest number has been put in the correct position, the largest must also be in the correct position. Because of this, we know that a set of m numbers will need m - 1 passes to be put in the correct order.

The comparisons we are making between pairs of numbers are, in a sense, hidden in this algorithm. In each pass we need to find the smallest number of the remaining (unsorted) set[3]. This will need m - n comparisons on the nth pass, hence the total number of comparisons needed will be:

(m - 1) + (m - 2) + … + (m - [m - 2]) + (m - [m - 1])
   = (m - 1) + (m - 2) + … + 2 + 1
   = ½(m × (m - 1))

So for our six-item example, this gives:

½(m × (m - 1)) = ½(6 × 5)
               = ½ × 30
               = 15

If we had twice as much data (that is, 12 items) we would have:

½(m × (m - 1)) = ½(12 × 11)
               = ½ × 132
               = 66
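
If you would rather let a computer do the arithmetic, a quick (purely illustrative) Python check of the ½(m × (m - 1)) figure for these two sizes might look like this:

# Worst-case comparison counts for m = 6 and m = 12
for m in (6, 12):
    print(m, m * (m - 1) // 2)   # prints 6 15, then 12 66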

The Bubble Sort Algorithm

This algorithm gets its name from the way each element in turn is 'floated' to the correct position. We first compare the numbers in the first and second positions, and swap them if the first is bigger than the second. We then compare the second and third, then the third and fourth... each time swapping the pair if the first is larger than the second. We repeat this process (ignoring numbers that have already been 'floated' to the top on previous passes) until no more swaps are made.
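
Here is a minimal Python sketch of this process, assuming we stop as soon as a complete pass makes no swaps (again, the function name and the comparison counter are just for illustration):

def bubble_sort(items):
    # Compare adjacent pairs, swapping when out of order, and shrink
    # the range by one each pass as the largest number floats to the top.
    comparisons = 0
    end = len(items) - 1
    swapped = True
    while swapped and end > 0:
        swapped = False
        for i in range(end):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        end -= 1
    return comparisons

print(bubble_sort([7, 5, 2, 4, 10, 1]))  # prints 15 for our example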

With our example, we get the following (here '-' is used to denote a comparison being made):

Pass   Order
1      7 - 5   2   4   10   1
       5   7 - 2   4   10   1
       5   2   7 - 4   10   1
       5   2   4   7 - 10   1
       5   2   4   7   10 - 1
2      5 - 2   4   7   1   10
       2   5 - 4   7   1   10
       2   4   5 - 7   1   10
       2   4   5   7 - 1   10
3      2 - 4   5   1   7   10
       2   4 - 5   1   7   10
       2   4   5 - 1   7   10
4      2 - 4   1   5   7   10
       2   4 - 1   5   7   10
5      2 - 1   4   5   7   10
       1   2   4   5   7   10

As with the Interchange Sort algorithm, this uses 15 comparisons to sort our example.

You may have noticed that we compared some pairs of numbers more than once - so this is certainly not the most efficient algorithm for this set of data. In each pass, we float one number into the correct position. Therefore, m numbers will require m - 1 passes to be ordered.

In the nth pass, we will make m - n comparisons. Therefore the total number of comparisons we will need is:

(m - 1) + (m - 2) + … + (m - [m - 2]) + (m - [m - 1])
   = (m - 1) + (m - 2) + … + 2 + 1
   = ½(m × (m - 1))

Therefore our six-item example will need ½(6×5)=15 comparisons.

If we again double the amount of data to 12 items, we will need ½(12×11)=66 comparisons.

The Shuttle Sort Algorithm

The Shuttle Sort builds on the Bubble Sort - in the nth pass through the numbers, we compare the nth and (n+1)st numbers. If we have to swap them, we then compare the smaller of this pair with the preceding number, and swap if necessary. Each time a swap is made, we shuttle the moved number back by comparing it with the number before it, stopping once no swap is needed or it reaches the front of the list.
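
A minimal Python sketch of the Shuttle Sort as described above (the name shuttle_sort and the comparison counter are our own; elsewhere this method is often known as insertion sort):

def shuttle_sort(items):
    # In pass n, compare the nth and (n+1)st numbers, then shuttle the
    # moved number backwards until it no longer needs to swap.
    comparisons = 0
    for n in range(1, len(items)):
        j = n
        while j > 0:
            comparisons += 1
            if items[j] < items[j - 1]:
                items[j], items[j - 1] = items[j - 1], items[j]
                j -= 1
            else:
                break                 # already in order, so stop shuttling
    return comparisons

print(shuttle_sort([7, 5, 2, 4, 10, 1]))  # prints 12 for our example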

Pass   Order
1      7 - 5   2   4   10   1
2      5   7 - 2   4   10   1
       5 - 2   7   4   10   1
3      2   5   7 - 4   10   1
       2   5 - 4   7   10   1
       2 - 4   5   7   10   1
4      2   4   5   7 - 10   1
5      2   4   5   7   10 - 1
       2   4   5   7 - 1   10
       2   4   5 - 1   7   10
       2   4 - 1   5   7   10
       2 - 1   4   5   7   10
       1   2   4   5   7   10

This uses just 12 comparisons to sort our example data.

Notice that it is not until the final pass that we can be sure that any numbers are in the correct position. Also, the fourth pass contained just a single comparison, since the two numbers being compared were already in the correct order. So the number of comparisons needed in each pass is not predictable. However, the maximum number of comparisons in the nth pass is n. We will clearly need m - 1 passes for m numbers, so the maximum number of comparisons needed to put m numbers in order is:

1 + 2 + … + (m - 2) + (m - 1) = ½(m × (m - 1))

Hence our example would need at most ½(6×5)=15 comparisons. You should note that this is a worst-case estimate - in general, we will need fewer comparisons than this.

For sorting 12 items, this algorithm would use at most ½(12×11)=66 comparisons.

So Which Is the Best?

Well... that rather depends on the data you're trying to sort. If the data has some partial ordering[4], Shuttle Sort will give faster results. Look at the initial order of our example data - and then see where Shuttle Sort 'missed out' comparisons that would have been made by the other two algorithms.

Conversely, both Interchange Sort and Bubble Sort make m - n comparisons on the nth pass, after which we can be certain that n numbers are in the correct positions. Interchange Sort therefore always uses the maximum number of comparisons to sort the data - even if it is already in the correct order - and Bubble Sort, as described above, only does better if an entire pass happens to make no swaps.
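
As a rough illustration of this difference, the sketches above can be run on data that is already in order (this assumes the illustrative interchange_sort, bubble_sort and shuttle_sort functions defined earlier in this entry):

already_sorted = [1, 2, 4, 5, 7, 10]
print(interchange_sort(list(already_sorted)))  # 15 - always the maximum
print(bubble_sort(list(already_sorted)))       # 5  - stops after one swap-free pass
print(shuttle_sort(list(already_sorted)))      # 5  - one comparison per pass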

If we were to properly analyse the complexity of the algorithms, we would use the 'big O' notation, which would allow us to easily compare the complexity of many different algorithms.

We would also need to find an estimate for the time taken to actually swap the numbers over. With a large amount of data to be sorted, this will certainly need to be considered if we are to properly analyse the expected running time of an algorithm.

There are, of course, more advanced sorting algorithms. These include Quicksort (which is a recursive algorithm) and Heapsort.

Further Information

If you found this entry interesting, and are keen for further information, take a look at The Art of Computer Programming (volume 3): Sorting and Searching, by Donald E Knuth. The book examines many further algorithms, with a detailed analysis of each.

[1] An algorithm that calculates how to pack items in a given volume.
[2] The efficiency of some packing algorithms is greatly enhanced if the data is put into some kind of order before running the algorithm.
[3] To do this, we compare the first and second unsorted numbers, then compare the smaller of these with the third unsorted number, and so on.
[4] That is, subgroups of numbers within the data that are in the correct order within the subgroup (though not necessarily in the correct positions within the whole set of data).
