Introduction to Sorting

Sorting is, without doubt, the most fundamental algorithmic problem that was faced in the early days on computing. In fact, most of the computer science research was centered on finding a best way to sort a set of data. There is probably a good reason to make sorting that important.

1. Supposedly, 25% of all CPU cycles are spent sorting
2. Sorting is fundamental to most other algorithmic problems, for example binary search.
3. Many different approaches lead to useful sorting algorithms, and these ideas can be used to solve many other problems.

The Process of Sorting

Given a data set {x1, x2, …xn} we need to find a permutation such that the set is sorted in increasing or decreasing order. However, if we look for all permutations of {x1, x2, …xn}, then there are n! of them around. Needless to say, n! is huge even for a small n such as n=20. However, if you can find all permutations of the data set, then we can determine if any of those lists is sorted in O(n) time.  Therefore we need to think of sorting a list as a different activity other than finding a permutation that is sorted. Let us make some definitions first.

A pair of elements is inverted if  xi  > xj  for i < j . Therefore a non-sorted list has at least one inverted pair. Now we can define the act of sorting as removing all inverted pairs in the list.  In other words, if you can prove that the number of inverted pairs in a list of elements is zero, then the list is sorted. Hence our goal is to study algorithms that removes inverted pairs from a list of elements.

Issues in Sorting

There are many issues that need to be considered when sorting a list. We need to consider whether we need to sort the list in increasing or decreasing order. Clearly we can use the same algorithm in both cases. All we need to do is to change the comparison criteria from > to < or vice versa. What about equal keys? Do we change their order or leave them wherever they are? How about non-numerical keys such as Strings? How do we sort them? What if we want to sort a list of names by two criteria’s? First by last name, then by first name?

There is one thing that we assume for any list that needs to be sorted. We assume that keys in the list can be “compared” by some criteria. In Java sense, we assume that interface Comparable is implemented for the objects we are trying to compare.

Applications of Sorting

There are many applications of sorting. Once a list is sorted many questions about the list can be answered easily. We can efficiently find an element in a sorted list using Binary Search.  Binary search requires only O(log n) operations in finding an element. We can also determine in O(n) if a sorted list has duplicates. We can construct a frequency distribution of the list if the list is sorted, or find the median and mode of the list in O(1) and O(n) respectively. We can find the kth largest element in a list in O(1) time.

Sorting Algorithms

There are several different ideas which lead to sorting algorithms. We list some of the ideas below.

• Insertion - putting an element in the appropriate place in a sorted list yields a larger sorted list.
• Exchange or Bubble - rearrange pairs of elements which are out of order, until no such pairs remain.
• Selection - extract the largest element form the list, exchange with the last element in the current list, and repeat.
• Bucket - separate into piles based on the first letter, then sort each pile.
• Merging - Two sorted lists can be easily combined to form a sorted list.

# Bubble Sort

This is probably the simplest way sort an array of objects. Unfortunately it is also the slowest way! The basic idea is to compare two neighboring objects, and to swap them if they are in the wrong order. This removes all inversions methodically and eventually leads to a sorted list. Given an array `A` of numbers, with length `n`, here's a snippet of java code for bubble sort:

` `
`for (int i=0; i<n-1; i++) {`
`  for (int j=0; j<n-1-i; j++)`
`    if (A[j+1].compareTo(A[j])<0) {  /* compare the two neighbors */`
`      tmp = A[j];         /* swap a[j] and a[j+1]      */`
`      A[j] = A[j+1];`
`      A[j+1] = tmp;`
`  }`
`}`

As we can see, the algorithm consists of two nested loops. The index `j` in the inner loop travels up the array, comparing adjacent entries in the array (at `j` and `j+1`), while the outer loop causes the inner loop to make repeated passes through the array. After the first pass, the largest element is guaranteed to be at the end of the array, after the second pass, the second largest element is in position, and so on. That is why the upper bound in the inner loop (`n-1-i`) decreases with each pass: we don't have to re-visit the end of the array.

Analysis of the Algorithm

We analyze several cases. Let us consider the case of random numbers.

The outer loop of the code runs (n-1) times and for each one of those the inner loop runs (n-i-1) times where i = 0…n-1. Hence the total number of operations of the loops is

(n-1) + (n-2) + …+ 1 = n/2(n-1). Hence the Bubble sort is of O(n2) when the list is a random list of items. Bubble sort can be improved if the inner loop can be replaced by a while loop that ends when there are no swaps done in a round of comparisons. If the array is already sorted, no elements are swapped and eventually algorithm stops. We should also note that each swap takes 3 operations to perform.

Selection Sort

Another more intuitive sorting algorithm is the selection sort, where we repeatedly find the largest element in a list, move it to the end of the list and repeat the process until list is sorted.  Finding the largest element in a list of size i takes (i-1) operations. In each step of the selection sort algorithm, size of the list is reduced by 1. A pseudo code for selection sort algorithm is given as follows.

for  i = n-1 to 1 do

find the largest element in A[0..i]

swap with A[i]

Insertion Sort

Insertion sort is a simple sorting algorithm that is appropriate for small inputs. It is  generally considered to be good solution if only a few elements need sorting because it is such a short algorithm and the time required to sort is not likely to be an issue. However, if we are dealing with a large amount of data, insertion sort is a poor choice because it is too time consuming. The idea of the insertion sort is that, we assume that a list of size 1 (or the list that contains the first element is sorted). Then we start inserting each element A[i] (i from 1 to n-1) to the correct place. The pseudo code for insertion sort is given as follows.

for (int i=1; i<n; i++)

insert A[i] to the array A[0..i-1] in order

Here is a picture that demonstrates that process.

Figure Courtesy of Weiss Data Structures Book