Road to C Programmer #12 - Sorting Algorithms

Last Edited: 8/16/2024

This post introduces several common sorting algorithms and their implementations in C.

C Sorting

As a programmer, you will have to sort arrays countless times. In this article, I will go over some sorting algorithms and their implementations in C. For practice, I recommend thinking through the implementations yourself before looking at mine.

Bubble Sort

Bubble sort is one of the most intuitive sorting algorithms to understand, as it simply compares each adjacent element in an array to move the largest element to the right-most index, resulting in a sorted array from right to left. Initially, you traverse the entire array and place the largest element at the right-most index. In the next iteration, you only need to go over the first n - 1 elements because the right-most index is already sorted. This process continues until you reach the left-most element.

Here is the implementation of bubble sort in C:

void bubbleSort (int arr[], int size) {
    for (int i = 0; i < size - 1; i++) {
        for (int j = 0; j < size - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                // swap the adjacent pair so the larger element moves right
                int temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
        }
    }
}

You can optimize this algorithm by stopping early when an iteration performs no swaps, indicating that the array is already sorted. The memory usage is O(1), as we only deal with a pair of elements at a time, but the time complexity is O(n^2): each of the n passes compares roughly n/2 pairs on average. (The name "bubble sort" likely comes from the way the largest elements "bubble" up to the top (right) of the array.)
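The early-exit optimization can be sketched like this (a minimal sketch; the `swapped` flag and function name are my own additions):

```c
#include <stdbool.h>

/* Bubble sort with early exit: if a full pass performs no swaps,
   the array is already sorted and we can stop. */
void bubbleSortOptimized (int arr[], int size) {
    for (int i = 0; i < size - 1; i++) {
        bool swapped = false;
        for (int j = 0; j < size - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                int temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
                swapped = true;
            }
        }
        if (!swapped) {
            break; /* no swaps in this pass: already sorted */
        }
    }
}
```

On an already-sorted array, the first pass makes no swaps, so the function returns after a single O(n) scan.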

Insertion Sort

Another intuitive sorting algorithm is insertion sort, which goes through each element to find the appropriate place to insert it into the left subarray. Unlike bubble sort, which sorts from right to left, insertion sort sorts from left to right.

Here is the implementation of insertion sort in C:

void insertionSort (int arr[], int size) {
    for (int i = 1; i < size; i++) {
        int key = arr[i]; // value to insert into the sorted left subarray
        int j = i - 1;

        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j]; // shifting larger elements right
            j--;
        }
        arr[j + 1] = key;
    }
}

We treat the left subarray as sorted and shift its larger elements to the right until we find the place to insert the current value. Like bubble sort, insertion sort has a space complexity of O(1) and a time complexity of O(n^2), since each of the n insertions compares against roughly n/2 elements on average. However, it performs well when the data is almost sorted, as the while loop runs fewer times in such cases.
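To see why nearly-sorted input is fast, here is an instrumented variant that counts shifts (the counter and function name are illustrative additions, not part of the standard algorithm):

```c
/* Insertion sort instrumented with a shift counter, to compare
   how much work nearly-sorted vs. reversed input requires. */
long insertionSortCountShifts (int arr[], int size) {
    long shifts = 0;
    for (int i = 1; i < size; i++) {
        int key = arr[i];
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j]; // each shift is one unit of work
            shifts++;
            j--;
        }
        arr[j + 1] = key;
    }
    return shifts;
}
```

An almost-sorted array like {1, 2, 4, 3, 5} needs only 1 shift, while the reversed {5, 4, 3, 2, 1} needs 10, the worst case of n(n-1)/2.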

Selection Sort

Another sorting algorithm with a similar strategy to insertion sort is selection sort. Instead of going through each element in the left subarray to decide where to insert the current element, we select the minimum element from the right subarray and place it at the right-most index of the left sorted subarray.

Here is the implementation of selection sort in C:

void selectionSort (int arr[], int size) {
    for (int i = 0; i < size - 1; i++) {
        int min = i; // index of the smallest element found so far
        for (int j = i + 1; j < size; j++) {
            if (arr[j] < arr[min]) {
                min = j;
            }
        }
        if (min != i) {
            int temp = arr[i];
            arr[i] = arr[min];
            arr[min] = temp;
        }
    }
}

Like the previous two sorting algorithms, selection sort has a space complexity of O(1) and a time complexity of O(n^2), since each of the n selections scans roughly n/2 elements on average. It is preferred when swapping elements is costly, as it requires fewer swaps compared to the other two algorithms.
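The swap count can be made visible with an instrumented variant (again, the counter and function name are my own, for illustration): selection sort performs at most size - 1 swaps, no matter how scrambled the input is.

```c
/* Selection sort instrumented with a swap counter: at most
   size - 1 swaps are performed regardless of input order. */
long selectionSortCountSwaps (int arr[], int size) {
    long swaps = 0;
    for (int i = 0; i < size - 1; i++) {
        int min = i;
        for (int j = i + 1; j < size; j++) {
            if (arr[j] < arr[min]) {
                min = j;
            }
        }
        if (min != i) {
            int temp = arr[i];
            arr[i] = arr[min];
            arr[min] = temp;
            swaps++;
        }
    }
    return swaps;
}
```

Sorting the reversed {5, 4, 3, 2, 1} takes only 2 swaps here, whereas bubble sort would swap 10 times on the same input.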

Merge Sort

The above sorting algorithms all have a time complexity of O(n^2), which is too slow for practical use with large arrays. Merge sort, however, is faster due to its divide and conquer strategy. It works by dividing the array into smaller subarrays and then merging them by comparing each element in the subarrays. For example, when sorting the array {2, 1, 4, 3}, you first divide it into {2, 1} and {4, 3}, then further divide it until each subarray contains only one element.

{2, 1, 4, 3} -> {2, 1}, {4, 3} -> {2}, {1}, {4}, {3}.

Next, you merge the arrays by comparing the smallest elements in each array and selecting the smallest:

{2}, {1}, {4}, {3} -> {1, 2}, {3, 4} -> {1, 2, 3, 4}.

Here is the implementation of merge sort in C:

void merge (int arr[], int l, int m, int r) {
    // Create left arr and right arr from the indices
    int lArr[m - l + 1];
    int rArr[r - m];
 
    for (int i = 0; i < m - l + 1; i++) {
        lArr[i] = arr[l + i];
    }
    for (int j = 0; j < r - m; j++) {
        rArr[j] = arr[m + 1 + j];
    }
 
    // Insert smaller element from lArr or rArr
    int i = 0;
    int j = 0;
    int insertedUntil = l;
 
    while (i < m - l + 1 && j < r - m) {
        if (lArr[i] < rArr[j]) {
            arr[insertedUntil] = lArr[i];
            i++;
        } else {
            arr[insertedUntil] = rArr[j];
            j++;
        }
        insertedUntil++;
    }
 
    // Put remaining elements
    while (i < m - l + 1) {
        arr[insertedUntil] = lArr[i];
        i++;
        insertedUntil++;
    }
 
    while (j < r - m) {
        arr[insertedUntil] = rArr[j];
        j++;
        insertedUntil++;
    }
}
 
void mergeSort (int arr[], int l, int r) {
    if (l < r) {
        int m = l + (r - l) / 2; // written this way to prevent integer overflow
 
        // Recursive call
        mergeSort(arr, l, m);
        mergeSort(arr, m + 1, r);
 
        merge(arr, l, m, r);
    }
}

To understand the time complexity, consider the number of times the array is split and merge is called. The number of splits, s, is related to the size of the array by n = 2^s, so s = log_2(n): the number of splits grows logarithmically as the number of elements increases. At each level of splitting, merge combines subarrays by going through every element, which takes linear time, O(n). Since O(n) work is performed O(log n) times, the time complexity of merge sort is O(n log n), which scales much better than the other sorting algorithms.

The disadvantage of merge sort is its space complexity: it requires creating copies of the left and right subarrays during each merge, resulting in O(n) extra space. However, merge sort is often preferred because memory is relatively inexpensive, and merge sort can be easily parallelized (which will be covered in the next article).

Quick Sort

Quick sort is the trickiest algorithm to explain, but I'll do my best. Quick sort involves choosing a pivot element, either randomly or deterministically (commonly the last element), comparing every other element to the pivot, and placing all elements smaller than the pivot to the left and the rest to the right. This process is repeated recursively for the smaller subarrays (partitions) on the left and right of the pivot until each partition contains only 1 or 0 elements.

When creating a partition, we use two pointers, i and j, that move according to certain rules. Initially, i is set to -1, and j is set to 0. As we increment j, if the jth element in the array is smaller than the pivot, i is moved one step to the right (incremented by 1), and the jth element and ith element are swapped. Otherwise, we do nothing and increment j. This ensures that all elements to the left of i are smaller than the pivot.

When j reaches the pivot at the end, we swap the i+1th element with the pivot. This guarantees that the partition to the left contains values smaller than the pivot, and the partition to the right contains values larger than the pivot.

Here is the implementation of deterministic quick sort in C:

// creates a partition using two pointers and returns the index of the pivot
int partition (int arr[], int l, int r) {
    int pivot = arr[r]; // choose the last element as the pivot
    int i = l - 1;

    // move i and j according to the rules above
    for (int j = l; j < r; j++) {
        if (arr[j] < pivot) {
            i++;
            int temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
        }
    }

    // place the pivot between the two partitions
    int temp = arr[i + 1];
    arr[i + 1] = arr[r];
    arr[r] = temp;

    return i + 1;
}
 
void quickSort (int arr[], int l, int r) {
    if (l < r) {
        int pivot = partition(arr, l, r);
 
        quickSort(arr, l, pivot - 1); // left partition
        quickSort(arr, pivot + 1, r); // right partition
    }
}

Notice that this implementation does not create new arrays for sorting (it sorts in place), requiring only O(1) extra memory beyond the recursion stack. When the pivot is close to the median value, each partition splits the array roughly in half. Therefore, like merge sort, the time complexity is O(n log n) in the best and average cases. (For each partition, we only iterate through it once using i and j, so partitioning is O(n).)

However, consider the cases where the array is already sorted or reverse sorted. In these cases, the pivot is always the largest or smallest element, meaning we must partition n times. Thus, in the worst-case scenario, the space complexity is O(n) due to the recursion stack, and the time complexity is O(n^2). This explains why random pivots are used: the likelihood of consecutively picking the worst possible pivot is extremely low, regardless of the initial order of the array.
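A randomized pivot can be sketched by swapping a randomly chosen element into the last position and then running the same deterministic partition as before (function names here are my own; in real use, seed rand() with srand() once at program start):

```c
#include <stdlib.h>

static void swapInts (int *a, int *b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}

/* Deterministic partition (mirrors the version above): last element as pivot. */
static int partitionLast (int arr[], int l, int r) {
    int pivot = arr[r];
    int i = l - 1;
    for (int j = l; j < r; j++) {
        if (arr[j] < pivot) {
            i++;
            swapInts(&arr[i], &arr[j]);
        }
    }
    swapInts(&arr[i + 1], &arr[r]);
    return i + 1;
}

/* Randomization step: move a random element into the pivot position. */
static int randomizedPartition (int arr[], int l, int r) {
    int k = l + rand() % (r - l + 1); // random index in [l, r]
    swapInts(&arr[k], &arr[r]);
    return partitionLast(arr, l, r);
}

void randomizedQuickSort (int arr[], int l, int r) {
    if (l < r) {
        int p = randomizedPartition(arr, l, r);
        randomizedQuickSort(arr, l, p - 1);
        randomizedQuickSort(arr, p + 1, r);
    }
}
```

The only change from the deterministic version is the one extra swap before partitioning, which is enough to make the O(n^2) worst case vanishingly unlikely for any fixed input order.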

Summary

The following table summarizes the space and time complexities of the sorting algorithms discussed above.

Space & Time Complexities of Sorting Algorithms

Algorithm       | Space               | Time
----------------|---------------------|---------------------------
Bubble Sort     | O(1)                | O(n^2)
Insertion Sort  | O(1)                | O(n^2)
Selection Sort  | O(1)                | O(n^2)
Merge Sort      | O(n)                | O(n log n)
Quick Sort      | O(1) (worst: O(n))  | O(n log n) (worst: O(n^2))

Although some algorithms dominate others in both space and time complexity, the best choice still depends on the size of the array and other factors. For example, if the array is expected to be small and almost sorted, you might choose insertion sort. Therefore, it is important to test different solutions and decide which one to use in your specific context. Additionally, there are other sorting algorithms that leverage the nature of different data structures, like heap sort, which I will cover in a future article.
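As an aside, for everyday sorting in C you would typically reach for the standard library's qsort, which takes the element count, element size, and a comparison callback:

```c
#include <stdlib.h>

/* Comparison callback for qsort: must return a negative value,
   zero, or a positive value, like strcmp. */
int compareInts (const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y); // avoids the overflow that x - y could cause
}
```

Sorting an array of five ints then becomes qsort(arr, 5, sizeof(int), compareInts). Writing your own sort is still worthwhile for learning, and occasionally for performance, but qsort is the idiomatic default.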