Checkpoint Report:
Suffix Array Construction on GPU
Work completed so far
-
We have studied deep into the serial Skew Algorithm implementation and parallized code by Guy in PBBS.
PBBS skew benchmark Code
-
We identied the parts which could be paralleized in GPU:
- radix sort mod12 based on 3 chars
- generate rank of triplets
- move mod 1 suffixes to bottom half and and mod 2 suffixes to top
- place ranks for the mod12 elements in full length array
- radix sort mod 0 suffixes
- mergesort s0 and s12
-
The parallelzed Skew Algorithm pseudocode on GPU:
(Based on the paper "Parallel suffix array and least common prefix for the GPU" by Mrinal Deo and Sean Keely.)
computeSA(int *T, int *SA, int length)
{
initMod12();
radixSort(s12); // lsb radix 1st Character kernel1
radixSort(s12); // lsb radix 2nd Character
radixSort(s12); // lsb radix 3rd Character
lexicRankOfTriplets(); kernel2
if (!allUniqueRanks)
computeSA(); recursion
storeUniqueRanks(); kernel3
else
computeSAFromUniqueRank(); kernel4
radixSort(s0) kernel1
mergeSort(s0, s12) kernel5
}
-
Progress: We are trying hard to make this code work!
-
We studied the radixSort and mergeSort library in GPU, which is available to be adapted in our code.
Cuda doc on RadixSort and MergeSort
More detailed goals
- First implement the Skew Algorithm on Cuda according to the pseudocode above with the CUDA RadixSort and MergeSort library.
- Then optimized the RadixSort and MergeSort parallel part based on the Merrill10Revisiting and Satish09Designing paper about efficient sorting algorithms for manycore gpus
Demo plan in competition
We plan to show our speedup graph of Skew Algorithm based on the serial Skew Algorithm code and comparison with the PBBS parallel Skew Algorithm code. Moreover, we plan to compare the performance of the Suffix Array based application such as Longest Common Prefix(LCP) with PBBS benchmark.
Preliminary results
No preliminary results so far.
Remaining issues
We are condifient to implement the Skew Algorithm on GPU with the CUDA RadixSort and MergeSort library. Howerver, we are not sure about how far we can go to implement our own RadixSort and MergeSort part on GPU to make improvements. There are some existing research results on optimizing those on GPU. We will try our best to achieve them.
Project schedule
| Week |
What We Plan To Do | What We Actually Did |
| Apr 11-14 | Get familiar with Suffix Array construction algorithms and code | We did this! |
| Apr 15-20 | Go deep into the Skew parallel code in Guy's benchmark | We did this! |
| Apr 21-30 | Implement Skew algorithms in CUDA | We are half way there. With pseudocode ready we are implementing the algorithm. |
| May 1-9 | Compare with Guy's parallel benchmark and make optimizations | |
| May 10-13 | Final report and demo | |