Project Proposal:
Suffix Array Construction on GPU
Summary
We are going to implement the Suffix Array Construction -- skew algorithm proposed by Juha Karkkainen and Peter Sanders on GPU.
We plan to build it as a problem based benchmark for GPU and compare with Guy's parallel benchmark.
Background
Suffix array is a sorted array of all suffixes of a string. This indexing data structure is widely used in data compression algorithms and Biological DNA sequences. There are many suffix array construction algorithms such as Doubling Algorithm, Recursive algorithms (Farach 97) and DC3/skew algorithm.
Doubling Algorithm is a naive approach to find prefixes that honor the lexicographic ordering of suffixes. The assessed prefix length doubles in each iteration of the algorithm until a prefix is unique and provides the rank of the associated suffix. Farach's Recursive algorithm is to recursively sort a subset of suffixes then merged to get the final suffix array. The contribution of Farach's algorithm is that it works for integer alphabets.
Among those, DC3/skew algorithm of Karkkainen and Sanders (2003) is most suitable to be implemented in GPU according to Mrinal13's analysis, because all its phases can be efficiently mapped to a data parallel hardware. It runs in linear time and has successfully been used as the basis for parallel and external memory suffix array construction algorithms. The key idea of DC3/skew algorithm is shown in the following graph. In Guy Blelloch's Problem Based Benchmark Suite, Suffix Arrays based on DC3/skew algorithm have the benchmark in both serial and parallel mode.
We plan to implement the skew suffix array construction algorithm in GPU platform and compare with Guy's parallel benchmark performance.
Challenge
Challenges would exist in finding out the part of algorithm which can be parallelized and minimize the sequential execution code.
The data of this problem is a string and most accesses are consecutive, however the data cannot fit in one GPU block memory, so how to leverage the locality of memory is very tricky.
The algorithm itself involves sorting and merging, and how to parallelize these process is also challenging.
Resources
Karp, Richard M.; Miller, Raymond E.; Rosenberg, Arnold L. (1972). "Rapid identification of repeated patterns in strings, trees and arrays". Proceedings of the fourth annual ACM symposium on Theory of computing - STOC '72. p. 125.
Farach, M. (1997). "Optimal suffix tree construction with large alphabets". Proceedings 38th Annual Symposium on Foundations of Computer Science. p. 137.
Karkkainen, Juha, and Peter Sanders. "Simple linear work suffix array construction." Automata, Languages and Programming. Springer Berlin Heidelberg, 2003. 943-955.
Mrinal Deo and Sean Keely. 2013. Parallel suffix array and least common prefix for the GPU. In Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '13). ACM, New York, NY, USA, 197-206.
Start code : Problem Based Benchmark Suite, Suffix Arrays (SA),
Start Code
Workload : A trigram string of length n=10,000,000; chr22.dna; etext99;
Dataset
Goals/Deliverables
PLAN TO ACHIEVE:
Our first goal is to implement the Skew Suffix Array algorithm on GPU. And then we want to optimize it to achieve a reasonable speedup on GPU.
In the presentation, we also plan to present some graphs of speedup and execution time vs. the existing CPU-based implementations.
HOPE TO ACHIEVE:
And if possible, we can demo some Biological DNA applications based on our algorithm.
Platform
Tool: CUDA - skew algorithm's phases can be efficiently mapped to a data parallel hardware.
Environment: GHC machines with Nvidia GPU - We are going to use the GHC Nvidia GPUs to test how efficient our code is parallelized.
Proposed Schedule
| Week |
What We Plan To Do |
| Apr 11-14 | Get familiar with Suffix Array construction algorithms and code |
| Apr 15-20 | Go deep into the Skew parallel code in Guy's benchmark |
| Apr 21-30 | Implement Skew algorithms in CUDA |
| May 1-9 | Compare with Guy's parallel benchmark and make optimizations |
| May 10-13 | Final report and demo |