Problem 1
Given the array BWT[1:n], write the pseudocode to compute the array rank[1:n] in θ(n) time , assuming that the alphabet size is constant.
T=“rutabaga”
T’=“rutabaga$“
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
|---|---|---|---|---|---|---|---|---|---|
| BWT | a | g | t | b | a | a | $ | u | r |
| rank | 2 | 6 | 8 | 5 | 3 | 4 | 1 | 9 | 7 |
def BWT_to_array_rank(char[] BWT):
seen = set()
char_arr = []
int rank[BWT.length]
for ch in BWT: # O(n)
if ch in seen:
continue
# else...
seen.add(ch)
char_arr.append(ch)
# Since vocab is constant, O(1)
lex_order = sorted(char_arr)
counter = 1
for ch_order in lex_order: # O(1)
for i, ch_bwt in enumerate(BWT): # O(n)
if ch_bwt == ch_order:
rank[i] = counter
counter++
return rank Code could use hash map to improve performance in finding characters in the second pass, but performance will still fundamentally be
Problem 2
Given the arrays BWT[1:n] and rank[1:n], write pseudocode to compute T’ in θ(n) time.
def BWT_and_rank_to_T(char[] BWT, int[] rank):
char res[];
target = BWT.index('$') # O(n)
while len(res) < len(BWT): # O(n)
res.append(BWT[target]) # O(1)
target = rank[target] # O(1)
T_backwards = ''.join(res) # O(n)
T_prime = T_backwards[::-1] # O(n)
return T_primeCode is
- Initial
pass to find the index of the terminator - This is followed by n iterations of
passes through the BWT and rank arrays traversing the string. total. - We finish by doing two
passes to prepare T’ for output, constructing and reversing our output string. Could be optimized, but stays