This description is excerpted with permission from a message Rocke Verser sent me in April 1997.
- Darrell Kindred <dkindred@cs.cmu.edu>
[Darrell Kindred's bitslice DES info]
LoopA: Each pass through LoopA performs 256 complementary pairs of keys. (512 keys per iteration of LoopA.) LoopB: Each pass through LoopB performs 16 complementary pairs of keys. (32 keys per iteration of LoopB.) This loop is arranged so that the first 1+5/8 rounds only need to be computed once. These first 1+5/8 rounds are invariant within LoopC. LoopC: Each pass through LoopC tests 1 key. Each pass through this loop corresponds to a pass through the "PrecomputeRound1" function. Since the last 1+6/8+1/8 round are precomputed, they do not need to be recomputed in LoopC. LoopA: /* Perform a set of 256 complementary pairs of keys (512 keys per set) */ Call Precompute-Round-1 LoopB: /* Perform a set of 16 complementary pairs of keys (32 keys per set) */ Call Precompute-Round-16 LoopC: /* Test a single key */ Fetch the "primary" or "complementary" result from "precompute-round-16" Apply S-boxes 2, 3, 7 of Round 15. Apply Rounds 14, 13, 12, 11, 10, 9, 8, 7, and 6. Apply S-box 7 of Round 5. Compare 4-bit result of Round 5 with stored result of Round 3 from "precompute-round-1". (I.e., compare S-box 7 outputs.) If no match, iterate LoopC. [dk: this abort rarely happens in bitslice: 1.6% for 64-bit, 12.7% w/ 32] Apply S-boxes 3 and 4 of Round 5. Apply S-boxes 1 and 7 of Round 2 to previously stored result. Apply S-boxes 3 and 4 of Round 3 to previously stored result. Compare 8-bit results of Round 5 with 8-bit result of Round 3. (I.e., compare S-box 3 and 4 outputs.) If no match, iterate LoopC. /* We get here, on average 1 time in 4096 keys */ /* [dk: with 64-bit bitslicing, we get here 1.6% of the time] */ Apply S-boxes 1, 2, 5, 6, and 8 of Round 5. Apply S-boxes 1, 2, 5, 6, and 8 of Round 3. Compare 20-bit results of Round 5 with 20-bit result of Round 3. (I.e., compare S-box 1, 2, 5, 6, and 8 outputs.) If no match, iterate LoopC. /* We get here, on average 1 time in 2^32 keys. */ /* This is a "half-match". It must be logged and reported to the keyserver. This is how the keyserver can inexpensively check that clients are actually doing their work!!! Very important!!!! What is reported to the server is the loop counter (0 to 2^31) at the time the half-match was detected. The loop counter simply increments sequentially (not gray-code) with each iteration of LoopC. */ Apply Round 4. Compare 32-bit output of Round 4 with 32-bit result stored result of Round 2. If a match, we probably have the winning key -- a 64-bit match! Iterate-LoopC: Process keys in same order as table generated by "precompute-Round-1". [4 key bits are toggled to every possible value. Complements of tested on alternate passes through the loop.] End-LoopC Iterate-LoopB: Toggle key bits 12, 7, 43, and 52 in gray-code order (bit 52 toggling most frequently). End-LoopB Print "." every 2^21 pairs of keys tested. Print CR/LF every 2^27 pairs of keys tested. Iterate-LoopA: Toggle remaining key bits in gray-code order (bit 63 toggling most frequently) until entire keyspace is tested. End-LoopA ========================================================================= Precompute Round 16 and part of Round 15: Precompute and store 2 sets of values. Each value is used 16 times as we process the keys in this test. 1. Perform Round 16 on ciphertext or complement of ciphertext. 2. Save outputs. [32 bits of the output will require a 2-element array.] 3. Perform S-boxes 1, 4, 5, 6, and 8 of Round 15. 4. Save outputs. [20 bits of the output will require a 2-element array.] 5. Holding all key bits constant, we perform these operations once using the primary ciphertext. And then with the complementary ciphertext. ========================================================================= Precompute Round 1 and part of Rounds 2 and 3: Precompute and store 32 sets of values. Each value may be used 16 times as we process the 512 keys in this set. 1. Perform Round 1 on plaintext or complement of plaintext. 2. Save outputs. [32 bits of the output will require a 32-element array. 32 bits of the output will require a 2-element array. (bitslice code may not need the 2-element arrays. They are simply permuted copies of 32 bits of plaintext and their complement.] 3. Perform S-boxes 2, 3, 4, 5, 6, and 8 of Round 2. 4. Save outputs of these 6 S-boxes. [24 bits of the output will each require a 32-element array.] 5. Perform S-box 7 of Round 3. 6. Save outputs of this S-box. [4 bits of the output will each require a 32-element array.] 7. Holding all other key bits constant, this table shows which bits are toggled (bits 60, 51, 54, 58 count in gray-code order; the tables of precomputed values alternately store the results of using "primary" then "complementary" plaintext. 60 51 54 58 C <-- Key bit (1-64) and "complementary" flag ------------------------ 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 1 1 0 0 0 1 1 1 0 0 1 0 0 0 0 1 0 1 0 1 1 0 0 0 1 1 0 1 0 1 1 1 0 0 1 1 1 1 0 1 0 1 0 0 1 0 1 1 0 1 0 0 0 0 1 0 0 1 1 1 0 0 0 1 1 0 0 1 1 1 0 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0 1 1 0 1 0 0 1 0 1 0 1 1 0 1 1 0 1 0 1 1 1 1 0 0 1 0 1 0 0 1 1 1 0 0 0 0 1 0 0 0 1 ========================================================================