This description is excerpted with permission from a message Rocke Verser sent me in April 1997.
- Darrell Kindred <dkindred@cs.cmu.edu>
[Darrell Kindred's bitslice DES info]
LoopA: Each pass through LoopA performs 256 complementary pairs of
keys. (512 keys per iteration of LoopA.)
LoopB: Each pass through LoopB performs 16 complementary pairs of
keys. (32 keys per iteration of LoopB.) This loop is
arranged so that the first 1+5/8 rounds only need to be
computed once. These first 1+5/8 rounds are invariant
within LoopC.
LoopC: Each pass through LoopC tests 1 key. Each pass through
this loop corresponds to a pass through the "PrecomputeRound1"
function. Since the last 1+6/8+1/8 round are precomputed,
they do not need to be recomputed in LoopC.
LoopA:
/* Perform a set of 256 complementary pairs of keys (512 keys per set) */
Call Precompute-Round-1
LoopB:
/* Perform a set of 16 complementary pairs of keys (32 keys per set) */
Call Precompute-Round-16
LoopC:
/* Test a single key */
Fetch the "primary" or "complementary" result from "precompute-round-16"
Apply S-boxes 2, 3, 7 of Round 15.
Apply Rounds 14, 13, 12, 11, 10, 9, 8, 7, and 6.
Apply S-box 7 of Round 5.
Compare 4-bit result of Round 5 with stored result of Round 3
from "precompute-round-1". (I.e., compare S-box 7 outputs.)
If no match, iterate LoopC.
[dk: this abort rarely happens in bitslice: 1.6% for 64-bit, 12.7% w/ 32]
Apply S-boxes 3 and 4 of Round 5.
Apply S-boxes 1 and 7 of Round 2 to previously stored result.
Apply S-boxes 3 and 4 of Round 3 to previously stored result.
Compare 8-bit results of Round 5 with 8-bit result of Round 3.
(I.e., compare S-box 3 and 4 outputs.)
If no match, iterate LoopC.
/* We get here, on average 1 time in 4096 keys */
/* [dk: with 64-bit bitslicing, we get here 1.6% of the time] */
Apply S-boxes 1, 2, 5, 6, and 8 of Round 5.
Apply S-boxes 1, 2, 5, 6, and 8 of Round 3.
Compare 20-bit results of Round 5 with 20-bit result of Round 3.
(I.e., compare S-box 1, 2, 5, 6, and 8 outputs.)
If no match, iterate LoopC.
/* We get here, on average 1 time in 2^32 keys. */
/* This is a "half-match". It must be logged and reported to
the keyserver. This is how the keyserver can inexpensively
check that clients are actually doing their work!!! Very
important!!!! What is reported to the server is the
loop counter (0 to 2^31) at the time the half-match was
detected. The loop counter simply increments sequentially
(not gray-code) with each iteration of LoopC. */
Apply Round 4.
Compare 32-bit output of Round 4 with 32-bit result stored result
of Round 2.
If a match, we probably have the winning key -- a 64-bit match!
Iterate-LoopC: Process keys in same order as table generated by
"precompute-Round-1". [4 key bits are toggled to every possible
value. Complements of tested on alternate passes through the
loop.]
End-LoopC
Iterate-LoopB: Toggle key bits 12, 7, 43, and 52 in gray-code
order (bit 52 toggling most frequently).
End-LoopB
Print "." every 2^21 pairs of keys tested.
Print CR/LF every 2^27 pairs of keys tested.
Iterate-LoopA: Toggle remaining key bits in gray-code order
(bit 63 toggling most frequently) until entire keyspace is
tested.
End-LoopA
=========================================================================
Precompute Round 16 and part of Round 15:
Precompute and store 2 sets of values. Each value is used 16 times
as we process the keys in this test.
1. Perform Round 16 on ciphertext or complement of ciphertext.
2. Save outputs. [32 bits of the output will require a 2-element
array.]
3. Perform S-boxes 1, 4, 5, 6, and 8 of Round 15.
4. Save outputs. [20 bits of the output will require a 2-element
array.]
5. Holding all key bits constant, we perform these operations once
using the primary ciphertext. And then with the complementary
ciphertext.
=========================================================================
Precompute Round 1 and part of Rounds 2 and 3:
Precompute and store 32 sets of values. Each value may be used 16
times as we process the 512 keys in this set.
1. Perform Round 1 on plaintext or complement of plaintext.
2. Save outputs. [32 bits of the output will require a 32-element
array. 32 bits of the output will require a 2-element array.
(bitslice code may not need the 2-element arrays. They are simply
permuted copies of 32 bits of plaintext and their complement.]
3. Perform S-boxes 2, 3, 4, 5, 6, and 8 of Round 2.
4. Save outputs of these 6 S-boxes. [24 bits of the output will each
require a 32-element array.]
5. Perform S-box 7 of Round 3.
6. Save outputs of this S-box. [4 bits of the output will each require
a 32-element array.]
7. Holding all other key bits constant, this table shows which bits are
toggled (bits 60, 51, 54, 58 count in gray-code order; the tables of
precomputed values alternately store the results of using "primary"
then "complementary" plaintext.
60 51 54 58 C <-- Key bit (1-64) and "complementary" flag
------------------------
0 0 0 0 0
0 0 0 0 1
0 0 0 1 0
0 0 0 1 1
0 0 1 1 0
0 0 1 1 1
0 0 1 0 0
0 0 1 0 1
0 1 1 0 0
0 1 1 0 1
0 1 1 1 0
0 1 1 1 1
0 1 0 1 0
0 1 0 1 1
0 1 0 0 0
0 1 0 0 1
1 1 0 0 0
1 1 0 0 1
1 1 0 1 0
1 1 0 1 1
1 1 1 1 0
1 1 1 1 1
1 1 1 0 0
1 1 1 0 1
1 0 1 0 0
1 0 1 0 1
1 0 1 1 0
1 0 1 1 1
1 0 0 1 0
1 0 0 1 1
1 0 0 0 0
1 0 0 0 1
========================================================================