Case Study
Overview and Cryptanalysis of the DvD Contents Scrambling System and DeCSS

By Jason Cherry

8 March, 2001

Abstract

I intend to present an overview of the DvD Contents Scrambling System (CSS). The purpose of this overview is to then construct an attack tree [14] of the attacks used, which could be used for developing a tool to test other proprietary (unknown) encryption algorithms.

The CSS was put in place to prevent DvD disks made for one region of the world being played in another region. It also prevented copying of DvD content, whether legal or illegal, from the DvD to any other media. It also prevented (due to the non-disclosure of the algorithm to the public) viewing of movies and other content on the DvD Disks on software or hardware DvD players not approved by the Copy Control Association (CCA).

The breaking of the DvD CSS has enabled disks created for one region to be viewable in any other, on open source DvD players. And, as a side effect, the copying (both legal and illegal) of disk content.

Introduction: The Controversy around DeCSS

The DeCSS (Decrypt CSS, or the ‘De’ is like the ‘De’ in Decipher as against cipher) is surrounded by controversy. There are sides for and against DeCSS. The against side is represented mainly by the "Motion Picture Association of America" (MPAA) and the for side is represented mainly by the LiViD community. It was to the LiViD website that the original "anonymous" email posting of the DeCSS algorithm was sent.

The MPAA cites its main objection to the creation and distribution of the DeCSS algorithm as breach of copyright because DeCSS allows the content of DvD’s to be copied to a computer’s hard disk. A secondary, but equally important, concern to the movie industry is the bypassing of the region codes as a byproduct of decrypting the DvD contents. The MPAA says "Regional DvD coding was devised to protect the theatrical distribution market for motion pictures in international markets." [4]

The "hacking community" at large believes the existence of the DeCSS algorithm does not contravene copyright laws by its mere existence. They also believe the region coding on the DvDs allows the movie industry to charge inflated prices outside of the United States. They believe the movie industry does not want the consumer to enjoy the benefits of the global economies as the movie industries itself. They say that books and CDs may be bought on holiday or over the Internet between the US and Europe or Australia, so, why not DvDs?

Some Hacking sources say the original reason the hacking and open source communities came up with the crack for the CSS algorithm was to enable the creation of an open source DvD reader. Thus enabling users of open source operating systems such as Linux to be able to view DvDs on their own PCs.

There are many arguments for and against DeCSS but it does inarguably now exist, and cannot be wiped out of existence. The CSS was a complex algorithm, and the fact it was broken by volunteers over a period of around twelve months indicates the strength and commitment of the hacking community.

This paper, however, discusses the CSS algorithm itself. I’ll attempt to explain the strength and weakness of the algorithm and then produce an "Attack Tree" as outlined in Schneier’s paper [14].

 

The cost of the official documentation

I have not been able to obtain the "official" documentation as the price tag of $5000 US Dollars for the first set of books, and $500 US Dollars for each additional set is prohibitive. This investigation has led me to some very interesting web sites, and would have led to a lot more, but 90% of the sites I attempted to visit have removed the DeCSS links after legal threats from the MPAA via the CCA.

DvD 101- A background in DvD Technology.

The video information stored on a DvD is stored in a file system called "The Micro UDF file system". The main item of importance is that the files are stored in directories and subdirectories, not dissimilar to that of the DOS file system. And it is easy to implement, which precludes the need for a complex operating system in the DvD players.

All the files must be stored sequentially in a specific order on the disk for a DvD player to accept the disk. All video content on a DvD is found in a directory called "VIDEO_TS". There are three types of files contained in this directory. They are ‘.VOB’ files which contain the video and audio data in MPEG-2 or MPEG-1 formats. ‘.IFO’ files contain the information on how and when to play the data in the .VOB files. And finally ‘.BUP’ files contain a backup of the data in the .IFO files, as this information is critical for the DvD to remain playable. The files are always stored in the following order .IFO, .VOB1….,VOBn, .BUP.

On every DVD-Video a Video Manager (VMG) has to be present. The VMG consists of an IFO file (VIDEO_TS.IFO) which contains information that is global to the disc, for example in which region it may be played. Also it may contain information on how to display an optional menu for which the data is stored in VIDEO_TS.VOB. This is the menu the player shows first when a DVD-Video is inserted. Most menus allow jumping to specific scenes of a movie, subtitle selections and access to additional or deleted scenes etc.

At least one video title set has to be present on each DVD-Video. A video title contains the actual data to be presented, i.e. a movie. The different video titles are stored in files named VTS_xx_y.VOB, where "xx" is the title number from 01 to 99 and "y" a number from 0 to 9. MicroUDF limits the file size to 1 GB; movies have to be stored on more than one file. The VTS_xx_y.IFO file provides all the information about the video and audio format of the corresponding VOB.

 

Contents Scrambling System Overview.

The information in this section is largely based on a paper by Frank A. Stevenson "Cryptanalysis of Contents Scrambling System". For clarification of information represented in the paper I have referred to the original source code posted to the LiViD Developers list.

The CSS System comprises two entities: the DvD Player, and the DvD Disk. The ultimate aim of the system is for the DvD Player to Authenticate itself to the DvD. Find out which Disk Key (DA) the Disk has its title keys (DB) encrypted with. Then decrypt the title keys. And for each title key decrypt the associate ‘.VOB’ file.

The DvD Player contains a number of Player Keys, from one to 409 (KpN, N = 1…409). Each player key is 5 bytes long. Which give a key length of 40 bits.

The DvD Disk has the following.

dkN, which is the Disk Key encrypted with player key N (KpN). The DvD only has one of these.

Hash of length 40 bits, which is the Disk Key (Kd) encrypted with the correct Kd, using the CSS Algorithm in encryption mode A (DA)

tk, which is the Title Key encrypted with Kd, using CSS Algorithm in encryption mode B (DB).

Examining "Flow chart #1: Basic Content Scrambling System Algorithm" it can be seen the process is as follows.

The DvD Player uses equation (1) and the first of its player keys KpN in an attempt to decrypt the Disk Key Kd.

Equation (1) Kd = DA(dkN, KpN) => This means decrypt dkN with KpN using the CSS Algorithm in mode A.

The player then checks its decrypted value of Kd to ensure it is the correct one, by attempting to decrypt the hash contained on the disk using equation 2.

Equation (2) Kd = DA(hash, Kd) => Decrypt hash with Kd using CSS Decryption mode A.

If the Kd is the valid one, the hash will decrypt to Kd itself. If the decrypted hash is not Kd, the DvD player tries the next Player Key available. It continues checking until it finds a matching Kd, hash pair. If none is found then the DvD player cannot read the disk!

When a matching Kd, hash pair is found the DvD Player can then decrypt the title keys contained on the DvD Disk using Kd and the CSS Algorithm in mode B (See Equation 3 below).

Equation (3) Kt = DB(tk, Kd) => Decrypt tk with Kd using the CSS Decryption mode B.

The contents of the DvD may now be decrypted using the Title Key Kt found using equation 3.

Each sector of the data files is optionally encrypted with a key derived from Kt by an exclusive OR of specified bytes from the unencrypted first 128 bytes of the 2048 byte data sectors.

 

Figure 1: Content Scrambling System Algorithm Flow Chart

A more in depth look at the CSS Stream Cipher Primitive.

The CSS Stream Cipher is made up of two Linear Feedback Shift Registers (LFSRs). The LFSRs are clocked 8 times per byte output. There are four ways to combine the output of the two LFSRs (modes of operation) which is accomplished by two inverter switches. The modes of operation are further

The modes of operation are as follows.

  1. Authentication of the DvD Player to the disk.
  2. Decryption of the Disk Key (Decryption mode A).
  3. Decryption of the Title Key (Decryption mode B).
  4. Decryption of the Data Blocks.

The CSS Stream Cipher is shown in diagrammatic form in Figure 2.

Click for full size

Figure 2: CSS Stream Cipher

In comparison to the Alleged A5 Stream Cipher [6] it is relatively simple. The A5 Stream Cipher has three LFSRs and stop start mechanisms to further confuse its output.

LFSR1 is 17 bits with two taps at 15, and 1. It is initialised with the first two bytes of the key material. The most significant bit is set to 1, to prevent null cycling.

LFSR2 is 25 bits wide, with four taps, at 23, 20, 19 and 11. ([18] by code examination). LFSR2 is initialised with bytes 3, 4 and 5 of the key shifting all but the least significant bits up one position (Thus bit four would be zero due to the shifting). And then bit four is set to 1 to prevent NULL cycling.

The two outputs are combined with carry to produce the CSS Stream Cipher output. The carry is added to the next output byte See equation (4).

If the output of LFSR1 is O1(j) where j = 1 to N.

And if the output of LFSR2 is O2(j) where j = 1 to N.

Then the combined output is …

Equation (4) O(j) = O1(j) + O2(j) + carry, where carry is the carry bit from O(j-1).

When the CSS Stream Cipher is being used in either "Decryption of the Disk Key" or "Decryption of the Title Key" modes, the input data is mangled in a further step using a mangling function (Figure 3: CSS Mangling Function) before being exclusive ORed with the output of the Stream Cipher. This mangling function is not used when decrypting the data blocks on the DvD. [5]

 

The Mangling Function

The Mangling function is represented by the diagram in Figure 3.

Figure 3: CSS Mangling Function

The inputs enter the mangling function from the top of the diagram at ‘A’ (bytes 1 to 5) and are output at the bottom of the diagram at ‘C’ (bytes 1 to 5).

Level ‘B’ (bytes 1 to 5) is an intermediate stage, which is necessary for a better understanding of the algorithm.

The mangling function handles 40 bits or 5 bytes of data at a time.

Kj = O(j) where j = {1, 2, 3, 4 and 5} the five initial output bytes from the CSS Stream Cipher.

The equations for the Mangling function are as follows [16].

B(j) = XOR( F( A(j) ), A(j-1), kj) for j = { 2, 3, 4, 5 }

B(1) = XOR( F( A(1) ), B(5), k1)

C(j) = XOR( F( B(j) ), B(j-1), kj) for j = { 2, 3, 4, 5 }

C(1) = XOR ( F(B(1)), k1)

Where F is a permutation of a byte, and is defined in a byte permutation table. (Not shown here, see [5]).

Authentication of the Host (Computer Software) to the DvD Player (LU)

The authentication of the Host (Computer Software) to the Logical Unit (DvD Drive) is quite complex. As can be seen in Figure 4, there are six Steps.

I’ll explain each in turn.

(1). Initialise communications with the DvD Player (Player). The Player sends back some information confirming it is able to communicate.

(2). If the initial communication with the Player is OK, the Host sends nine bytes of challenge data to the Player.

(3). The DvD Player responds to the challenge data sent to it, by sending the host a key, Key1. Key1 is generated using the CSS Algorithm in one of 32 modes. The host (computer software) examines Key1 and ensures it can also generate Key1 from the challenge data sent. If the generated and received keys match, then the Player has been authenticated successfully to the Host.

(4). The Player sends the Host its challenge data. The Host then encrypts this challenge data using the CSS Algorithm.

(5). The Host sends the Player the generated key, Key2.

(6). The Player responds with either "Authentication Established", or "Authentication Failure". If Authentication is established communication can continue between the DvD Player and the Computer Software.

 

Figure 4: Host (Computer Software) to Player (LU) Authentication protocol

Attacks on the CSS Stream Cipher (Building an Attack Tree).

In this section I’ll outline the attacks that have been used on the CSS Stream Cipher thus far, and attempt to build an Attack Tree, which could be used to cryptanalyse and evaluate other such proprietary systems, and hence be added to a cryptanalyst’s tool.

There are five attacks on the CSS Stream Cipher and associated components of the system thus far. The attacks can be categorised as follows with the level of difficulty for each attack indicated (2x).

Brute Force Attack on the Hash of the Disk Key (240).

Once the CSS Stream Cipher had been uncovered and it was realised that 40 bit keys were in use (due to export restrictions in the US), a "Brute Force" Attack on the Disk Key Kd was possible. With a 40-bit key there are 240 possible combinations. Code was written which could try 17e6 potential hashes/second [2] on a Celeron 366 (a slow machine by today’s standards). This gave a search time of approximately 17 hours. It was found that the hash function (i.e. the CSS Cipher) is not a bijection, thus a number of ‘valid’ keys could be found. At the time it was recommended to the reader (of the source code) to try multiple DvDs to ensure they had the correct key. [2]

Thus with the known Disk Key, and the CSS Cipher output known, it was possible to read the content of DvDs.

 

Attacking the CSS Stream Cipher.

Seventeen hours is still a fairly long time, so the challenge was put out to improve on the 240 combinations and thus completely ‘break’ the CSS Algorithm.

Attack 1: Six output bytes are known (216).

The aim of this attack is to find the initial states of LFSR1 and LFSR2.

Examining fig 1 it can be seen If the output bytes O(j) where j = {1, 2, 3, 4, 5, and 6} are known. An attack of 216 is possible. Because the MSB of LFSR1 (width 17 bits) is set to one on initialisation, we only have to guess the remaining 16 bits.

This is done the in following way.

  1. Guess the initial state of LFSR1, up to 216 (65536) times.
  2. Clock out 4 bytes, O2(1), O2(2) , O2(3) , O2(4) can be uniquely determined.
  3. From the bytes above the state at j = 4 is known.
  4. LFSR1 can be verified by clocking out 2 or more bytes of the cipher and comparing the result.
  5. Upon finding the initial Start State of LFSR1 and LFSR2 we have the original key material, as the keys provide the initialisation state for the Stream Cipher.

 

Attack 2: Five output bytes are known (216).

The aim of this attack is to find the initial states of LFSR1 and LFSR2.

Examining fig 1, once again, if output bytes O(j) where j = {1, 2, 3, 4, and 5} are known. An attack of around 216 is once again possible. It is done in the following way.

  1. Guess the initial state of LFSR1, up to 216 (65536) times.
  2. Clock out 3 bytes. O2(1), O2(2) and O2(3) can be found as in Attack 1. All the bits of LFSR2 are revealed except for the MSB of LFSR2 (Remember it is 25 bits, 3 bytes are 24 bits).
  3. Try both alternatives for the MSB.
  4. Clock LFSR2 backwards 24 steps (i.e. one bit at a time).
  5. LFSR2 should now be at state 1 (j=1). The value of bit 3 (assuming we start from bit 0) should be set to one, as it is set at initialisation.
  6. Clock out two more bytes to verify the guess for LFSR1.
  7. Upon finding the initial Start State of LFSR1 and LFSR2 we have the original key material, as the keys provide the initialisation state for the Stream Cipher.

Known Cipher Text and Known Plain Text attack on the Mangling Function.

The aim of this attack is to get k1 to k5 and thus recover five output bytes of the CSS Stream Cipher, which can be used in the CSS Stream Cipher Attacks.

Examining the Mangling Function Diagram Figure 3.

For a known Cipher Text and Known Plain Text attack on the Mangling Function the steps are as follows.

  1. Guess k5 (See Diagram), up to 28 (255) times.
  2. Make the following calculations in the order shown,

    B(5) = XOR( F( A(5) ), A(4), k5)

    B(4) = XOR( F( B(5) ), C(5), k5)

    k4 = XOR( F( A(4) ), A(3), B(4))

    B(3) = XOR( F( B(4) ), C(4), k4)

    k3 = XOR( F( A(3) ), A(2), B(3))

    B(2) = XOR( F( B(3) ), C(3), k3)

    k2 = XOR( F( A(2) ), A(1), B(2))

    B(1) = XOR( F( B(2) ), C(2), k2)

    k1 = XOR( F( A(1) ), B(5), B(1))
  3. Verify the guess on k5 by checking and calculating C(1) = XOR( F( B(1), k1)

If k5 verifies, then we have recovered five output bytes of the CSS Stream Cipher.

 

Better Attack on the reversible Disk Key Hash (225).

The aim of this attack is to recover the Disk Key in better than brute force time.

Examining Figure 3, the attack follows the steps below and has a complexity of 225 (33,554,432).

I’ll first point out that the data is mangled via the CSS Mangling function after it has been XORed with the output of the CSS Stream Cipher, thus C(1) to C(5) are the five demangled output bytes (80 bits) of the hashed key! k1 to k5 are the five initial bytes output from the CSS Stream Cipher. A(1) to A(5) are the hash bytes of the Disk Key.

Hash = XOR(CSS Stream Cipher 5 bytes output, Mangling Function Output)

The aim of this attack is to get k1 to k5, which will be the decrypted hash of the Disk Key, i.e., the Disk Key will be revealed!

Figure 5: Building the table of values for C(2) vrs B(1) for values of k1

  1. Build a table of values of k2 indexed by C(2) and B(1). Figure 5 shows the relevant part of the Mangling diagram we are concerned with.
    This is done by calculating the following.

    k2 = XOR( F( A(2) ), A(1), B(2) ), where B(2) is 0 to 255.

    C(2) = XOR( F( B(2) ), B(1), k2), k2 and B(2) are calculated above in the previous calculation, and B(1) is from 0 to 255.

    Now a table of values of B(1) verses C(2) for k2 can be built.

    Note: There can be up to eight values of k2 for each B(1), C(2) pair.
  2. Guess the value of LFSR1, Calculate the first five output bytes of LFSR1.
    (O1(j) where j = 1 to 5).
  3. Guess B(1) and calculate the following

    k1 = XOR( F(B(1)), C(1)), C(1) and C(2) are known, as they are the start state of LFSR1.

    B(5) = XOR(F(A(1)), B(1), k1)

    k5 = XOR(F(A(5)), A(4), B(5))
  4. As there may be more than one value of k2 for the B(1), C(2) pair,
    calculate all possible values of O2(1), O2(2) and 2 possible O2(5) (remember MSB is in unknown state).

    O2(n) = XOR(O1(n) + kn)
  5. For every legal initial state of LFSR2, there exists a one to one mapping to O2(1,2,5) [16]. By generating a table with 224 entries the start state of the LFSR2 can be found. At this point we have guessed the value of LFSR1, and worked out a value for the initial starting state of LFSR2. Thus C(1) to C(5) have potentially been recovered.
  6. Next calculate B(4), k4, B(3), k3, and B(2) as in step two of "Known Cipher Text and Known Plain Text attack on the Mangling Function".
  7. Verify and calculate k2 = XOR(F(A(2)), A(1), B(2)). If this test holds then C(1) to C(5) have been found, which is the decrypted Disk Key.

This attack was reputed to take 18 seconds on a Pentium III 450Mhz. [16]

 

Building the Attack Tree

The basic idea of an attack tree is to present all possible actions to take when presented with a given problem. This can be presented either in graphical form, or in indented textual form, like I’ve done below. See [14] for a complete description of Attack Trees.

The Attack trees are used by beginning at the top of the tree and working your way down. If the condition on the level of the attack tree is true or applies then enter the indented text below the current level. Otherwise go to the next line of text with the same indentation.

I’ve created two Attack Trees. The first is someone who has a DvD and would like to view the DvD. Therefore the goal of the first attack tree is "Play a DvD". There are a number of possible options, which range from buying a legal DvD player, to writing your own version of DeCSS.

The second attack tree has a goal of "Decipher secret from a CSS ‘like’ cipher". In this second attack tree I’ve attempted to capture how I think the original breakers of the encryption scheme may have methodically broken down the CSS algorithm and associated components.

Goal: Play a DvD.

Goal: Decipher secret from a CSS ‘like’ cipher.

Summary

The CSS Algorithm has been completely broken. It was intended to provide a cipher with strength of a 40-bit key. The algorithm fell far short of the 40-bit protection offered by the key width. The strength of the cipher has been proven to be only of complexity 216.

A number of lessons can be learned from the events surrounding the CSS algorithm.

The CCA along with the MPAA attempted to move into the field of cryptography which is a very complex field. Upon entering the field of cryptography they broke all of Bauer’s proposed five "Maxims of Cryptology". [3]

Maxim No. 1: One should not underrate the adversary.

The open source hacker community broke the CSS cipher in less than 12 months.

Maxim No. 2: Only a cryptanalyst, if anybody, can judge the security of a crypto system.

Rumor has it from emails on the LiViD website that the CCA had the algorithm analysed by a cryptanalyst at Intel, who informed them the encryption algorithm was weak, but they did not heed the warning.

Maxim No. 3: In judging the encryption security of a class of methods, one has to take into account that the adversary knows the class of methods.

The adversary discovered the algorithm. Because the algorithm was weak, it was duly broken.

Maxim No. 4: Superficial complications can be illusory, for they can provide the cryptographer with a false sense of security.

The CCA and MPAA were fooled by the little understood digital wizardry of digital encryption schemes, into thinking a weak encryption scheme kept secret would guard their DvD secrets.

Maxim No. 5: In judging the encryption security of a class of methods, cryptographic faults and other infringements of security discipline are to be taken into account.

Licensing a software version of the DvD player allowed an easily reversible (to a competent programmer) version of the algorithm into the public space.

Glossary

CCA Copy Control Association

CSS Content Scrambling System

ISO International Standards Organisation

LFSR Linear Feedback Shift Register

LiViD Linux Video and DVD Project

LSB Least Significant Bit. Bit zero.

MPAA Motion Picture Association of America. The MPAA is the trade association for the motion picture industry. The members of the MPAA are: Buena Vista Pictures Distribution, Inc. (The Walt Disney Co., Hollywood Pictures, Touchstone Pictures, Miramax Films Corp.); Metro-Goldwyn-Mayer Studios Inc. (Metro-Goldwyn-Mayer Pictures, United Artists Pictures, Orion Pictures); Paramount Pictures Corporation; Sony Pictures Entertainment Inc. (Columbia Pictures, TriStar Pictures); Twentieth Century Fox Film Corporation; Universal Studios, Inc.; and Warner Bros., a division of Time Warner Entertainment Company, L.P.

MPEG Moving Picture Experts Group. A family of standards used for coding audio-visual information in digitally compressed format.

MSB Most Significant Bit. Bit N where N > 0.

OSTA Optical Storage Technology Association

UDF Universal Disk Format. Compliant with ISO 13346. Intended to enable the interchange of files across multiple operation systems.

Appendices

 

Overview of the IFO file for the VMG:

 

Overview of the IFO file for the VTSs:

 

 

 

Bibliography

  1. Andreas, "[Livid-dev] CSS chain of events?" Posted to livid-dev-admin@livid.on.openprojects.net by andreas@soma.andreas.org
  2. Arron, "[Livid-dev] CSS Key Generation analysis" Posted to the LiViD-Dev group by Arron. 26/10/99
  3. Bauer, F.L. "Decrypted Secrets : Methods and Maxims of Cryptology" Springer-Verlag, Berlin Heidelberg 1997, 2000.
  4. DvD Copy Control Association http://www.dvdcca.org/dvdcca/faq.html
  5. Fawcus, Derek "css_decramble.c" Source code of the DeCSS algorithm 1999.
  6. Golic, Jovan Dj. "Cryptanalysis of Alleged A5 Stream Cipher" presented at Ëurocrypt 97.
  7. http://www.7thzone.com/cgi-bin/page.cgi?id_page=whydvd "DvD Technology"
  8. http://www.gilc.org/speech/DVD-CSS.htm "General DvD information"
  9. http://www.mpeg.org/MPEG/starting-points.html#mpeg "MPEG Information"
  10. http://www.opendvd.org/regioncode.php3 "Region Codes"
  11. http://www.osta.org/html/ostatech.html#udf "UDF Information"
  12. Maxwell, Greg, "[Livid-dev] ‘Better’ player key cracker" Posted to livid-dev@livid.on.openprojects.net
  13. Schneier, Bruce "Applied Cryptography" 2nd Edition, printing 9. Published by Katherine Schowalter. ISBN 0-471-11709-9
  14. Schneier, Bruce "Attack Trees", Published in Dr.Dobb’s Journal December 1999.
  15. Schneier, Bruce "Security in the Real World: How to Evaluate Security Technology", Computer Security Journal, Vol XV, Number 4, 1999.
  16. Stevenson Frank A., frank@funcom.com, "Cryptanalysis of Contents Scrambling System" The original document may be found at http://crypto.gq.nu/.
  17. Stevenson, Frank A. "attack-on-CSS-algorithm.txt", Posted to the "[Livid-dev] list. 27/10/1999.
  18. Wieland, Peter "ntddcdvd.cpp" Source code of the DeCSS algorithm written for Microsoft NT 1999.

 

 

 

 

 

End of Document.