In this article, I am going to tell about the operating principles and characteristic properties of random number generator (RNG), employed in the DiskCryptor. RNG, - is a critically important security element of a disk encryption system, as it generates the keys with which data is encrypted. And exactly the RNG, is what often is a weak spot or a backdoor in a different sorts of proprietary and government certified encryption systems, as an accuracy of encryption algorithms implementation and the data storage methods can be easily examined even without having an access to the source code, but RNG, however, is much more difficult to analyze. It is possible to create a one-way strong RNG, that will allow for a person, knowing a certain secret, to recover all the keys, that were generated by it and consequently to compromise an encrypted data. A good example of such a case, is the Dual_EC_DRBG standard, which has an NSA backdoor and is employed in Windows Vista.
The DiskCryptor's RNG is aimed to perform the following two objectives: generation of encryption keys and generation of a large volume of random data for disk wiping algorithms. In the first case, it is necessary to provide as big strength reserve, as possible, and the following requirements are being put forward for key generation, for the RNG:
Secondly, for generation of a large volume of random numbers, the following requirements are set for the RNG:
RNG for key generation, being employed in the DiskCryptor, belongs to the accumulative RNG type with multiple sources of entropy. An RNG of this type, accumulates minor entropy values over a long period of time, and afterward it is capable of issuing a truly random sequence in size of up to 640 bytes. Next, pseudo-random sequences follow, until a new accumulation of entropy.
As a basis of the RNG architecture, the SHA-512 hash function and AES-256 block cipher, are taken. The choice of this function is made because of its simplicity, good productivity and high reliability, though in this case, less stringent requirements are set forward for hash, in comparison, for example, when using it for the digital signature. The utilized hash function has to have the following properties:
A simplified block scheme of RNG, looks like this:
A seed is being hashed together with a timestamp and reseed counter. Adding both timestamp and reseed counter, prevents emergence of an identical hash values on a same seed. An obtained hash is added to the random number pool by means of addition with respect to base 28 with existing content of the pool, and that creates a dependence of content of the pool on all random number additions made earlier. The pool has the size of 640 bytes, that allows for the accumulation of the maximum of 5120 bits of entropy. On one addition of a seed, the size of entropy increases by the maximum of 512 bits, therefore the SHA-512 hash size requires no less then 10 reseeds, for fully filling up the pool.
Besides the big pool for data accumulation, RNG also has the second one, the keys pool. This pool has the size of 256 bits and it is never, even indirectly, being outputted, but is used as a key for the AES-256, with which output data is encrypted.
Mixing up the pool, - is an operation, performed with the objective to make all the bits in the pool to become dependent on each other. The goal is to create the reliance of each bit being generated, on all the content of the pool, but at the same time not to decrease the maximum entropy, as would have happened on hashing the whole pool at the moment of issuing random numbers. The mixing operation, has a characteristic of being one-sided, meaning that, knowing a state of the pool after a mixing, it is impossible to restore its previous state. This characteristic is directly based on the corresponding properties of the SHA-512 hash.
Mixing is being done with the help of mix function (the rnd_pool_mix function in the source code). During the mixing process, the pool is being divided into 10 parts, each having the size of 64 bytes, and to each part, a value of SHA-512 hash of all the pool content, is being added. This process can be presented as follows:
pool0 = pool0 + SHA(pool0 || pool1 ... pooln)
pool1 = pool1 + SHA(pool0 + SHA(pool0 || pool1 ... pooln) || pool1 ... pooln)
...
Mixing is being carried out, in either of the following three cases: before generation of a random byte sequence, after it, and in the case when entire content of the pool has been expended in the process of generation, and issuing of data has gone to a second round. The first mixing is required for the creation of dependency of output data, on all the pool content. The second mixing, protects the generated keys from an attacker, who might have an opportunity to read the contents of memory, after generation (for example by performing the cold boot attack). Lastly, the third mixing allows for the use of new entropy, that is being added to the pool in the process of random number generation, and it also creates an additional remoteness between already generated and following random numbers.
The function of random numbers retrieval, receives generator's output data from its internal state. Retrieval of random numbers is done by taking a SHA-512 hash from a part of the pool, random generated blocks counter (getrand counter), and a size of a remaining part of a requested data (remaining length), and encryption of an obtained hash with AES-256 key, received from the keys pool. getrand counter and remaining length are needed to eliminate a possibility of generation of two identical output blocks, if pool contents are the same. A situation having the same content in the pool, is an unlikely one, due to its mixing. Nevertheless, I believe that an additional safety measure, would only be of benefit. The process of generation can be presented as follows:
rand0 = SHA(pool0 || counter || length)
rand1 = SHA(pool1 || counter + 1 || length - 64)
...
randn = SHA(pooln || counter + n || length - 64*n)
In the process of retrieval of every 512 bits of data, twice, a gathering of additional entropy takes place. After reaching the maximum possible n (having the 640 bytes pool size, it equals 10), pool mixing takes place, and the subsequently issued data will not be truly random, but partially pseudorandom. The strength of this construction to withstand a risk of being compromised, having minor values of an effective entropy (that is, pseudorandom numbers), is based on the total strength of SHA-512 and AES-256, that gives a strength reserve even in a case of fully breaking of one of these cryptoprimitives. The source code for number retrieval from the pool, you can find in the function rnd_get_bytes.
Sources of entropy, - is the RNG element, the importance of which is hard to overvalue. It is precisely these sources, that are responsible for an effective entropy, accumulated by RNG, and consequently, an effective entropy of generated keys. Howsoever good the entropy retrieval algorithm might be, but a poor source of entropy can place the RNG security in jeopardy. And vice versa, - a good source of randomness, in some cases, may "stretch" the security of a weak retrieval algorithm. It is ought to approach the sources of entropy with a much greater paranoia, than the RNG algorithm, as the strength of these sources is practically non-provable and can vary by many magnitudes, on different assumptions about an attacker's resources and potential. So therefore, it is prudent to follow the rule of "You can't spoil a good thing. Plenty is no plague." And thus, it is best to use as many different sources of randomness, as possible. Adding another additional source, will not lower the strength, but only might to increase it, in certain cases. Consequently, what is needed here, is - redundancy, redundancy, and more redundancy.
Having redundancy, - is very good, but one should not rely on an empirical value of its size, as it may just happen, that an attacker may be able to read the majority of entropy sources used, and the remaining inaccessible sources he may be capable to pick up or predict. Therefore, there is the need to have at least a simplest numerical basis for an effective size of entropy in relation to an attacker, having a varied capabilities. For the start, we should define what kind of leverage a hypothetical attacker may have. The simplest version, - is when an attacker has absolutely no access to a computer in a moment of random number generation, that he also does not know an exact moment of generation, and that nor he has an information about the state of a system on that moment, either. This model is appropriate for the most likely kind of an attacker, and in this model, the maximum possible security of all the sources of entropy, is reached. One of the versions of strength calculation, will be given with regard to this model, later on. A more complicated version, - is when an attacker has a partial access to a computer and knows an approximate (within the milliseconds range) time of random number generation, and when he also has the access to a part of system information on that moment. Practically, this model corresponds to a potential that an attacker may have in a multi-user environment, when he has no administrative privileges in it. Strength calculation for this model will also be presented, later on, and exactly that kind of numbers should be taken as a basis for giving an assessment for having a minimum acceptable strength. And finally the third version, - when an attacker has a full access to a system. It is obvious, that it is impossible to be protected in such a case, as an attacker can take control of and to replace any sources of entropy, and also can have a direct influence upon internal state of the RNG, or to replace the RNG completely with his own one. This version conforms to a kind of an attacker, that has administrative privileges on a system he attacks, and there is no, nor there can be a defense from him. Therefore, under no circumstances you must generate keys or work with a confidential data on a system, that can be controlled by someone other than you. Otherwise, your security is literally hairbreadth to none.
The sources of entropy, used in the DiskCryptor's RNG, are separated into two types: fast sources, working in the kernel-mode, and slow sources, working in the user-mode. The first ones a residing together with the RNG inside the driver, and the second ones reside in the dcapi.dll, and the data from them is periodically transmitted to the driver. First, let us examine a set of functions of the kernel, used as a kernel-mode sources of randomness. Their main property is inaccessibility of their direct reading by an attacker, having a limited access to a system, but a portion of their bits may be predictable, and so in the table presented below, there are two values for each source: minimum and maximum. The values are true, on condition that the use of these sources is no frequent than 10 times per second, on a more numerous use an effective entropy of each call decreases.
| Function | Dependency | Minimum entropy | Maximum entropy |
|---|---|---|---|
| PsGetCurrentProcess | On the process invoking reseed | 0 | 2 |
| PsGetCurrentProcessId | On the process invoking reseed | 0 | 2 |
| KeGetCurrentThread | On the thread invoking reseed | 1 | 4 |
| PsGetCurrentThreadId | On the thread invoking reseed | 0 | 2 |
| KeGetCurrentProcessorNumber | On the number of current processor | 0 | 2 |
| KeQueryInterruptTime | On all interrupts in the system | 8 | 16 |
| KeQueryPriorityThread | On the thread priority | 0 | 1 |
| KeQueryPerformanceCounter | RTC (Real Time Clock) | 4 | 8 |
| __rdtsc | CPU timestamp counter | 16 | 24 |
| ExGetPreviousMode | Previous mode of the thread execution | 0 | 1 |
| IoGetInitialStack | Initial stack of the thread | 0 | 1 |
| IoGetTopLevelIrp | Last IRP (Input/output Request Packet) in the queue | 0 | 2 |
| MmQuerySystemSize | Size of the system memory | 0 | ? |
| ExUuidCreate | PRNG of the Windows kernel | 8 (?) | ? |
| RtlRandom | Predictable value, but may be altered by other drivers | 0 | ? |
| KeQueryTickCount | Number of ticks of the counter from the moment of system's boot | 2 | 4 |
| IoGetStackLimits | Stack limits for the current thread | 0 | ? |
| Total: | 39 | 75 |
Thus, we can see, that on an average we get 39-75 random bits on one reseed operation. This is not enough if reseed is carried out once, therefore DiskCryptor retrieves entropy from fast functions in the moment of a first thousand disk I/O operations after the system's start. Even if we assume, that the time of I/O operations execution is practically constant, then on the minimum we would have no less than 1 bit of entropy per reseed (because predicting the time of reading data from a disk with the nanosecond precision, would be a fiction), that in total would amount to no less than 1000 bits of effective entropy, right after the system's boot. Subsequently, the fast source of entropy is being used less, and namely in the moment of some not very frequent system events, and in the moment of random number generation.
Besides using fast kernel-mode entropy sources, RNG also uses slow user-mode sources. Several API functions and also user actions (mouse movement, pressing of keys) are used as user-mode sources. At the start of the program, Windows's RNG CryptoAPI CryptGenRandom function is used once, as the source of entropy. Recent studies show, that this RNG is weak and in a protracted period of time it issues pseudorandom numbers, as it rarely replenishes entropy, and thus using it more than once is not justified. To set the value of its effective entropy, is difficult for me, but apparently it has to be no less than 32 bits, otherwise RNG would have had a very short period, that can be found out with simplest test. Mouse movements are used as the source of entropy during all the time, that the program is working. This is a very good source of entropy even on its own accord, as the time between the two mouse communications is wide in comparison with CPU clocks, and even from this time alone, it is possible to confidently gather 16 random bits at a time. Extra 2-3 random bits can be taken off from mouse cursor coordinates and presses of mouse keys. Keyboard keys presses in the program's window are also used by RNG, and both time of the press and release, and the key's code are sent to it. This guarantees, that even in the absence of other sources of randomness, the effective entropy of the keys being generated, cannot be less than the entropy of your password, as the password is also used in key generation. It is also important, that these sources of entropy are inaccessible for reading by a local attacker with limited privileges, since if attacker's privileges are enough to intercept keyboard presses, then encryption loses all its meaning.
In addition to one-time and persistent sources of entropy, described above, the program also uses recurrent entropy sources, information from which is collected every 5 seconds as long as the program works. The list of functions used, and approximate value of amount of random bits receive from them, is presented in the table below.
| Function | Minimum entropy | Maximum entropy | Inaccessible to local attacker |
|---|---|---|---|
| GetDesktopWindow | 0 | 1 | 0 |
| GetForegroundWindow | 2 | 4 | 2 |
| GetShellWindow | 0 | 1 | 0 |
| GetCapture | 0 | ? | ? |
| GetClipboardOwner | 0 | ? | ? |
| GetOpenClipboardWindow | 0 | ? | ? |
| GetCurrentProcessId | 0 | 1 | 0 |
| GetCurrentThreadId | 0 | 1 | 0 |
| GetFocus | 0 | 1 | ? |
| GetActiveWindow | 0 | ? | ? |
| GetKBCodePage | 0 | ? | ? |
| GetCursor | 0 | 2 | ? |
| GetLastActivePopup | 0 | ? | ? |
| GetProcessHeap | 0 | 2 | 0 |
| GetQueueStatus | 0 | 2 | 0 |
| GetInputState | 0 | ? | ? |
| GetMessageTime | 0 | ? | ? |
| GetOEMCP | 0 | ? | ? |
| GetCursorInfo | 8 | 16 | 8 |
| GetCaretPos | 0 | ? | ? |
| GetThreadTimes | 4 | 8 | 0 |
| GetProcessTimes | 4 | 8 | 0 |
| GetProcessMemoryInfo | 4 | 8 | 0 |
| QueryPerformanceCounter | 8 | 16 | 8 |
| GlobalMemoryStatusEx | 4 | 8 | 0 |
| EnumWindows | 128 | 256 | 128 |
| Total: | 162 | 335 | 146 |
In total we have 146 bits, that are guaranteed to be inaccessible even to a local attacker. That is not too much, but nevertheless, already lies beyond the capabilities of attackers having modern day technology.
Aforementioned RNG is designed for the generation of small amount of random numbers, with a large strength reserve. This scheme however, has quite low efficiency, and therefore it is unsuited for generation of gigabytes of random numbers, that are needed for the secure wiping algorithms, for the process of erasing disk sectors on encryption. For this purpose DiskCryptor has the pseudorandom number generator, that is built with the utmost simplicity, but nonetheless its durability is based on the AES strength.
For the source of entropy in this scheme, the main RNG is used, with the help of which a random AES key is generated on a first initialization of the generator. Output of pseudorandom numbers is done by the process of encryption, using this key, of sequentially incrementing 128 bit counter. Statistical characteristics of output data are fully ensured by the AES algorithm. On the completion of generator's work, the key is destroyed, and it becomes impossible to distinguish an obtained data from a random data. This scheme has the property of being simple and sufficiently safe for its purpose.
| Software Generation of Practically Strong Random Numbers |
| NIST Special Publication 800-90 |
| Cryptanalytic Attacks on Pseudorandom Number Generators |