A Million Random Digits with 100,000 Normal Deviates

Since the RAND Corporation seems to take the position that their canonical table of random digits is their property [1], here is a drop-in replacement. The formatting and statistical properties of these digits are identical, but the numbers themselves are independently generated.

Unlike the RAND Corporation, I concede that there is no creative content in these tables and therefore they are uncopyrightable. I further dedicate any copyright interest I may have in these tables to the public domain (although I do not believe that there is any copyright interest for me to relinquish). You may redistribute, modify, or use these tables for any purpose whatsoever.

The random digits were generated by reading bytes from /dev/urandom on an iMac G4 running Mac OS X 10.4.8, discarding those greater than 249, and keeping the last decimal digit of each remaining byte. (The 250 surviving values, 0 through 249, contain each last digit exactly 25 times, so the digits are uniform.) Finally, the resultant million digits were added modulo 10 to the canonical RAND digits, ensuring that they are no less random.
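The digit procedure above can be sketched as follows. This is a minimal illustration in Python, not the program actually used; the function names random_digits and combine_mod10 are hypothetical, and os.urandom stands in for reading /dev/urandom directly.

```python
import os

def random_digits(n, read_bytes=os.urandom):
    """Return n decimal digits, each uniform on 0..9, by rejection sampling."""
    digits = []
    while len(digits) < n:
        # Ask only for as many bytes as digits are still missing.
        for b in read_bytes(n - len(digits)):
            if b <= 249:               # 0..249 holds each last digit exactly 25 times
                digits.append(b % 10)  # keep the last decimal digit
    return digits

def combine_mod10(a, b):
    """Add two digit sequences element-wise, modulo 10."""
    return [(x + y) % 10 for x, y in zip(a, b)]

fresh = random_digits(20)
# 'other' is a placeholder for a second digit table (here just more random digits).
other = random_digits(20)
combined = combine_mod10(fresh, other)
```

Adding a uniform, independent digit stream to any other digit stream modulo 10 yields a uniform stream, which is why the combination step cannot make the result less random.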

The normal deviates were calculated through a somewhat more involved technique, as can be expected. The entropy source for these was /dev/urandom on the same machine, XORed against random bytes from random.org. Slightly more than 8 bytes of entropy are used per deviate in a simple Monte Carlo (rejection sampling) technique. (Random integers are converted to double-precision floats by dividing the 32-bit integers by 2^32.) Darts are thrown at the normal probability density function from -5 to 5. If the dart lands beneath the curve, its X value is kept as the deviate; otherwise, it is discarded. The bias from eliminating the tails beyond 5 standard deviations is negligible, because no deviates this extreme can be expected in a sample size of 100,000 (the expected count is about 0.06).
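The dart-throwing technique can be sketched in Python as below. This is an illustration under assumptions, not the original program: the bounding box height (the peak of the density at 0), the names uniform01 and normal_deviate, and the use of os.urandom alone (without the random.org XOR step) are all choices made for this sketch. It spends 8 bytes per dart, and with this box only about a quarter of the darts land under the curve, so its entropy use per deviate need not match the figure quoted above.

```python
import math
import os
import struct

PEAK = 1.0 / math.sqrt(2.0 * math.pi)  # height of the standard normal density at x = 0

def uniform01(read_bytes=os.urandom):
    """Turn 4 random bytes into a float in [0, 1) by dividing by 2**32."""
    (k,) = struct.unpack("<I", read_bytes(4))
    return k / 2**32

def normal_pdf(x):
    """Standard normal probability density."""
    return PEAK * math.exp(-0.5 * x * x)

def normal_deviate(lo=-5.0, hi=5.0):
    """Throw darts at the box [lo, hi) x [0, PEAK) until one lands under the curve."""
    while True:
        x = lo + (hi - lo) * uniform01()   # horizontal position of the dart
        y = PEAK * uniform01()             # vertical position of the dart
        if y < normal_pdf(x):
            return x                       # accepted: x is the deviate

sample = [normal_deviate() for _ in range(2000)]
```

Because accepted darts are uniformly distributed over the area under the curve, their X coordinates follow the (truncated) normal distribution.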

Statistical properties

Here is the distribution of the random digits:
0 100518
1 100083
2 99870
3 100215
4 99761
5 99704
6 99589
7 100084
8 99803
9 100373
This distribution has a χ² value of 8.37 with 9 degrees of freedom, corresponding to a probability (P value) of 0.497.
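The χ² figure can be checked from the counts above. The following Python sketch assumes a uniform expected frequency for each digit and uses the closed-form upper-tail expression for χ² with an odd number of degrees of freedom; the function names are hypothetical.

```python
import math

counts = [100518, 100083, 99870, 100215, 99761,
          99704, 99589, 100084, 99803, 100373]

def chi_squared_uniform(counts):
    """Pearson chi-squared statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def chi2_upper_tail_odd(x, df):
    """Upper-tail probability of chi-squared for odd df (closed-form series)."""
    chi = math.sqrt(x)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # normal pdf at chi
    total, denom = 0.0, 1.0
    for r in range(1, (df - 1) // 2 + 1):
        denom *= 2 * r - 1                 # 1, 1*3, 1*3*5, ...
        total += chi ** (2 * r - 1) / denom
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total

stat = chi_squared_uniform(counts)                 # about 8.37
p_value = chi2_upper_tail_odd(stat, len(counts) - 1)  # about 0.497
print(round(stat, 2), round(p_value, 3))
```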

The normal deviates have been extensively tested with the Shapiro-Wilk test for normality. No biases have been found. The sample mean is -0.005 (P=0.11) and the sample standard deviation is 1.0008 (P=0.72).
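The text does not say how the two P values were obtained. One plausible reading, sketched here as an assumption, is a two-sided large-sample z-test of the mean against 0 and of the standard deviation against 1, with n = 100,000. The inputs are the rounded figures quoted above, so the results are only approximate.

```python
import math

def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

n = 100_000
sample_mean, sample_sd = -0.005, 1.0008    # rounded values from the text

# Under the null hypothesis (mean 0, sd 1), the standard errors are approximately:
se_mean = 1.0 / math.sqrt(n)               # standard error of the mean
se_sd = 1.0 / math.sqrt(2.0 * n)           # large-sample standard error of the sd

p_mean = two_sided_p(sample_mean / se_mean)
p_sd = two_sided_p((sample_sd - 1.0) / se_sd)
print(round(p_mean, 2), round(p_sd, 2))
```

Under these assumptions the two values come out close to the quoted 0.11 and 0.72.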

Note: In the original RAND tables, the normal deviates are derived directly from the first 500,000 random digits, by mapping each five-digit block to a normal deviate. No such mapping exists here, as the deviates are generated independently through the above-described stochastic method, rather than the equation-solving technique used by RAND.
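For contrast, a digit-to-deviate mapping of the general kind described can be sketched by inverting the normal cumulative distribution function. This is only an illustration: the function name block_to_deviate and the half-unit offset (used here to keep the probability strictly between 0 and 1) are assumptions of this sketch, not a statement of RAND's exact formula.

```python
from statistics import NormalDist

def block_to_deviate(block):
    """Map a five-digit string such as '48213' to a normal deviate via the inverse CDF."""
    if len(block) != 5 or not block.isdigit():
        raise ValueError("expected exactly five decimal digits")
    u = (int(block) + 0.5) / 100_000       # assumed offset keeps u inside (0, 1)
    return NormalDist().inv_cdf(u)         # solve Phi(x) = u for x

low = block_to_deviate("00000")
mid = block_to_deviate("50000")
high = block_to_deviate("99999")
```

A mapping like this makes the deviates a deterministic function of the digits, which is exactly the dependence that the tables here do not have.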


[1] The following email exchange took place between me and an employee of the RAND Corporation, whose name I have excised because I have no intention of casting aspersions on her, especially given that she is no doubt conveying the opinion of the institution and its legal counsel.
From: XXXXXX
To: Nathan Kennedy
Sent: October 3, 2006 8:52 AM
Subject: RE: Question regarding A Million Random Digits

<snip>
You are
incorrect in your assumption that you this material is not subject to
copyright law.  RAND is the copyright holder, and we do not grant the
right for others to distribute the info.
<snip>


-----Original Message-----
From: Nathan Kennedy
Sent: Monday, October 02, 2006 11:02 PM
To: XXXXXX, RAND Corporation
Subject: Question regarding A Million Random Digits

Dear Ms. XXXXXX,

I have a question regarding the RAND Corporation publication "A Million
Random Digits with 100,000 Normal Deviates."
I would like to utilize these tables and redistribute them on the
Internet.  It is my understanding that the numerical tables of random
digits and normal deviates themselves contain no creative content and
therefore are not subject to copyright or the property of RAND
Corporation (obviously the prefatory materials are, and these would be
excluded).

However, before I widely redistribute these tables I wanted to make sure
that this is in line with the RAND Corporation's view, and that the RAND
Corporation will not challenge any third party's right to utilize these
tables as they see fit.

Thank you for this useful resource and for looking into this,
Nathan Kennedy