Page 1 of 1

### [Tutorial] Randomization in Calc

Posted: Thu Jan 10, 2013 12:53 am
This tutorial explains several different techniques for randomization in Calc:
1. Creating uniformly-distributed random numbers between specified minimum and maximum values
2. Creating uniformly-distributed random integers between specified minimum and maximum values
3. Creating uniformly-distributed unique random integers between specified minimum and maximum values
4. Creating normally-distributed random numbers with specified mean and standard deviation
5. Arranging a set of values in random order
6. Selecting a random sample of size N (subset) from a set of values
7. Lot sampling
All of these are demonstrated in the attachment. The techniques are related, and sometimes a technique used for one situation can be applied in a different situation. The attachment uses defined names in some formulas, which can be displayed with Insert → Names → Define.

A. Creating uniformly-distributed random numbers between specified minimum and maximum values
The =RAND() formula provides uniformly-distributed random numbers between 0 and 1. Let's call the difference between the minimum and maximum the Range. Then the desired values are the product of RAND() and the Range, increased by the minimum value. All RAND() functions are recalculated when you press F9, when the spreadsheet is initially opened, and (if Tools → Cell Contents → AutoCalculate is checked) when any cell is changed. To preserve the random values so that they are not changed further, select them, Edit → Copy, then Edit → Paste Special → Uncheck Paste All → Uncheck formulas → Check Numbers → OK.

When using Calc's RAND() function you need to be aware of limitations in that generator. Most implementations of Calc use a 32-bit generator, and the random numbers begin to repeat after about 4.3×10⁹ values. This sounds like a lot, but suppose you want to randomly shuffle a deck of cards. There are 52 factorial possible permutations and 52! is about 8.1×10⁶⁷. Thus the random number generator can only select a tiny fraction of the possible arrangements of the deck. If this will be a problem for you, consider using the Mersenne Twister Random Number Generator Add-In, which is a 19937-bit generator and doesn't repeat until more than 10⁶⁰⁰⁰ values have been returned.

B. Creating uniformly-distributed random integers between specified minimum and maximum values
The =RANDBETWEEN(min;max) formula provides uniformly-distributed random integers from min to max, inclusive. The values are independent, so it is possible, if you use RANDBETWEEN() more than once, to get duplicate integers. Using RANDBETWEEN() is equivalent to drawing numbered balls from a hat, replacing the drawn ball before drawing the next one. RANDBETWEEN() is recalculated if you use Recalculate All (⌘⇑F9 on a Mac or Ctrl+Shift+F9 on other platforms). You could also use the technique in A and truncate the values to integers with the INT() function, however with that technique you will get integers that may include the minimum value but will never include the maximum value. There is also a Mersenne Twister random integer generator.

C. Creating N uniformly-distributed unique random integers between specified minimum and maximum values
If you need multiple integers and they must be distinct, technique B is not suitable. However, you can use Floyd's Algorithm (described below, example in the attachment). This is similar to drawing numbered balls from a hat, discarding the drawn ball before drawing the next one. Warning: the integers that are generated by the algorithm are neither in ascending order, nor descending order, nor in random order since larger values tend to occur later in the list. If you need the generated values to be in ascending or descending order, you can sort them. If you need them to be in random order, use Technique C and then use the technique described in E. Technique C can be used even when the Range (difference between minimum and maximum) is larger then the number of rows allowed in a sheet; for example, it can select 12 random unique integers from 76 to a billion. We can't really do that by drawing balls from a hat, unless we have a very large hat and 999999925 balls.

You must not use Technique C when N is equal to the size of the range of the range. Since every number in the range is chosen, the selection is not random, and due to the algorithm, the order isn't random either. However, there are some alternate techniques to create unique random integers which can be used when the Range does not exceed the number of rows in a sheet:
If you want the values in ascending order, put the integers from the minimum to the maximum in cells, then use the technique in F.
If you want the values in random order, put the integers from the minimum to the maximum in cells, sort them in random order (technique E), and select the first N of them.
Hint: To put sequential integers in cells, select these cells then use Edit → Fill → Series → Increment → 1.
Hint: To put random integers into a grid (common for soccer/football pools), first put the integers in a column, use technique E to get them in random order, and then use formulas in the grid to copy them from the column, as shown in the attachment. To generate a new random grid arrangement, select cell 'Soccer Pool'.M1 and use Data > Sort. Another idea for a sports pool is in [Solved] Football Pool.

D. Creating normally-distributed random numbers with specified mean and standard deviation
The =NORMINV() formula will create the desired values. To create normally-distributed random numbers with specified mean and variance, just calculate the standard deviation as the square root of the variance and then use NORMINV().

E. Arranging a set of values in random order
Specifically, let's say the data to arrange randomly is in column A, there's other related data in column B, and column C is unused. A simple technique is put the formula =RAND() in the first cell of column C and fill the formula down, for example by double clicking the fill handle. Then Data → Sort can be used arrange these cells according to the random values in C. After the sort is done, Calc will re-evaluate the =RAND() formulas and put new random values in C, so those values won't stay sorted, but the rows were arranged randomly, as desired. To create another random arrangement, just use Data → Sort again. If column C is no longer needed you can delete it.

F. Selecting a random sample of size N (subset) from a set
An efficient technique is to select values from top to bottom with a probability directly proportional the number of values still needed and inversely proportional to the number of values remaining, as shown in the attachment. One could also use technique E to put the values in random order and then select the first N of them, but this requires two steps and, because a sort is required, it is less efficient than using the formulas in the attachment.

G. Lot sampling
Suppose we want to select N samples from L lots. If N<L, the preceeding paragraph is the optimal way to do this, but it may be simpler to select samples at one rate initially, then switch to a different rate for the remaining lots. This isn't as random since the sampling rate isn't uniform and because not all subsets of size N are equally likely, but it may be sufficient for some sitiations. The Lot Sampling sheet shows how to select the sampling rates, given the number of lots and the desired number of samples. When N<L, samples/lot is less than 1, so we select 1 sample from each group of lots. When N≥L, samples/lot is at least 1, so we select at least one sample per lot.

Note that questions are not allowed in the Tutorials section of the forum. Ask them in the Applications > Calc section.

Randomization in Calc.ods