Pick any large set of random data you like. Look at the first digit of all the numbers. You're going to see a lot of ones. It's not just a coincidence. It's the law.

Benford's Law was discovered by Simon Newcomb, who was thumbing through a book of logarithmic tables. He noticed that the pages containing the logarithms of numbers beginning with low digits were far more worn than the pages for numbers beginning with high digits. In modern times, that would just mean that people started with the low numbers and had wiped the pizza grease off their hands by the time they got to the higher ones. Since it was 1881, though, Newcomb figured that the low numbers were used far more often than the high numbers. Since the book was in a library, it had presumably been used by a random assortment of people for a random assortment of problems. Newcomb found that other books, in other libraries, were worn in the same way.

Clearly, people needed data on numbers beginning with low digits more than they did on numbers beginning with high digits. That didn't make sense. If the world's numbers are random, their leading digits should be random too. Digits one through nine should each appear first about 11 percent of the time, and the book's pages should all be equally worn.

They aren't, and they weren't.

Based on surveys of real things, the digit ‘one' is likely to begin any number about thirty percent of the time. Frank Benford, the gentleman whose name ended up lastingly attached to the law, rediscovered this trend and broadened the evidence supporting it. Among the data he gathered were newspaper circulations, river areas, death rates, and the addresses of people listed in the book "American Men of Science." One appeared roughly 30 percent of the time. More than that, low digits were far more likely than high digits to appear in the leftmost place of any number.
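The article doesn't spell out the formula, but the standard statement of Benford's law gives the probability that a number's leading digit is d as log10(1 + 1/d). A quick sketch:

```python
import math

def benford_probability(d):
    """Probability that a number's leading digit is d, per the
    standard statement of Benford's law (not given in the article)."""
    return math.log10(1 + 1 / d)

for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
# The digit 1 comes out to about 30.1%, the digit 9 to only about 4.6%.
```

The nine probabilities sum to one, and they fall off steadily from 1 to 9, which is exactly the lopsided wear pattern Newcomb saw in the tables.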

Benford's law isn't always applicable. The number sets have to be big enough. There also can't be any specific outside influence, such as ‘human thought.' Prices, for example, are designed to appeal to customers, and therefore aren't subject to Benford's law. If you were to analyze purchases of, say, packets of potato chips, you would find the same few numbers coming up again and again. People see different prices on the shelf and make decisions based on those prices. However, if you were to follow many people around during their week-long vacations, and add up all their purchases and pay-outs, you would find a Benford distribution – so long as they didn't have a specific budget in mind. A single purchase requires thought. A number of different purchases should add up to a ‘random number,' which is not random at all.
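The kind of unplanned, scale-free amounts the law applies to can be illustrated with a simulation. This is a hypothetical sketch, not the article's experiment: it assumes the amounts are spread log-uniformly across several orders of magnitude, a common model for naturally occurring data.

```python
import math
import random
from collections import Counter

random.seed(1)

# Hypothetical amounts, log-uniform over four orders of magnitude
# (an assumption of this sketch, not data from the article).
amounts = [10 ** random.uniform(0, 4) for _ in range(100_000)]

first_digits = Counter(int(str(a)[0]) for a in amounts)
for d in range(1, 10):
    observed = first_digits[d] / len(amounts)
    expected = math.log10(1 + 1 / d)
    print(f"{d}: observed {observed:.3f}, expected {expected:.3f}")
```

Pin the amounts to a budget or a few price points instead, and the observed frequencies stop matching the expected ones.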

But when data sets get big, the biggest giveaway that you're not being shown naturally occurring numbers is being shown completely random ones: uniformly random digits don't match Benford's distribution. Benford's law was used to detect fraud in Iran's elections. It's also widely used to detect financial fraud. Expense claims, financial pay-outs, tax filings, and accounts payable all conform to Benford's law when they're legit. Checking these numbers against the law is so common that software is sold to do it.
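A minimal version of the check such software performs might look like the sketch below; `benford_chi_square` is a name invented here, and real audit tools are far more elaborate.

```python
import math
from collections import Counter

def benford_chi_square(values):
    """Chi-square statistic comparing the leading-digit frequencies of
    `values` against Benford's law. Judge the result against the
    chi-square distribution on 8 degrees of freedom: a large statistic
    suggests the data isn't Benford-distributed. A sketch only."""
    # Leading digit: drop sign, leading zeros, and the decimal point.
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v]
    n = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat
```

Fed legitimate-looking amounts (spread across orders of magnitude), the statistic stays small; fed uniformly "random" fabricated figures, it balloons.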

Via DPS guide, Mathworld, Physics World and UIC.

## DISCUSSION

What's the name of the law that says that with a large enough sample size for any given set of data, it's statistically unlikely that there wouldn't be weird coincidences?