A Conversation for What is a Random Number?

Peer Review: A1065746 - Random Numbers

Post 1

Sten

Entry: Random Numbers - A1065746
Author: Sten - U228865

Hi folks.

This entry is closely adapted from the discussion in Knuth's "The Art of Computer Programming". If anyone thinks it is too close to the book, I can change it. I still think that it is important to get technical (because otherwise you'll get philosophical and the arguments tend to get kind of murky, subjective, and un-verifiable), so if you want me to get less technical, be prepared to argue a lot smiley - smiley

Fun,

Sten


A1065746 - Random Numbers

Post 2

Cyzaki

I was told that numbers are random if, when looking at the first digit of each number, 30% are 1s, 17% are 2s (or something, can't remember the exact figure) and so on down to about 3% are 9s. I think...

smiley - panda


A1065746 - Random Numbers

Post 3

Sten

Ah yes, that would be Zipf's law. Zipf noticed that the first pages of logarithm tables were well-used and worn while the later pages were less used and clean. He eventually conjectured that the leading digit of a random number is d with probability log10 d (try it with factorials, Fibonacci numbers, prime numbers or numbers from a statistical yearbook, but not a telephone directory).

I don't know whether Zipf's law has ever been proved, but Knuth gives at least a nice plausibility argument.

Fun,

Sten


A1065746 - Random Numbers

Post 4

Geggs

One question: exactly how close is your entry to the text in the book?

Because if you've just changed the odd word here or there, then it may break copyright rules. Unfortunately, I don't have a copy of the book to hand so I can't tell myself. But I'll trust you.

There were also a few first person references in there that would have to come out, or be changed, if this were to be accepted into the Guide.


Geggs




A1065746 - Random Numbers

Post 5

Sten

Oops, the probability in Zipf's law should be proportional to log10(1+1/d).

You have to be careful, though. Just because the numbers you have obey Zipf's law, this does not mean that the numbers are random. For example, let's assume that the probabilities for the leading digit are 0.3, 0.18, 0.12, 0.1, 0.08, 0.07, 0.06, 0.05, and 0.04, respectively. Then the sequence consisting of 30 times the number 10, followed by 18 times the number 20, followed by 12 times the number 30 and so on until 4 times the number 90 would be random by your definition, but not by anyone else's, probably smiley - smiley
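Both points are easy to check with a short sketch (Python, using the corrected probability log10(1 + 1/d) from above): the leading digits of the first 1000 Fibonacci numbers fit the law closely, while the degenerate 100-number sequence described in this post fits it exactly and is obviously not random.

```python
import math

def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

def digit_frequencies(numbers):
    """Observed fraction of numbers with leading digit d, for d = 1..9."""
    counts = [0] * 10
    for n in numbers:
        counts[leading_digit(n)] += 1
    return [counts[d] / len(numbers) for d in range(1, 10)]

# First 1000 Fibonacci numbers.
fibs = [1, 1]
while len(fibs) < 1000:
    fibs.append(fibs[-1] + fibs[-2])

predicted = [math.log10(1 + 1 / d) for d in range(1, 10)]
observed = digit_frequencies(fibs)
for d in range(1, 10):
    print(f"d={d}: observed {observed[d-1]:.3f}, predicted {predicted[d-1]:.3f}")

# The degenerate sequence from the post: 30 copies of 10, 18 of 20, and
# so on down to 4 copies of 90. Its leading-digit frequencies match the
# law exactly, yet nobody would call the sequence random.
weights = [30, 18, 12, 10, 8, 7, 6, 5, 4]
degenerate = [d * 10 for d, w in zip(range(1, 10), weights) for _ in range(w)]
print(digit_frequencies(degenerate))
```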

Fun,

Stephan


A1065746 - Random Numbers

Post 6

Sten

The sequence of definitions and theorems and their contents is exactly as in the book. It's hard to do otherwise and would apply to a lot of mathematics. The text between the definitions and theorems is (IMHO) sufficiently paraphrased. If that is a problem, I would best rewrite the article completely, even though I don't know of a better or more lucid presentation than Knuth's.

I'll go over the entry tomorrow evening and see whether I can rephrase things a bit more, but I believe that if you stick by the general strategy of going with computable subsequence rules, you can hardly present the material in a better way. Knuth is rightly famous for his excellent writing style... smiley - smiley

The first-person references will come out right now. Thanks for the hint.

Fun,

Sten


A1065746 - Random Numbers

Post 7

Old Hairy

Hello Sten.

Whether or not this entry infringes copyright is not for me to say, but large parts of it are taken from Knuth VERBATIM. What's left has problems.

First, in Knuth's book, a long chapter about generating pseudo-random numbers ends by considering the philosophy of that; in other words, whether numbers produced in an entirely predictable way by a computer algorithm can be considered random. That's the part you largely follow, but without the substance needed to make the question meaningful.

The following comments are made in the order things arise in the entry.

Although very many digits of the decimal expansion of Pi are known, I think no-one has proved they are random (Knuth says that in the book you cite, even in the section from which you heavily quote). You have not mentioned the statistical tests of randomness which are the basis of your .

You say , but I bet £1 every week in my local pub on just such a number, called the bonus ball number (a byproduct of the British lottery).

Even with an infinite sequence of random numbers, I do not need , or even all the decimal digits. I get these random numbers by rolling a dice (with infinite life, it and me). As you are talking about certain numbers, that is random numbers, whose very existence you cast doubt upon in your opening remarks, you really should say that are part of the continuum of numbers on the real number line, or whatever non-technical words you prefer to express the same idea.

You say , but give no clues why this might be true. Once we know what an uncountable infinite sequence is, we can never find a finite sequence which is uncountable.

Not very good form to criticise your major source for not yet publishing Volume 4. After all, his books are regarded as definitive throughout the computer industry, and VERY well researched. Just to add to that insult, you go on , which looks like a flagrant advert for another entry, not yet written (but already there in Knuth).


You say . That is only the case if the only random sequence you wish to consider is a uniform [0,1] sequence of real numbers. (In Knuth's book, that is explained.) But you offer no such explanation, and thus rule out my dice numbers, which are truly random.

I'll not make any more technical points. Just one overall comment. If the intended reader is likely to follow the very formal mathematics (proofs, lemmas and all), they will find your long-winded car analogy for the simple idea of refining a definition very, very boring. You perhaps should write something which is all your own work, to achieve a consistency of style and level.


A1065746 - Random Numbers

Post 8

Sten

Hello ?.

Thanks for the thorough review. I agree that some parts are taken verbatim from Knuth. This concerns the mathematics mostly. I see no point in renaming variables just to make the text a bit different. I also disagree that large parts are merely copied. In my very first introductory comment, I have already offered to rewrite the entire article if that is a problem, however. (Please note that the beginning of the article clearly states the source and that the article will follow the discussion found therein.)

Concerning the other points you make:

I don't agree that you need a theory of random number generation in order to ask what a random sequence might be. You can discuss this question entirely separately, and the section in the book is self-contained, even to the point of repeating the definition of a [0,1) sequence, which seems to support my point of view. As you have already remarked, however, I am planning to write another article on the generation of random numbers (where I have an entirely different outlook than Knuth, because Knuth talks mostly about LCPRNGs, which have a number of problems for the areas in which I use random numbers, namely security and cryptography).

Also, I don't agree that the section in Knuth is about the philosophy of generating random numbers by computer. The section is entitled "What is a random sequence?" and considers the randomness first of arbitrary [0,1) sequences and then of b-ary sequences. There are no assumptions whatsoever how these sequences are generated. Computers enter only in a very abstract way when defining computable subsequence rules. (The beginning of the section talks about the reason why he undertakes to define randomness, but his arguments apply just as well to sequences that were not generated by computer.)

You say that you , but in reality you are betting that you can predict the next number in a *sequence* whose length is not bounded a priori. Still, I have changed the formulation to say that I'm not considering finite sequences. Please reread the article and tell me what you think.

You say <Even with an infinite sequence of random numbers, I do not need , or even all the decimal digits.> That is true, which is why, beginning with Definition R5, the talk shifts from arbitrary [0,1) sequences to b-ary sequences. The very point you criticise seems to have been addressed.

You also assert that I cast doubt upon the existence of random numbers, but that wasn't my intention. (In fact, the very last paragraph asserts that there exist sequences that satisfy Definition R6.) I have altered the first sentence to read "Random numbers are strange things because they must satisfy a number of *seemingly* contradictory demands." That should clear up the confusion.

You write, <You say , but give no clues why this might be true.> You're right. I wanted to express that it is easier to talk about infinite sequences first and later expand (or narrow, in this case) the view to finite sequences. I will add that point.

As a side note, a sequence of throws of real dice will probably be independent, but probably not equidistributed. There might be some bias that comes from the fact that not all surfaces of the die are equal (if they were, we couldn't differentiate between the different outcomes). This bias might not be noticeable in practice, but I'm not sure whether you can rule it out theoretically. If that is so, then you can't call a sequence of dice throws "random" a priori.
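A bias of that kind is what a chi-squared test on the face counts is meant to detect. Here is a minimal sketch (Python); the slightly lopsided weights are my own invention, standing in for the hypothetical imperfect die:

```python
import random

def chi_squared(counts, expected):
    """Pearson chi-squared statistic for observed vs expected counts."""
    return sum((c - e) ** 2 / e for c, e in zip(counts, expected))

def roll(n, weights, seed=0):
    """Simulate n rolls of a six-sided die with the given face weights."""
    rng = random.Random(seed)
    return rng.choices(range(1, 7), weights=weights, k=n)

n = 60000
stats = {}
for name, weights in (("fair", [1, 1, 1, 1, 1, 1]),
                      ("biased", [1.1, 1, 1, 1, 1, 0.9])):  # invented bias
    rolls = roll(n, weights)
    counts = [rolls.count(f) for f in range(1, 7)]
    stats[name] = chi_squared(counts, [n / 6] * 6)
    print(name, counts, round(stats[name], 1))

# With 5 degrees of freedom, a statistic above about 11.07 rejects
# equidistribution at the 5% level; the biased die fails clearly,
# even though its counts look unremarkable to the naked eye.
```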

A technical point: a sequence (finite or otherwise) where the individual members are indexed by integers is countable by definition.

However, if you meant that knowledge of infinite *random* (not uncountable) sequences precludes us from defining finite random sequences, I disagree. In fact, Section 3.5.E. of Knuth addresses precisely this problem, using ideas of Section 3.5.B. and 3.5.C.

You also assert that I criticise my major source, and on rereading the text, I can see how I could be misunderstood. Nothing could be further from my intentions. Please reread the passage and tell me what you think now.

You say that I don't explain that the random sequence that I'm looking at is a [0,1) sequence. I thought I had made that clear when I named it "our" sequence. I have cleared that up now. Please take a look and tell me what you think.

You write, Yep, that's true. I noticed that too, and have removed the section completely. Perhaps it would be a good idea to give an informal overview of the main results at the beginning of the article? Let me know what you think. (BTW, there is exactly one lemma, with a very simple proof. I don't think that is overly technical, but I'll let the editors and other readers be the judge of that.)

You write, <Just to add to that insult, you go on , which looks like a flagrant advert for another entry, not yet written (but already there in Knuth).> First of all, I think I have made it clear by now that no insult was intended. And yes, I am planning to address random number generation by computers in subsequent articles. These articles, however, will for the most part not follow Knuth at all. Instead, I will (also) talk about entropy gathering, mixing functions, cryptographic random number generation, security problems that happen when it's not done right, etc. The problem with LCPRNGs is that the "numbers stay mainly in the planes" and are very predictable (even though this predictability is not discovered by statistical tests).
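As a concrete illustration of that predictability (not mentioned in the entry itself), consider IBM's old RANDU generator, a classic offender among LCPRNGs. The sketch below shows that every RANDU output is a fixed linear combination of the two before it, which is precisely why its consecutive triples fall on only fifteen planes in space:

```python
MOD = 2 ** 31

def randu(seed, n):
    """IBM's RANDU: x_{k+1} = 65539 * x_k mod 2**31 (seed should be odd)."""
    xs, x = [], seed
    for _ in range(n):
        x = (65539 * x) % MOD
        xs.append(x)
    return xs

xs = randu(1, 1000)

# Because 65539 = 2**16 + 3 and (2**16 + 3)**2 = 2**32 + 6*2**16 + 9,
# every output satisfies x_{k+2} = 6*x_{k+1} - 9*x_k (mod 2**31):
# the "random" numbers are perfectly predictable from any two of them.
assert all(xs[k + 2] == (6 * xs[k + 1] - 9 * xs[k]) % MOD
           for k in range(len(xs) - 2))
print("each RANDU output is 6*(previous) - 9*(the one before), mod 2**31")
```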

Please receive my honest thanks for having taken the time to criticise the entry so thoroughly. I'd sincerely like you to go over the changed article again and tell me what you think.

Fun,

Sten


A1065746 - Random Numbers

Post 9

Z

smiley - footprints I'm just dropping in to lurk. I have read the entry and enjoyed it. Though I'm not at all familiar with the subject.


A1065746 - Random Numbers

Post 10

Old Hairy

Hello Sten.

Still looks to me as if the whole thing is about PSEUDO-random numbers, as generated by computer, and not actually random numbers.

The bonus ball number is a single random number. I and 48 others each pay £1 into a pot, each backing a certain number. The lottery randomly picks one of these, and the lucky owner of that number pockets £49. The lottery machines produce this number at random in most people's eyes - they do not look to sequences to justify this. To the extent that the lottery machines are checked, they are not checked by your [0,1] sequences.

More later, urgent business intervenes.


A1065746 - Random Numbers

Post 11

Sten

Hello ?.

I still don't agree that Section 3.5 is entirely about computer-generated sequences of pseudo-random numbers. I fail to see any dependence on computer-generatedness in the theorems and definitions. But to support my point of view further, take a look at Section 3.5.F, which is explicitly about "Pseudorandom Numbers", at the very end of Section 3.5. It seems to me that the special questions of the randomness of computer-generated sequences are treated only there. If that is true, then the rest of the section is not explicitly about computer-generated pseudorandom sequences.

You write, , but the point of this entry is precisely *not* to give a common-sense definition of randomness, but to make a quantitative statement.

Oh, but I think they are. Ask yourself: You examine a lottery machine. How would you convince yourself that no number is favored over any other number? You will probably examine the machine and conclude that if you make a large number n of draws, then about n/49 of them will be 1's, n/49 of them will be 2's and so on. This *reasoning* is independent of the *actual* number of draws made.

I think this "limiting argument" is built into the concept of probability. It's certainly built into Knuth's definition of probability (Definition 3.5.A.), but I concur that there are other definitions.

(If you can make a convincing argument without using sequences of draws, I'd be interested to hear about it. That would probably entail a definition of probability that differs from the usual.)
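The limiting argument is easy to illustrate with a simulation; here is a sketch with a stand-in for the lottery machine (a seeded generator, assumed fair), checking that each ball's share of the draws approaches 1/49:

```python
import random

rng = random.Random(42)    # stand-in for an ideal 49-ball lottery machine
n = 490000
counts = {ball: 0 for ball in range(1, 50)}
for _ in range(n):
    counts[rng.randint(1, 49)] += 1

# Each ball should come up in roughly n/49 of the draws, and the
# reasoning does not depend on the actual value of n.
expected = n / 49
worst = max(abs(c - expected) for c in counts.values())
print(f"expected about {expected:.0f} per ball; worst deviation {worst:.0f}")
```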

Fun,

Sten


A1065746 - Random Numbers

Post 12

Old Hairy

Hello Sten.

Returned now from urgent business (I never use that as an excuse unless I mean it).

But you have replied in the meanwhile.

I think what I am trying to say is that there are certain numbers which are accepted as random. My dice would have to be wrapped up in qualifications like theoretically perfect and so forth. But there are really random numbers, not testable by any of R1 to R6.

The ERNIE machine is very definitely based on such numbers, is exhaustively tested (so far as is possible) for its randomness, so that it is known not to malfunction. You and I cannot make these tests, because the published numbers that ERNIE produces have been filtered. They are random, as modified by the subscription list of numbers.

I suspect that your interest in random numbers lies in the area of cryptography. That is fine, but then you do not want truly random numbers, you want numbers so that the next in sequence is not easily deduced. You want to have numbers that are produced by an agreed method, the pseudo-randomness that I have mentioned.

I have made my comments, it's your entry, and we can agree to disagree. Even on all the technical points.

I still think, very strongly, that you should just quote your source. Shower him with praise if you wish (I really rate D.E.K too), but not mention the delay in volume 4, the advertised contents of which are irrelevant to your entry.

I thought (but might be wrong) that guide entries should not advertise forthcoming events. Yours does.

Very glad to see the back of the car.

Now I'll leave to the others (there are several on this thread).



A1065746 - Random Numbers

Post 13

Sten

Hello ?.

The publication date of volume four has been a standard joke in my circles for a long time (as in: "You'll remove the final bug from your program when hell freezes over --- or when Volume Four(tm) appears!"), which I thought innocent enough. But maybe you're right. I'll remove the reference to the publication date.

Thanks for your contributions, they have certainly helped to make this entry better.

All the best,

Sten


A1065746 - Random Numbers

Post 14

Sten

Hi ?.

(Judging from your previous remarks, you are probably not reading this. I'm posting this anyway, just in case.)

Your remarks about singular random events had me thinking over the weekend. And indeed, there is a book on probability theory by an Italian named DeFinetti (which was on my bookshelf, no less) that advocates probability as purely subjective, without objective meaning. Probability (and randomness) would therefore be in the eye of the beholder, as long as one is consistent (for example, if you believe that P(A) = 0.1 and P(B) = 0.2, and if you believe that A and B are independent, then you must conclude that P(A and B) = 0.02, otherwise you are being inconsistent).
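That consistency requirement is mechanical enough to check in code. A small sketch (the helper and its name are my own invention, not de Finetti's):

```python
def coherent(p_a, p_b, p_ab, independent=True):
    """Is a subjective probability assignment consistent?
    For independent events, P(A and B) must equal P(A) * P(B);
    otherwise it need only respect the Frechet bounds."""
    if not all(0 <= p <= 1 for p in (p_a, p_b, p_ab)):
        return False
    if independent:
        return abs(p_ab - p_a * p_b) < 1e-12
    return max(0.0, p_a + p_b - 1) <= p_ab <= min(p_a, p_b)

print(coherent(0.1, 0.2, 0.02))   # the example above: consistent
print(coherent(0.1, 0.2, 0.05))   # inconsistent for independent A and B
```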

I will update the article and say some things about this theory. That's not the theory that I will use, but I'll acknowledge its existence.

All the best,

Sten


A1065746 - Random Numbers

Post 15

Old Hairy

Hello Sten.

... still lurking ...


A1065746 - Random Numbers

Post 16

Sten

Sorry, I'm having a busy week. I'll do my best to upgrade the entry soon, but I really can't say when "soon" will be. If you are monitoring this conversation, I'll leave a note here when I'm done.

Fun,

Sten


A1065746 - Random Numbers

Post 17

Sten

Hi Old Hairy (I assume you were TRFKA? (The Researcher Formerly Known As ?)).

Please take a look at the new (and, I think, greatly improved) entry. I have reordered the sections, added a paragraph or two about DeFinetti's subjective theory of probability, explained some things about infinity-distribution and generally had a good time.

I don't much like my explanation of the distinction between certainty and probability one, but it's the best I could come up with. Do you have a better idea?

All the best,

Sten


A1065746 - Random Numbers

Post 18

Old Hairy

Hello Sten.

Yes, I did have a pseudonym for a while. And I did step back to let the others have their say, but as no one has, and you specifically asked, here we go.

The new section "Executive Summary" is most helpful, but probably should come after "Introduction".

The copyright concern remains. As both DEK and Addison-Wesley are available on-line, why not seek permission to use their copyrighted material? There still is a marked difference in styles between Sten and DEK, and it shows in this entry.

I still find the question you pose in the first paragraph of "Introduction" unanswered. I tried to help on it with allusions to lotteries, dice etc. The point about the bonus ball is that we regard it as random because, whichever number is picked, it has the same chance as all the other numbers which are not. And the bets have nothing to do with sequences, since we regard next weeks number as being totally independent of last weeks, just like 2-distributed.

I cannot read most of the special symbols you use, and have referred to Knuth to discover what they're supposed to be. I would use something other than nu in the formulas, rather than point out that it's hard to distinguish on screen. It's not as if you've run out of letters. But then, you are following Knuth rather closely.

The reference to an alternative definition of probability adds confusion to an already opaque topic. I have not read DeFinetti, but would hazard a guess that "singular" event ought to be "single" event, or do you mean "exceptional"?

"If we sample a truly random number generator, we should be able to make the assumption that every sequence of real numbers between 0 and 1 is equally likely to occur. Unfortunately, some of the sequences are not even equidistributed" is nonsense. By the definition of infinity-distributed, any such sequence is n-distributed for all finite n. If you put the word "finite" before "sample", it all makes sense. Knuth makes it clear by saying random sequences may, in parts, appear to be non-random. And your "random number generator" is what exactly - an algorithm for pseudo-random numbers, as I have argued before, or something more like ERNIE - in which case one can only ever hope to look at a finite sample of the sequence.

You say "Unfortunately, computable subsequence rules cannot cope well with arbitrary integers. For example, if a real number x is given by its radix-10 expansion, there is no effective algorithm that will decide whether x < 1/3!" Your example isn't an integer! Knuth has "... effective algorithms cannot deal nicely with arbitrary real numbers ..." before the same example. He is ever so precise, despite the apparently informal language.

When you arrive at R6, you do not say that it might not be the ultimate definition (but Knuth does hint at that, in a footnote.), and Knuth credits R6 to A. N. Kolmogorov.

There are several typos, but "is nto a typo" brought a smile. (I'll leave the rest until it's nearly finished).

I do wish the others on this thread would add something.

OH


A1065746 - Random Numbers

Post 19

Sten

Hi Old Hairy,

> The new section "Executive Summary" is most helpful, but probably
> should come after "Introduction".

Good idea, done.

> The copyright concern remains.

I don't think that copyright is a concern. I'll leave that for the editors to decide.

> There still is a marked difference in styles between Sten and DEK,
> and it shows in this entry.

Of course there is! If there weren't, I'd be a famous author of CS books! smiley - smiley

> I still find the question you pose in the first paragraph of
> "Introduction" unanswered.

Well, then I'm at my wits' end. I tried to explain as clearly as I could why the very notion of "one ball is as likely to be picked as any other" either involves DeFinetti-like subjectivity (in which case I could assign any probability to any ball, as long as I'm consistent) or thinking about sequences. Possibly my explanation just isn't for you, then.

> I cannot read most of the special symbols you use, and have
> referred to Knuth to discover what they're supposed to be.

Sorry, they came out fine in my browser. I have replaced some symbols with ordinary letters. Tell me if you have trouble with any others.

> But then, you are following Knuth rather closely.

And as I have already pointed out, I see no sense in changing mathematical symbols just for the sake of being different. Readability is a valid issue, though.

> The reference to an alternative definition of probability adds
> confusion to an already opaque topic.

This section was added specifically to address your concerns! We have agreed to disagree in our notions of probability and I wanted at least to acknowledge that my definition isn't the only possible one.

> I have not read DeFinetti, but would hazard a guess that "singular"
> event ought to be "single" event, or do you mean "exceptional"?

First of all, the use of "singular" is due to me, not Finetti. I mean neither a single nor an exceptional event. I mean an event that you are not reasoning about as happening more than once. For example, it might be the creation of the universe, or it might be the drawing of the bonus ball in the British Lottery in your version of it (I'd still see it as a sequence).

> "If we sample a truly random number generator, we should be able
> to make the assumption that every sequence of real numbers between
> 0 and 1 is equally likely to occur. Unfortunately, some of the
> sequences are not even equidistributed" is nonsense.

No, it is not. If it were, a truly random number generator could output some sequences of numbers with higher probability than others. That is not a property of what I would consider a truly random number generator. And I do mean infinite sequences of real numbers here. However, I have tried to make that statement more readable.

As an aside, you should perhaps be a bit more careful with words like "nonsense".

> Knuth makes it clear by saying random sequences may, in parts,
> appear to be non-random.

I can't find that in this section. Could you give me a pointer, please?

> And your "random number generator" is what exactly - an algorithm
> for pseudo-random numbers, as I have argued before, or something
> more like ERNIE - in which case one can only ever hope to look at
> a finite sample of the sequence.

Neither. It's a hypothetical construction for the purposes of finding out its properties. A Gedankenexperiment, as it were. This has also been added to the relevant section.

> [...] Your example isn't an integer! Knuth has "[...]" before
> the same example. He is ever so precise, despite the apparently
> informal language.

You're right on both counts. In my defense, I might say that Knuth has had the benefit of more than twenty years of peer review, and I've had only a week or so smiley - smiley (And no, I'm not comparing myself with Knuth here. All I'm saying that yes, I do make mistakes occasionally. But your style of pointing them out sometimes makes it hard to thank you for it, even though the entry is undoubtedly getting better because of you.)

> When you arrive at R6, you do not say that it might not be the
> ultimate definition (but Knuth does hint at that, in a footnote.),

I added something to that effect.

> and Knuth credits R6 to A. N. Kolmogorov.

I can't find that. Pointer please?

> There are several typos, but "is nto a typo" brought a smile.

LOL. Unfortunately, I can't search in the editing window, so I have no clue where that is.

> I do wish the others on this thread would add something.

That'd certainly be nice.

Fun,

Sten


A1065746 - Random Numbers

Post 20

Sten

I wrote,

> It's a hypothetical construction for the purposes of finding
> out its properties

That should really be "It's a hypothetical construction for the purpose of finding out the properties of ideal random number generators."

Sten

