Let’s say I rolled a die 20 times out of your view and then said the result was one of these two sequences:
(A) 11111111111111111111
(B) 66234441536125563152
Which sequence is more likely to be the one I actually rolled?
Details and Assumptions:
Order matters in a sequence. The sequences are in chronological order, so the digits cannot be rearranged.
The person rolling the die does not lie about the outcomes.
Each roll of the die is independent of the others.
It is far more likely to roll a sequence of mixed numbers than to roll a streak of all the same numbers.
Because we don't know the outcome of the 20 rolls that have already taken place, we need to compare the probability of rolling any streak sequence with the probability of rolling any mixed sequence, rather than comparing only sequences (A) and (B) with one another (as in the first example, which is phrased in the future tense).
Sequence (B) belongs to the set of all mixed sequences, which has many more elements than the set of all streak sequences, so sequence (B) is much more likely to be the actual sequence rolled.
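To make the set sizes in this argument concrete, here is a minimal Python sketch. It brute-forces the streak and non-streak counts for a short sequence of 4 rolls, then uses the closed form for 20 rolls, where enumeration is not feasible. The helper name `count_types` and the choice of 4 rolls are illustrative assumptions, and "mixed" is taken to mean any sequence that is not a streak.

```python
from itertools import product

def count_types(n_rolls, faces=6):
    """Brute-force count of (streak, non-streak) sequences of length n_rolls."""
    streaks = 0
    mixed = 0
    for seq in product(range(1, faces + 1), repeat=n_rolls):
        if len(set(seq)) == 1:   # every roll shows the same face
            streaks += 1
        else:                    # assumption: "mixed" means "not a streak"
            mixed += 1
    return streaks, mixed

# Small case checked by enumeration: 6**4 = 1296 sequences in total.
small_streaks, small_mixed = count_types(4)

# For 20 rolls, use the closed form instead of enumerating.
total_20 = 6 ** 20
streaks_20 = 6                   # one streak per face
mixed_20 = total_20 - streaks_20

print(small_streaks, small_mixed)
print(streaks_20, mixed_20, total_20)
```

This only counts how many sequences fall into each set. Whether set membership should decide the answer is exactly what the comments below debate.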
Problem sourced from: http://parade.com/45746/marilynvossavant/sundays-column-10-23-11/
"To convince doubting readers, I have, in fact, rolled a die 20 times and noted the result, digit by digit. It was either: (a) 11111111111111111111; or (b) 63335643331622221214. Do you still believe that the two series are equally likely to be what I rolled? Probably not! One of them is handwritten on a slip of paper in front of me, and I’m sure readers know that (b) was the result." - Marilyn vos Savant
This problem is very poorly phrased. After reading the solution, I get what you are going for. However, none of that is conveyed in the problem. Here are the issues that I had with it:
Understood, however,
I agree that the two situations might cause some confusion, even though the distinctions between the two were explicitly described.
In the sourced link, the letter writer describes both situations, but only challenges the conclusions of one of them, which is the same one my problem is asking to solve. In response, the columnist only talks about that situation, which is, once again, the same one my problem is about. Knowing that, can you show me exactly where the confusion and disconnect is?
"The problem solver can make the assumption about how likely you are to choose a "highly ordered" sequence vs a "random sequence", without any further information about "I"." I'm not quite sure what you meant by all that. Please clarify.
Are you referring to the solution's stated assertion (not assumption), that mixed-sequences are more common than streak-sequences? Just to confirm, there's no concern about what exactly counts as a "mixed" sequence, correct?
"You are making the assumption that 2 is valid. I disagree"
Once again, I'm not sure what "2" is referring to. Please clarify.
The main issue is "What is the prior distribution of the other sequence that is written down by me"? The best way to understand this, is to refer to Bayes' Theorem .
Let $a$, $b$ be the events that sequence $A$, $B$ is rolled. We believe (maybe wrongly) that $P(a) = P(b)$.
$$P(a \mid \text{I list out sequences } A, B) = \frac{P(a) \times P(\text{I list out sequence } B)}{P(a) \times P(\text{I list out sequence } B) + P(b) \times P(\text{I list out sequence } A)}$$
The mathematician's view is that "Without other knowledge, we assume that each event is equally likely, or that $P(\text{I list out sequence } A) = P(\text{I list out sequence } B)$", which is why $P(a \mid \text{I list out sequences } A, B) = \frac{1}{2}$.
Marilyn's claim is that "$P(\text{I list out sequence } A) \gg P(\text{I list out sequence } B)$", because sequence B is so "random". This is why $P(a \mid \text{I list out sequences } A, B) < \frac{1}{2}$. I don't see why the problem solver is allowed / expected to make such a claim, which is my 2nd point.
With regards to my 3rd point, I am arguing that Marilyn's claim is not valid. My belief is that if we pose such a scenario to $6^{20}$ people, the sequence of all 1's would never arise because humans hate such complete order. As such, it would be more likely that 1111....111 has arisen from a dice throw instead. Thus, this problem requires a lot more information about "I" than is given.
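The Bayes' Theorem formula above can be sketched in Python to show how much the answer depends on the listing probabilities. The function name `posterior_a` and all of the listing probabilities below are made-up illustrative values, not anything given in the problem; only the structure of the formula comes from the comment.

```python
from fractions import Fraction

def posterior_a(p_a, p_b, p_list_b, p_list_a):
    """P(a | I list out sequences A, B), following the formula above.

    p_list_b: chance the lister offers B as the decoy when A was rolled.
    p_list_a: chance the lister offers A as the decoy when B was rolled.
    """
    numerator = p_a * p_list_b
    return numerator / (numerator + p_b * p_list_a)

# Each specific 20-roll sequence has the same prior probability.
prior = Fraction(1, 6 ** 20)

# "Mathematician's view": both decoys equally likely to be listed.
equal = posterior_a(prior, prior, Fraction(1, 2), Fraction(1, 2))

# Marilyn-style assumption (hypothetical numbers): A is a far more
# likely decoy than this particular B.
marilyn = posterior_a(prior, prior, Fraction(1, 1000), Fraction(1, 2))

# Opposite assumption (hypothetical numbers): people almost never
# write down all 1's as a decoy.
opposite = posterior_a(prior, prior, Fraction(1, 1000), Fraction(1, 10 ** 6))

print(equal, marilyn, opposite)
```

With equal priors the $6^{20}$ factor cancels, so the posterior is driven entirely by the assumed behaviour of the person listing the sequences.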
@Calvin Lin – Ah, ok, I see.
Unfortunately, I am not familiar with that technical model and its terminology, so I can't really respond to it. But thank you for providing it, I will research it.
It seems like you're saying that it invalidates her argument because of the "without other knowledge..." part, and that her claim "because sequence B is so 'random'" does not legitimately qualify for "other knowledge" that would off-set the initial default 1/2 probabilities of each (offset or change default assumption that each event is equally likely)... is that at least partially correct?
If so, are you just not convinced that the problem solver can justifiably make that claim though it may be possible to do so (Marilyn's claim), or are you proactively stating that you know that the problem solver indeed can NOT legitimately make that claim...?
@Taylor Shobe – I think I understand what you meant on that last post, but let me ask you this.
Let's say we were to conduct a statistical experiment in which we execute a zillion trials, where each trial involves casting a single die 20 times, recording the resulting sequence, then creating a fake sequence to pair with the true one which will deliberately be either a mixed or streak sequence depending on which type was NOT rolled, and then finally tallying all the results to see how many were actually streaks vs mixed in each trial where both types were offered...
Would you agree that the results SHOULD be heavily in favor of the mixed type over the streak type?
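A scaled-down version of this experiment can be sketched in Python. It is only a rough illustration under stated assumptions: sequences are 3 rolls long instead of 20 (so that streaks actually show up in a reasonable number of trials), the decoy is always of the opposite type as described above, and the function name `simulate` and the seed are arbitrary choices.

```python
import random

def simulate(n_trials, n_rolls=3, seed=0):
    """Tally how often the truly rolled sequence is a streak vs mixed.

    Because the decoy is always of the opposite type, every trial offers
    one streak and one mixed sequence, so only the type of the true
    sequence needs to be recorded.
    """
    rng = random.Random(seed)
    streak_true = 0
    for _ in range(n_trials):
        rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
        if len(set(rolls)) == 1:   # all rolls identical: a streak
            streak_true += 1
    return streak_true, n_trials - streak_true

streak_true, mixed_true = simulate(100_000)
print(streak_true, mixed_true)
```

For 3 rolls the streak fraction should land near 6/216 = 1/36. The tally favors the mixed type because of how the decoy rule is set up, and that rule is an assumption of the experiment rather than something stated in the original problem.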
@Taylor Shobe – In that case, yes, but it's because you chose to divide the set of sequences into two groups (mixed and streak). This is an arbitrary choice not mentioned in the question.
@Andrés Castillo – It doesn't need to be mentioned or even chosen, it happens regardless in both examples.
If we define "mixed" sequence as any sequence that is not a streak, then those two types necessarily occur.
If we restrict the definition of "mixed" to be more exclusive, like requiring a minimum quantity of different numbers, then whatever was formerly considered a mixed sequence (and now isn't because we re-defined it) still is not a streak-type sequence, and this type ("almost streak type"?) would still be more common and numerous than pure streak types.
Either way, there are only two possible sequence types: streak types and non-streak types.
So, the original question asks: "Which sequence is more likely to be the one I actually rolled?"
Knowing that both of the two offered sequences are specific but also belong to a larger family or set of similar sequences, which one do you think will have the overriding probability? A single element has its own probability of occurring, while a single set or collection containing that element also has its own probability of occurring, and both apply to the question, so which one determines the answer? Just because these sets or types are not mentioned in the original problem does not mean they do not exist in the original problem or can just be ignored.
@Taylor Shobe – As you said: "Just because these sets or types are not mentioned in the original problem does not mean they do not exist in the original problem or can just be ignored." But the fact that they aren't mentioned in the original problem does mean that there is no particular mathematical reason to divide the set of all sequences of 20 integers between 1 and 6 into these 2 particular groups. There are many ways to divide it into 2 groups. As I mentioned in another reply, one such way would be:
Sequence (a) belongs to the set of NONSIXERS which is clearly larger (5 times larger) than the set of SIXERS, making (a), as per your argument, 5 times more likely to be the actual sequence rolled.
The only difference here is I chose to give special significance to the set of SIXERS, while you chose to give special significance to the set of STREAKS.
In fact, of the many ways you can choose to split the set of sequences into two groups, half will put sequence (a) in the larger group and half will put it in the smaller, so which group it lands in is seemingly random (making both sequences equally likely).
The difficulty in this problem is that it's not purely mathematical: one of the options is a sequence chosen by the person asking the question. One might argue that the choice of this sequence is not likely to be random, but based on some property of the number (such as being a STREAK), but that all depends on the person.
See also my other reply further down in this thread and on the reports section.
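The point about competing partitions can be checked with a short Python sketch. It assumes SIXERS means "sequences that start with a 6", which is how the example is phrased elsewhere in this thread; the helper names `is_streak` and `is_sixer` are illustrative.

```python
A = "1" * 20
B = "66234441536125563152"

def is_streak(seq):
    """True if every roll in the sequence shows the same face."""
    return len(set(seq)) == 1

def is_sixer(seq):
    """Assumption: a SIXER is a sequence whose first roll is a 6."""
    return seq[0] == "6"

total = 6 ** 20

# Partition 1: streaks vs non-streaks. (A) falls in the smaller set.
streaks = 6
non_streaks = total - streaks

# Partition 2: SIXERS vs NONSIXERS. (A) falls in the larger set.
sixers = 6 ** 19          # first roll fixed at 6, other 19 rolls free
non_sixers = total - sixers

print(is_streak(A), is_streak(B), streaks, non_streaks)
print(is_sixer(A), is_sixer(B), sixers, non_sixers)
```

Under the first partition (A) sits in the tiny set, and under the second it sits in the set that is 5 times larger, so "belongs to the bigger set" points in opposite directions depending on which partition is chosen.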
@Andrés Castillo – Very good point, I'm glad you brought that up (SIXERS vs NON-SIXERS).
Yes, there are several seemingly arbitrary sets or groups that we can come up with, like SIXERS and NON-SIXERS. However, there's at least one fundamental difference between all of those and the one(s) I specified (streak vs non-streak): NONE of those other categorical sets describe the nature of a sequence as a whole. They only discriminate between sequences that either contain or don't contain the number 6 or the number 2, or that start or end in 62, or whatever you prefer. Your example is changing the scope from general to more specific. That's why I would not consider the specific set you cited as a TYPE of sequence; it's a partial, specific sequence with some missing digits. The Streak vs Non-Streak sets are not only sets, but they are also TYPES which describe the overall nature of the sequence as a whole. If we were to only look at all sequences that include or exclude particular numbers, like the number 6 at the beginning, that would tell us nothing about the rest of the numbers in the sequence, right?
Now, obviously we could come up with some other example of sets that are also "types", where the type describes the nature of the sequence as a whole. For example, we could start talking about sequences where all the numbers follow a certain order (ascending or descending) or certain mathematical operation, resetting, say, every 7th digit or whatever we deem. But would any of these other set "types" ever be in favor of increasing the odds of seeing a streak compared to non-streak??
Bottom line: Yes, all example sets similar to the one you described do seem very arbitrary. The streak vs non-streak sets seem much less arbitrary, if even at all. Furthermore, if ever given the chance to use a general set type which describes the entire nature(s) / parameters of any sequence as a whole, then that type should supersede the usage of sets like the SIXER vs NON-SIXER.
@Taylor Shobe – Now, I just realized recently that there might indeed be a flaw in the problem's premises which could actually make Marilyn's answer incorrect or indeterminable due to lack of information. I believe it might have been Andres' original objection on the report page. The flaw being that it was never specified which type or sub-types of sequences the roller will be offering as the FALSE alternative sequence (the fake one). I'm currently looking into it and will post my findings when finished.
@Taylor Shobe – Taylor, let's suppose for the moment vos Savant is correct. B is more likely than A to happen. Okay, if this is based on mathematics, then there should be a way to express this numerically, i.e., the odds should be computable. What does it compute to? One scenario is to compare "any random string" with "not random strings", but where do we draw the line as to what is random and not-so-random? I gave one ambiguous example of 11141118111161111321. What is the probability of that happening?
This is not to suggest that it's mathematically meaningless to address such a matter. There are ways of distinguishing between the likelihood of "lesser random strings" from "more random strings", but it's a deep subject, and unfortunately does not yield unique answers.
I sympathize with the troubles you're having with this problem. An argument can be made either way, that is, B is much more likely than A, and B is equally as likely as A. If someone claimed that he did shuffle the deck of cards before it was played, and I decided to check it first, and found all the cards in order, I would say he did not shuffle the cards. In a practical context, if you say that A and B are possible reported outcomes of throwing the die 20 times, then, yes, B is the one far more to be trusted to reflect the actual outcome. That is your argument. Nevertheless, both A and B are equally likely to happen, if we are talking about those specific sequences of integers.
If we go with the latter argument, the mathematics of probabilities is systematic, and we can work out probabilities. Now, let me explain to you what could go wrong if we went with the former argument. What if I proposed the following alternatives?
A) 11111111111111111111
B) 11141118111161111321
Now, notice that after each string of three 1's, there's an increasing power of 2. Or? How does one compute the probability of B happening, relative to A? How about this?
A) 11111111111111111111
B) 02121809041518651860
B looks kind of random, doesn't it? Well, but 02121809 is the date of Abraham Lincoln's birth, 04151865 is the date of his death, and 1860 is the year when he was elected. So, how does one compute the probability of that relative to A? According to your argument, B is far more likely, but I think the sequence here involving important dates in Lincoln's life would be far more extraordinary. In fact, if I picked up a "supposedly random" radio signal from outer space, I might be baffled by A, but, trust me, I would be truly shocked by B. Beyond belief.
So, what's missing here is a systematic framework in which to work out these kinds of probabilities, using your (and vos Savant's) argument. It's not that I don't agree with the general principle here, it's just that I can't see how it can be effectively put on a mathematical foundation.
I don't understand this problem at all. While you are more likely to get a more "random" looking sequence than you are to get 20 of the same thing in a row, the probability that you get a specific sequence of 20 (no matter if it "looks" random or not) is the same for all sequences.
It seems like you are comparing (A) Not getting 20 digits in a row to (B) Getting 20 digits in a row.
Trevor, if I understand what you are asking:
The first part of what you said is true, but it does not apply in this context because the question is essentially asking "What are the chances that xyz occurred?", as opposed to "What are the chances that xyz will or can occur?" So, the fundamental distinction is in the tense of the phrasing (future vs past).
Because it's framed in past tense, we are not restricted to just finding the probabilities of those two specific sequences. Instead, we need to find the probabilities of specific TYPES of sequences, where each TYPE can consist of several specific sequences.
We really do not need to spend much time exploring what exactly those probabilities are though, because obviously we know that it is far more likely and frequent to obtain a MIXED-TYPE of sequence than a STREAK-TYPE when rolling dice. There are only six possible streak-sequences that can even be rolled. How many possible mixed or randomized sequences do you think can be rolled? (Rhetorical question)
Does that help any?
The MIXED or STREAK property of the sequences isn't really the issue here, you are just taking a property of a minority of all the sequences that includes the first one but not the second one.
In the same way you could say: It is far less likely to roll a sequence that starts with a 6 than to roll a sequence that doesn't start with a 6 (5 times less likely). Because sequence (B) belongs to the set of sequences which start with a 6, which has far fewer elements than the set of sequences that don't start with a 6, sequence (B) is much less likely to be the actual sequence rolled.
The real issue here is the fact that most people would agree that sequence (B) seems more random than sequence (A). The way the problem is stated, one of the sequences is random (generated by rolling a die) and the other is chosen by a person, and basically the question "Which sequence is more likely to be the one I actually rolled?" is the same as asking "Which of these numbers was chosen by me, and which one is random?" It seems way more likely that sequence A was chosen rather than random, but still, it's arguable. Who knows, maybe sequence B has a very special meaning to you that has nothing to do with dice and that's why you chose it. As Calvin Lin said, what sequence you chose is very dependent on the "I" of the question, the person choosing the sequence and asking the question.
@Andrés Castillo – Very well put. In my opinion, this question is more a test of psychology than a real test of one's mathematical abilities.
@Trevor B. – "To convince doubting readers, I have, in fact, rolled a die 20 times and noted the result, digit by digit. It was either: (a) 11111111111111111111; or (b) 63335643331622221214. Do you still believe that the two series are equally likely to be what I rolled? Probably not! One of them is handwritten on a slip of paper in front of me, and I’m sure readers know that (b) was the result." -Marilyn vos Savant
Note: Sequence (B) here is NOT the same as sequence (B) in original problem. This was a genuine exercise that took place, not a hypothetical.
@Taylor Shobe – I'm sorry, but Marilyn vos Savant is flat-out wrong. While it is more likely to get a "mixed" set of rolls (the probability is $\frac{6^{20} - 6}{6^{20}}$), the probability that any one particular set of rolls comes up is $\frac{1}{6^{20}}$, no matter what the sequence is!
Consider this: if I flip a fair coin and get heads 2015 times in a row, what is the probability that I will get tails on the next flip because the coin, knowing how many times it has landed heads, wants to compensate and start landing on tails more often?
Both sequences have the same probability. To get either one, we would need to roll exactly the required number on each of the 20 rolls. The chance is 1/6 to the 20th power, or roughly 1 in 3.66 quadrillion: that's about 3.66 × 10^15 (6^20 = 3,656,158,440,062,976)!