Playing with probabilities and Python

I have experienced such a "condition" recently, so I documented its stages - for fear of relapse :‑)

Last Saturday, my older brother asked me about the probability of "something" related to a lottery that is played here in Greece. The lottery is called "Joker" (and no, it has nothing to do with the Bat mobile). He described it to me as a set of balls with numbers from 1 to 40; I was told that they are thoroughly mixed in each lottery, and that 5 of them are randomly picked. Then another number (the "Joker") is randomly picked from a separate set of balls numbered from 1 to 20. If you have predicted all 6 of them, you instantly become a millionaire.

*"What is the chance that the joker is equal to one of the 5 others?"*,
I was asked.

It's been 15 years since I passed the probability and statistics exam in the University, but... like I said: "relapse" :‑)

When we calculate the odds for a *logical AND* of events, the
overall probability of the combined event is the product of the probabilities
of the events. At least, that's how I remember it...

P(A and B and C and D and E) = P(A) x P(B) x P(C) x P(D) x P(E)In this case, how can I form a "logical AND"? Err...

- A is the event of the 1st of the five not being equal to the joker
- B is the event of the 2nd of the five not being equal to the joker
- C is the event of the 3rd of the five not being equal to the joker
- D is the event of the 4th of the five not being equal to the joker
- E is the event of the 5th of the five not being equal to the joker

p(A) = 39/40 p(B) = ...Now, the first one (A) is easy. There are 40 balls, I get one - and the joker is somewhere between 1 and 20. Chances of not hitting that "target" ball are therefore 39 in 40.

The second one (B) is a tad more difficult, though: Event B says that the second ball is not equal to the joker - this means that

either event (A) took place and now, event (B) takes place or event (A) didn't take place and now, event (B) takes placeor, in simpler terms,...

either (B1) the 1st ball was not the joker and neither is the 2nd one or (B2) the 1st ball was equal to the joker and the 2nd one is notLet's see...

B can only happen if one of (B1, B2) takes place, so if I remember correctly...

P(B) = P(B1 or B2) = P(B1) + P(B2) - P(B1 and B2)However, B1

Which means that all I need is find out P(B1) and P(B2).

P(B1) = P(1stEscapesJoker) x P(2ndEscapesJoker given that 1stAlsoEscaped) = 39/40 x 38/39The second term is 38/39, not 38/40 and not 39/40: the 5 chosen balls are all different; now that we've picked the first one and it was not equal to the joker, only 38 out of the remaining 39 are not equal to neither the joker nor the first one.

Similarly,...

P(B2) = P(1stIsJoker) x P(2ndEscapesJoker given that 1stIsJoker) = 1/40 x 39/39The second term here is "absolute certainty": with the first ball equal to the joker, the second one can not possibly be equal to the joker (since it can not be equal to the first ball!). All remaining 39 balls are therefore not equal to the joker.

So, what is P(B)?

P(B) = P(B1) + P(B2) - P(B1 and B2) = 39 38 1 39 = -- x -- + -- x -- - 0 = 40 39 40 39 38 1 39 = -- + -- = -- 40 40 40

The chances of the second ball "escaping" the joker are the same as the chances of the first ball! Hmm...

Let's do it once more, for P(C):

P(C) = P(third ball not equal to the joker) = = P(C1 or C2 or C3) = = P(C1) + P(C2) + P(C3) - - P(C1 and C2) - P(C2 and C3) - P(C1 and C3) + + P(C1 and C2 and C3) Where: C1 = first escapes, second escapes, third escapes C2 = first escapes, second matches, third escapes C3 = first matches, second escapes, third escapes and therefore,... 39 38 37 37 P(C1) = -- x -- x -- = -- 40 39 38 40 39 1 38 1 P(C2) = -- x -- x -- = -- 40 39 38 40 1 39 38 1 P(C3) = -- x -- x -- = -- 40 39 38 40 which, finally, leads us to... P(C) = P(C1) + P(C2) + P(C3) - - P(C1 and C2) - P(C2 and C3) - P(C1 and C3) + + P(C1 and C2 and C3) = 37 1 1 = -- + -- + -- - 0 - 0 - 0 + 0 = 40 40 40 39 = -- 40At this point, it indeed appears that the chances of escaping the Joker are constant (39/40) regardless of which one of the 5 balls we are looking at. Think about it: even if we were picking 40 balls instead of 5, the 40th ball has the same overall chance of escaping the joker as does the first one! Not immediately apparent, but makes sense if you think about it...

So, what is the chance of **ALL** 5 balls escaping the joker (the "logical AND")? The product of the 5
probabilities, so...

39 5 P(all 5 escaping) = ( -- ) 40...and the chances of one of them being equal to the joker, is therefore,

P(one or more of them equal to joker) = 1 - P(all 5 escaping) = 1 - (39/40)^5 = .1189043Looks good. Illusions of grandeur rapidly manifesting :‑)

Now let's put the theory to the test - Python to the rescue!

#!/usr/bin/env python success = 0 tries = 0 for joker in xrange(1, 21): # From 1 to 20 for first in xrange(1, 41): # From 1 to 40 for second in xrange(1, 41): # From 1 to 40 if first == second: continue # Ignore invalid lotteries if first != joker and second != joker: success += 1 tries += 1 print float(success)/tries

P(A and B) = P(A) x P(B) = (39/40)^2 = 0.950625

Why?bash$python experiment1.py 0.95

*Don't read further down immediately - think this through, reread...
treat this as a puzzle!*

No, it is not a Python "floating point precision" issue...

Let's try to see what Python gives us for each of P(A) and P(B) (i.e. the chance of the first one escaping the joker, and - separately - the chance of the second one escaping the joker):

#!/usr/bin/env python success1 = 0 success2 = 0 tries = 0 for joker in xrange(1, 21): for first in xrange(1, 41): for second in xrange(1, 41): if first == second: continue if first != joker: success1 += 1 if second != joker: success2 += 1 tries += 1 print float(success1)/tries print float(success2)/tries

At least this one looks good (0.975 = 39/40). Both the first one and the second one have the same chance of escaping the joker, exactly 39/40. Not all our theory is wrong... We correctly calculated the probabilities of each one of the 5 balls escaping the joker. But... why is the end result for the overall question different?bash$python experiment2.py 0.975 0.975

Hmm...

P(A and B) = P(A) x P(B) = (39/40)^2 = 0.950625In other words, the real world experiment tells us that our theoretically calculated probability is slightly higher than the real one. Why, though? Isn't the probability of the

Time to dig up the textbooks... (boxes, dust, cough)

There, I found it...

Oops - after 15 years I knew I would have missed something.

Independent?

They are most definitely NOT independent! Whether the second one escapes the joker VERY MUCH DEPENDS on whether the first one did! If the first one was equal to the joker, the second one SURELY will NOT (since it cannot be equal to the first one)! Don't get confused by the fact that we calculated the probabilities of each event and found them to be the same; the important point is that the outcome of the first event has implications on the second one.

Now what?

Wait a minute... By knowing P(A) and P(B), we also know P(not A) and P(not B)...

P(first is equal to joker) = P(not A) = 1-P(A) = 1/40 P(second is equal to joker) = P(not B) = 1-P(B) = 1/40... and therefore ...

P(1stIsJoker or 2ndIsJoker) = P(1stIsJoker) + P(2ndIsJoker) - P(1stIsJoker and 2ndIsJoker)The last term (1stIsJoker and 2ndIsJoker) can't happen - the first and second ball are never equal!

Finally, I got it!

P(1stIsJoker or 2ndIsJoker) = P(1stIsJoker) + P(2ndIsJoker) = 1/40 + 1/40 = 2/40And yes, ...

P(both first and second escape) = 1-P(1stIsJoker or 2ndIsJoker) = 1-(2/40) = 0.95... as experiment1 showed.

When I was calculating B1...

P(firstEscapes and secondEscapes) = P(B1) = (see above in Theory section) 39 38 38 = -- x -- = -- = 0.95 40 39 40... I could have just extended this to get to the result easily...

With this alone, I could have seen that with 2 balls, the probability of joker "hitting" one of the two balls is 2/40 (1-(38/40)). For the complete game (5 balls) as questioned by my brother, the chance is 5/40 (or 1/8), since...

p(1st and 2nd and 3rd and 4th and 5th escape the joker) = p(1stEscapes) x p(2ndEscapes given 1stEscaped) x p(3rdEscapes given 1stAnd2ndEscaped) x p(4thEscapes given 1stAnd2ndAnd3rdEscaped) x p(5thEscapes given 1stAnd2ndAnd3rdAnd4thEscaped) = 39 38 37 36 35 35 = -- x -- x -- x -- x -- = -- 40 39 38 37 36 40 and thus, the answer to our problem is... 35 5 P(one of the 5 is equal to the joker) = 1 - -- = -- 40 40Based on the above, I advised my brother as follows:

- If you do play the Joker lottery, don't select as a joker one of the 5 other numbers you play - or better yet, don't pay the "stupidity tax" that is all lotteries.
- Don't ask engineers scientific questions! They are VERY susceptible to "me-almost-scientist" relapses... And they will feel illusions of grandeur in the process - have mercy :‑)

Index | CV | Updated: Sat Mar 8 22:58:16 2014 |