- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
First of all , it is NOT a question for a help.
Last night, I watched an intesting vedio .
A magician hold a black package in which there are 24 balls, 8 is red, 8 is blue, 8 is yellow.
He asked an audience to pull out 12 balls from his black package.
If the color balls conform to 3:4:5 ,then audience should pay him 10 dollars .
If it was others, he pay this audience 10 dollars .
The intesting thing happened, most of audience obtained 3:4:5 ,and this magician earn many money.
Why? In general, I would expect that probability should be 4:4:4 ,since it was randomly picked up .
Just to confirm this result , I wrote some IML code to simulate many times to see how the proability distributed. This code is motivated by @Rick_SAS blog(which conform to Hypergeometric distribution):
https://blogs.sas.com/content/iml/2015/09/30/balls-and-urns1.html
https://blogs.sas.com/content/iml/2015/10/14/sim-2x2-model.html
proc iml;
call randseed(123);
/* simulate 1000000 times */
n=1000000;
want=j(n,3,.);
do i=1 to n;
/*24 balls in the black package.8 is red,8 is blue,8 is yellow*/
expect={8 8 8}; sum=sum(expect);
/*pull out 12 balls*/
want_sum=12;
/*get the red balls*/
if want_sum=0 then want[i,1]=0;
else do;
call randgen(x, "Hyper", sum, expect[1], want_sum);
want[i,1]= x;
sum=sum-expect[1];
want_sum=want_sum-x;
end;
/*get the blue balls*/
if want_sum=0 then want[i,2]=0;
else do;
call randgen(x, "Hyper", sum, expect[2], want_sum);
want[i,2]= x;
sum=sum-expect[2];
want_sum=want_sum-x;
end;
/*get the yellow balls*/
if want_sum=0 then want[i,3]=0;
else do;
call randgen(x, "Hyper", sum, expect[3], want_sum);
want[i,3]= x;
sum=sum-expect[3];
want_sum=want_sum-x;
end;
end;
create want from want;
append from want;
close;
quit;
data want2;
set want;
length flag $ 8;
call sort(of _numeric_);
flag=catx(':',of _numeric_);
run;
proc freq data=want2 order=freq;
table flag/plots=freqplot(orient=horizontal scale=percent);
run;
After simulating one million times , I got this graph. You can see 3:4:5 occurred almost 50 percent ,that is a big huge hit.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @Ksharp,
I think the key point here is that the outcome "3:4:5" is actually "3:4:5 or 3:5:4 or 4:3:5 or 4:5:3 or 5:3:4 or 5:4:3". While each of these six events is less likely to occur than "4:4:4" -- which is the most probable combination (so your intuition is right) --, the sum of those six probabilities (hence the probability of obtaining a 3:4:5 ratio, regardless of which colors correspond to "3", "4" and "5"), 6*0.0811787...=0.487072..., is much greater than the probability of the single event "4:4:4", 0.126841....
The probabilities can be computed easily from the multivariate hypergeometric distribution:
%let K1=8;
%let K2=8;
%let K3=8;
%let n=12;
data prob(drop=i j);
length ratio $11;
do k1=0 to &K1;
do k2=0 to &K2;
do k3=0 to &K3;
if sum(of k:)=&n then do;
i=min(of k:);
j=max(of k:);
ratio=catx(':',i,&n-i-j,j);
p=comb(&K1,k1)*comb(&K2,k2)*comb(&K3,k3)/comb(&K1+&K2+&K3,&n);
output;
end;
end;
end;
end;
run;
proc summary data=prob nway;
class ratio;
var p;
output out=want(drop=_:) sum=;
run;
proc print data=want;
sum p;
run;
Result:
Obs ratio p 1 0:4:8 0.00016 2 0:5:7 0.00099 3 0:6:6 0.00087 4 1:3:8 0.00099 5 1:4:7 0.00994 6 1:5:6 0.02783 7 2:2:8 0.00087 8 2:3:7 0.02783 9 2:4:6 0.12177 10 2:5:5 0.09741 11 3:3:6 0.09741 12 3:4:5 0.48707 13 4:4:4 0.12684 ======= 1.00000
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @Ksharp,
I think the key point here is that the outcome "3:4:5" is actually "3:4:5 or 3:5:4 or 4:3:5 or 4:5:3 or 5:3:4 or 5:4:3". While each of these six events is less likely to occur than "4:4:4" -- which is the most probable combination (so your intuition is right) --, the sum of those six probabilities (hence the probability of obtaining a 3:4:5 ratio, regardless of which colors correspond to "3", "4" and "5"), 6*0.0811787...=0.487072..., is much greater than the probability of the single event "4:4:4", 0.126841....
The probabilities can be computed easily from the multivariate hypergeometric distribution:
%let K1=8;
%let K2=8;
%let K3=8;
%let n=12;
data prob(drop=i j);
length ratio $11;
do k1=0 to &K1;
do k2=0 to &K2;
do k3=0 to &K3;
if sum(of k:)=&n then do;
i=min(of k:);
j=max(of k:);
ratio=catx(':',i,&n-i-j,j);
p=comb(&K1,k1)*comb(&K2,k2)*comb(&K3,k3)/comb(&K1+&K2+&K3,&n);
output;
end;
end;
end;
end;
run;
proc summary data=prob nway;
class ratio;
var p;
output out=want(drop=_:) sum=;
run;
proc print data=want;
sum p;
run;
Result:
Obs ratio p 1 0:4:8 0.00016 2 0:5:7 0.00099 3 0:6:6 0.00087 4 1:3:8 0.00099 5 1:4:7 0.00994 6 1:5:6 0.02783 7 2:2:8 0.00087 8 2:3:7 0.02783 9 2:4:6 0.12177 10 2:5:5 0.09741 11 3:3:6 0.09741 12 3:4:5 0.48707 13 4:4:4 0.12684 ======= 1.00000
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Looks like we went the same path 😄
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug
"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings
SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As a matter of fact , magician have a list to claim:
3:4:5 get -10
4:4:4 get 0
2:4:6 get 10
...........
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just to be clear: probability is not my super-power but:
Ain't that case of multivariate hypergeometric distribution?
https://en.wikipedia.org/wiki/Hypergeometric_distribution#Multivariate_hypergeometric_distribution
and the fact that:
P(3,4,5) is independent from P(3,5,4), etc,
P(3,4,5 or 3,5,4) = P(3,4,5) + P(3,5,4) - P(3,4,5 and 3,5,4) = P(3,4,5) + P(3,5,4)
so probability P("I have 3:4:5 sequence") would be just sum of P(3,4,5) + P(3,5,4) + ...+ P(5,4,3).
data have;
input grp $ r g b;
cards;
a 4 4 4
b 3 4 5
b 3 5 4
b 4 3 5
b 4 5 3
b 5 3 4
b 5 4 3
;
run;
/*
a = get all 4
b = get 3:4:5
*/
data want;
rn=8;
gn=8;
bn=8;
N=rn+gn+bn;
set have;
k=r+g+b;
p = /* https://en.wikipedia.org/wiki/Hypergeometric_distribution#Multivariate_hypergeometric_distribution */
(comb(rn, r)
*comb(gn, g)
*comb(bn, b)
)
/comb(n, k)
;
put (_ALL_) (=/);
if _N_=2 then sum=0;
sum+p;
run;
proc print;
run;
[EDIT:] proc print:
Bart
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug
"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings
SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The key is that you are pulling WITHOUT REPLACEMENT, so the conditional probabilities change as you pull. The expected ratio is only 4:4:4 is you pull with replacement. You can do a simpler analysis of 6 balls (3R, 3B, 3Y) and pulling 3 balls to see that the ratios are not even. If I did the analysis correctly, in that situation there is a 70% chance that the ratio is 2:1:0 and only a 30% chance that the ratio is 1:1:1.
I think you can perform the simulation more concisely if you use the SAMPLE function to pull the samples and use the TABULATE subroutine to analyze the ratio:
proc iml;
call randseed(123);
/* simulate 1000000 times */
n=1000000;
/* create binary variable equal to 1 if the ratio is 3:4:5 */
is_345 = j(n,1,0);
set = repeat('R',8) // repeat('G',8) // repeat('Y',8);
draws = sample(set, 12 //n, "NoReplace"); /* draw n samples of size 12 */
do i = 1 to n;
call tabulate(labels, freq, draws[i,]);
if ncol(freq)=3 then
is_345[i] = all(freq={3 4 5}) | all(freq={3 5 4}) |
all(freq={4 3 5}) | all(freq={5 3 4}) |
all(freq={4 5 3}) | all(freq={5 4 3}) ;
end;
/* estimate proportion of draws that result in 3:4:5 */
prop_345 = mean(is_345);
print prop_345;
In this simulation, the result is 0.4867 of the draws have ratio 3:4:5. Since this is less than 50%, the magician must have been using "magic" (that is, cheating) to win money.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If someone doesn't have IML, just for fun, 4GL version (there is a bunch of code commented out from my try-and-errors for optimisation, for those who like to test):
/* arrays simulation */
options mprint;
data _null_;
array bwb[24] _temporary_ (8*(1 2 3)); /* template box with balls */
array b[24] _temporary_; /* box from which we are selecting */
array r[3] ; /* couter for selected color 1, 2, 3 */
call streaminit(42); /* set seed = 42 */
array x[0:12,0:12,0:12] _temporary_; /* array for results, 0=color was never selected */
do iter=0 to 1e6-1; /* iterations */
/* populate b with balls */
/*
do n=1 to 24;
b[n]=bwb[n];
end;
*//*
call pokelong (
put (PEEKCLONG (ADDRLONG(bwb[1]), 192), $192.)
, ADDRLONG(b[1])
, 192);
*//**/
%macro loop(n);
%do n=1 %to &n.;
b[&n.]=bwb[&n.];
%end;
%mend loop;
%loop(24)
/**/
r1=0;r2=0;r3=0; /* initiate counters */
/*
do j=24 to 13 by -1;
i = RAND("integer",1,j);
r[b[i]]+1;
b[i]=b[j];
end;
*/
%macro loop2(n,m);
%do n=&n. %to %eval(&n.-&m.+1) %by -1; /* select 12 from 24 with no repetitions */
i = RAND("integer",1,&n.); /* get random number from 1-24, 1-23, 1-22, etc. */
r[b[i]]+1; /* increase count of selected ball */
b[i]=b[&n.]; /* fill gap in "i" (after selected ball) with last ball ("n") */
%end;
%mend loop2;
%loop2(24,12)
call sortn(of r[*]); /* order counters */
x[r1,r2,r3]+1; /* increase selected "set" */
end;
/* output generated data */
do r1=0 to 12;
do r2=0 to 12;
do r3=0 to 12;
if x[r1,r2,r3] then /* print only "materialized" cases */
do;
n=x[r1,r2,r3]/iter;
output;
put @2 r1= @12 r2= @22 r3= @32 n= best32.20;
end;
end;
end;
end;
keep r: n;
run;
Result for seed=42 and 1e9 iterations:
/* 1e9 r1=0 r2=4 r3=8 n=0.000155304 r1=0 r2=5 r3=7 n=0.000993489 r1=0 r2=6 r3=6 n=0.000869773 r1=1 r2=3 r3=8 n=0.000994072 r1=1 r2=4 r3=7 n=0.009935598 r1=1 r2=5 r3=6 n=0.027825111 r1=2 r2=2 r3=8 n=0.000869628 r1=2 r2=3 r3=7 n=0.027822129 r1=2 r2=4 r3=6 n=0.121775441 r1=2 r2=5 r3=5 n=0.097433405 r1=3 r2=3 r3=6 n=0.097416671 r1=3 r2=4 r3=5 n=0.487068081 r1=4 r2=4 r3=4 n=0.126841298 NOTE: The data set WORK.FREQCOUNT has 13 observations and 4 variables. NOTE: DATA statement used (Total process time): real time 11:56.18 cpu time 11:56.14 */
Bart
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug
"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings
SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So looking at the computed probability for 3:4:5 which is 0.487, in the long run the magician should lose money looking for 3:4:5, as the rest of the possibilities add to greater than 0.5 and so are more likely.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As a matter of fact , magician have a list to claim to make sure he can earn the money . Here I just simplify this story to have some fun.
3:4:5 get -10
4:4:4 get 0
2:4:6 get 10
...........
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Good. Using SAMPLE() funtion is more like reality than my RAND().
Actually, magician have a list to claim to make sure he can earn the money. I just want to simplify this story . and can't image 3:4:5 would have so big probability.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Right. You can see this effect even for two classes. and even for sampling with replacement. Flip a fair coin 4 times and look at the ratios of heads to tails.
- The probability of 0:4 is 2 / 16 because HHHH and TTTT are possible.
- The probability of 1:3 is very high at 8 / 16. It includes possibilities such a HTTT and THTT.
- The probability of 2:2 is "only" 6 / 16. It includes possibilities such a HTTH and THHT.
So, even in that simple probability space, the unequal ratio is higher than you might initially think.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here I would like to use PROC SURVEYSELECT to mimic SAMPLE() funtion in SAS/IML for sombody was not familiar with IML syntax.
data have;
do i=1 to 8;
ball='Y';output;
ball='B';output;
ball='R';output;
end;
drop i;
run;
proc surveyselect data=have out=temp seed=123 sampsize=12 reps=1000000;
run;
proc freq data=temp noprint;
table Replicate*ball/out=temp2 sparse list;
run;
proc sort data=temp2;
by Replicate count;
run;
data want;
length flag $ 8;
do until(last.Replicate);
set temp2;
by Replicate;
flag=catx(':',flag,count);
end;
keep Replicate flag;
run;
proc freq data=want order=freq;
table flag/plots=freqplot(orient=horizontal scale=percent);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not the same thing but, it's my favorite "audience bet".
Playig backgammon, and rolling two dice, rolling "doubles" is advantageious (1:1, 2:2, ... 6:6).
Chances of doubles on a single roll is 1 out of 6. I claim that my opponent is exceedingly lucky and if he rolls the dice only 5 times, he will roll doubles at some point.
The math is much easier than the question with three colors of balls.