BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

First of all , it is NOT a question for a help.

 

Last night, I watched an intesting vedio .

A magician hold a black package in which there are 24 balls, 8 is red, 8 is blue, 8 is yellow.

He asked an audience to pull out 12 balls from his black package.

If the color balls conform to 3:4:5 ,then audience should pay him 10 dollars .

If it was others, he pay this audience 10 dollars .

The intesting thing happened, most of audience obtained 3:4:5 ,and this magician earn many money.

 

Why? In general, I would expect that probability should be 4:4:4 ,since it was randomly picked up .

Just to confirm this result , I wrote some IML code to simulate many times to see how the proability  distributed. This code is motivated by  @Rick_SAS   blog(which  conform to Hypergeometric distribution):

https://blogs.sas.com/content/iml/2015/09/30/balls-and-urns1.html

https://blogs.sas.com/content/iml/2015/10/14/sim-2x2-model.html

 

 


proc iml;
call randseed(123);
/* simulate 1000000 times */
n=1000000; 

want=j(n,3,.);
do i=1 to n;
/*24 balls in the black package.8 is red,8 is blue,8 is yellow*/
expect={8 8 8}; sum=sum(expect);  
/*pull out 12 balls*/
want_sum=12;  

 /*get the red balls*/
if want_sum=0 then want[i,1]=0;
else do;
 call randgen(x, "Hyper", sum, expect[1], want_sum);
 want[i,1]= x;
 sum=sum-expect[1];  
 want_sum=want_sum-x;
end;

 /*get the blue balls*/
if want_sum=0 then want[i,2]=0;
else do;
 call randgen(x, "Hyper", sum, expect[2], want_sum);
 want[i,2]= x;
 sum=sum-expect[2];  
 want_sum=want_sum-x;
end;

 /*get the yellow balls*/
if want_sum=0 then want[i,3]=0;
else do;
 call randgen(x, "Hyper", sum, expect[3], want_sum);
 want[i,3]= x;
 sum=sum-expect[3];  
 want_sum=want_sum-x;
end;

end;

create want from want;
append from want;
close;
quit;

data want2;
 set want;
 length flag $ 8;
 call sort(of _numeric_);
 flag=catx(':',of _numeric_);
run;

proc freq data=want2 order=freq;
table flag/plots=freqplot(orient=horizontal scale=percent);
run;

Ksharp_0-1743492247832.png

 

 

After simulating one million times , I got this graph. You can see 3:4:5 occurred almost  50 percent ,that is a big huge hit.

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @Ksharp,

 

I think the key point here is that the outcome "3:4:5" is actually "3:4:5 or 3:5:4 or 4:3:5 or 4:5:3 or 5:3:4 or 5:4:3". While each of these six events is less likely to occur than "4:4:4" -- which is the most probable combination (so your intuition is right) --, the sum of those six probabilities (hence the probability of obtaining a 3:4:5 ratio, regardless of which colors correspond to "3", "4" and "5"), 6*0.0811787...=0.487072..., is much greater than the probability of the single event "4:4:4", 0.126841....

 

The probabilities can be computed easily from the multivariate hypergeometric distribution:

 

%let K1=8;
%let K2=8;
%let K3=8;
%let n=12;

data prob(drop=i j);
length ratio $11;
do k1=0 to &K1;
  do k2=0 to &K2;
    do k3=0 to &K3;
      if sum(of k:)=&n then do;
        i=min(of k:);
        j=max(of k:);
        ratio=catx(':',i,&n-i-j,j);
        p=comb(&K1,k1)*comb(&K2,k2)*comb(&K3,k3)/comb(&K1+&K2+&K3,&n);
        output;
      end;
    end;
  end;
end;
run;

proc summary data=prob nway;
class ratio;
var p;
output out=want(drop=_:) sum=;
run;

proc print data=want;
sum p;
run;

Result:

Obs    ratio       p

  1    0:4:8    0.00016
  2    0:5:7    0.00099
  3    0:6:6    0.00087
  4    1:3:8    0.00099
  5    1:4:7    0.00994
  6    1:5:6    0.02783
  7    2:2:8    0.00087
  8    2:3:7    0.02783
  9    2:4:6    0.12177
 10    2:5:5    0.09741
 11    3:3:6    0.09741
 12    3:4:5    0.48707
 13    4:4:4    0.12684
                =======
                1.00000

 

View solution in original post

14 REPLIES 14
FreelanceReinh
Jade | Level 19

Hello @Ksharp,

 

I think the key point here is that the outcome "3:4:5" is actually "3:4:5 or 3:5:4 or 4:3:5 or 4:5:3 or 5:3:4 or 5:4:3". While each of these six events is less likely to occur than "4:4:4" -- which is the most probable combination (so your intuition is right) --, the sum of those six probabilities (hence the probability of obtaining a 3:4:5 ratio, regardless of which colors correspond to "3", "4" and "5"), 6*0.0811787...=0.487072..., is much greater than the probability of the single event "4:4:4", 0.126841....

 

The probabilities can be computed easily from the multivariate hypergeometric distribution:

 

%let K1=8;
%let K2=8;
%let K3=8;
%let n=12;

data prob(drop=i j);
length ratio $11;
do k1=0 to &K1;
  do k2=0 to &K2;
    do k3=0 to &K3;
      if sum(of k:)=&n then do;
        i=min(of k:);
        j=max(of k:);
        ratio=catx(':',i,&n-i-j,j);
        p=comb(&K1,k1)*comb(&K2,k2)*comb(&K3,k3)/comb(&K1+&K2+&K3,&n);
        output;
      end;
    end;
  end;
end;
run;

proc summary data=prob nway;
class ratio;
var p;
output out=want(drop=_:) sum=;
run;

proc print data=want;
sum p;
run;

Result:

Obs    ratio       p

  1    0:4:8    0.00016
  2    0:5:7    0.00099
  3    0:6:6    0.00087
  4    1:3:8    0.00099
  5    1:4:7    0.00994
  6    1:5:6    0.02783
  7    2:2:8    0.00087
  8    2:3:7    0.02783
  9    2:4:6    0.12177
 10    2:5:5    0.09741
 11    3:3:6    0.09741
 12    3:4:5    0.48707
 13    4:4:4    0.12684
                =======
                1.00000

 

Ksharp
Super User
Great! It is awesome !
yabwon
Amethyst | Level 16

@FreelanceReinh 

Looks like we went the same path 😄

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



quickbluefish
Barite | Level 11
This is interesting. I guess it's because each time you pull a ball of color X, you're changing the probability you'll pull another of X. So the 2nd pull is slightly more likely to be Y or Z and the third pull is more likely to be Z? If there were infinite balls, then I guess you would in fact get 4:4:4. One thing, though - it looks like the magician would have lost money since the probability of 3:4:5 is slightly less then 50%.
Ksharp
Super User
Yeah.
As a matter of fact , magician have a list to claim:
3:4:5 get -10
4:4:4 get 0
2:4:6 get 10
...........
yabwon
Amethyst | Level 16

Just to be clear: probability is not my super-power but:

 

Ain't that case of multivariate hypergeometric distribution?

https://en.wikipedia.org/wiki/Hypergeometric_distribution#Multivariate_hypergeometric_distribution

and the fact that:

P(3,4,5) is independent from P(3,5,4), etc,

P(3,4,5 or 3,5,4) = P(3,4,5) + P(3,5,4) - P(3,4,5 and 3,5,4) = P(3,4,5) + P(3,5,4)

so probability P("I have 3:4:5 sequence") would be just sum of P(3,4,5) + P(3,5,4) + ...+ P(5,4,3).

 

 


data have;
input grp $ r g b;
cards;
a 4 4 4
b 3 4 5
b 3 5 4
b 4 3 5
b 4 5 3
b 5 3 4
b 5 4 3
;
run;

/*
a = get all 4
b = get 3:4:5
*/

data want;
rn=8;
gn=8;
bn=8;
N=rn+gn+bn;

set have;
k=r+g+b;

p = /* https://en.wikipedia.org/wiki/Hypergeometric_distribution#Multivariate_hypergeometric_distribution */
(comb(rn, r)
*comb(gn, g)
*comb(bn, b)
)
/comb(n, k)
;

put (_ALL_) (=/);

if _N_=2 then sum=0;
sum+p;
run;
proc print;
run;

 

[EDIT:] proc print:

yabwon_0-1743501007026.png

 

 

 

Bart

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Rick_SAS
SAS Super FREQ

The key is that you are pulling WITHOUT REPLACEMENT, so the conditional probabilities change as you pull. The expected ratio is only 4:4:4 is you pull with replacement. You can do a simpler analysis of 6 balls (3R, 3B, 3Y) and pulling 3 balls to see that the ratios are not even. If I did the analysis correctly, in that situation there is a 70% chance that the ratio is 2:1:0 and only a 30% chance that the ratio is 1:1:1.

 

I think you can perform the simulation more concisely if you use the SAMPLE function to pull the samples and use the TABULATE subroutine to analyze the ratio:

proc iml;
call randseed(123);
/* simulate 1000000 times */
n=1000000; 
/* create binary variable equal to 1 if the ratio is 3:4:5 */
is_345 = j(n,1,0);

set = repeat('R',8) // repeat('G',8) // repeat('Y',8);
draws = sample(set, 12 //n, "NoReplace");  /* draw n samples of size 12 */

do i = 1 to n;
   call tabulate(labels, freq, draws[i,]);
   if ncol(freq)=3 then 
      is_345[i] = all(freq={3 4 5}) | all(freq={3 5 4}) | 
                  all(freq={4 3 5}) | all(freq={5 3 4}) | 
                  all(freq={4 5 3}) | all(freq={5 4 3}) ;
end;
/* estimate proportion of draws that result in 3:4:5 */
prop_345 = mean(is_345);
print prop_345;

In this simulation, the result is 0.4867 of the draws have ratio 3:4:5.  Since this is less than 50%, the magician must have been using "magic" (that is, cheating) to win money.

 

yabwon
Amethyst | Level 16

If someone doesn't have IML, just for fun, 4GL version (there is a bunch of code commented out from my try-and-errors for optimisation, for those who like to test):  

 

/* arrays simulation */
options mprint;
data _null_;
  array bwb[24] _temporary_ (8*(1 2 3)); /* template box with balls */
  array b[24]   _temporary_;             /* box from which we are selecting */
  array r[3] ;                           /* couter for selected color 1, 2, 3 */
  call streaminit(42);                   /* set seed = 42 */

  array x[0:12,0:12,0:12] _temporary_; /* array for results, 0=color was never selected */

  do iter=0 to 1e6-1; /* iterations */

  /* populate b with balls */
  /*
    do n=1 to 24;
      b[n]=bwb[n];
    end;
  *//*
   call pokelong (
      put (PEEKCLONG (ADDRLONG(bwb[1]), 192), $192.)
    , ADDRLONG(b[1])
    , 192);
  *//**/
    %macro loop(n);
      %do n=1 %to &n.;
        b[&n.]=bwb[&n.];
      %end;
    %mend loop;
    %loop(24)
  /**/

    r1=0;r2=0;r3=0; /* initiate counters */
    /*
    do j=24 to 13 by -1;       
      i = RAND("integer",1,j); 
      r[b[i]]+1;               
      b[i]=b[j];               
    end;
    */
    %macro loop2(n,m);
    %do n=&n. %to %eval(&n.-&m.+1) %by -1;    /* select 12 from 24 with no repetitions */
      i = RAND("integer",1,&n.); /* get random number from 1-24, 1-23, 1-22, etc. */
      r[b[i]]+1;                 /* increase count of selected ball */
      b[i]=b[&n.];               /* fill gap in "i" (after selected ball) with last ball ("n") */
    %end;
    %mend loop2;
    %loop2(24,12)

    call sortn(of r[*]); /* order counters */

    x[r1,r2,r3]+1;       /* increase selected "set" */
  end;

  /* output generated data */
  do r1=0 to 12;
    do r2=0 to 12;
      do r3=0 to 12;
        if x[r1,r2,r3] then /* print only "materialized" cases */
          do;
            n=x[r1,r2,r3]/iter; 
            output;
            put @2 r1= @12 r2= @22 r3= @32 n= best32.20;
          end;
      end;
    end;
  end;
  keep r: n;
run;

Result for seed=42 and 1e9 iterations:

 

/* 1e9

 r1=0      r2=4      r3=8      n=0.000155304
 r1=0      r2=5      r3=7      n=0.000993489
 r1=0      r2=6      r3=6      n=0.000869773
 r1=1      r2=3      r3=8      n=0.000994072
 r1=1      r2=4      r3=7      n=0.009935598
 r1=1      r2=5      r3=6      n=0.027825111
 r1=2      r2=2      r3=8      n=0.000869628
 r1=2      r2=3      r3=7      n=0.027822129
 r1=2      r2=4      r3=6      n=0.121775441
 r1=2      r2=5      r3=5      n=0.097433405
 r1=3      r2=3      r3=6      n=0.097416671
 r1=3      r2=4      r3=5      n=0.487068081
 r1=4      r2=4      r3=4      n=0.126841298
NOTE: The data set WORK.FREQCOUNT has 13 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           11:56.18
      cpu time            11:56.14

*/

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



PaigeMiller
Diamond | Level 26

So looking at the computed probability for 3:4:5 which is 0.487, in the long run the magician should lose money looking for 3:4:5, as the rest of the possibilities add to greater than 0.5 and so are more likely.

--
Paige Miller
Ksharp
Super User
Yeah.
As a matter of fact , magician have a list to claim to make sure he can earn the money . Here I just simplify this story to have some fun.
3:4:5 get -10
4:4:4 get 0
2:4:6 get 10
...........
Ksharp
Super User
Rick,
Good. Using SAMPLE() funtion is more like reality than my RAND().
Actually, magician have a list to claim to make sure he can earn the money. I just want to simplify this story . and can't image 3:4:5 would have so big probability.
Rick_SAS
SAS Super FREQ

Right. You can see this effect even for two classes. and even for sampling with replacement. Flip a fair coin 4 times and look at the ratios of heads to tails.

  • The probability of 0:4 is 2 / 16 because HHHH and TTTT are possible.
  • The probability of 1:3 is very high at 8 / 16. It includes possibilities such a HTTT and THTT.
  • The probability of 2:2 is "only" 6 / 16. It includes possibilities such a HTTH and THHT.

So, even in that simple probability space, the unequal ratio is higher than you might initially think.

Ksharp
Super User

Here I would like to use PROC SURVEYSELECT to mimic SAMPLE() funtion in SAS/IML for sombody was not familiar with IML syntax.

 

data have;
do i=1 to 8;
 ball='Y';output;
 ball='B';output;
 ball='R';output;
end;
drop i;
run;

proc surveyselect data=have out=temp seed=123   sampsize=12 reps=1000000;
run;
proc freq data=temp noprint;
table Replicate*ball/out=temp2 sparse list;
run;
proc sort data=temp2;
by Replicate count;
run;
data want;
length flag $ 8;
do until(last.Replicate);
 set temp2;
 by Replicate;
 flag=catx(':',flag,count);
end;
keep Replicate flag;
run;
proc freq data=want order=freq;
table flag/plots=freqplot(orient=horizontal scale=percent);
run;

Ksharp_0-1743668675799.png

Ksharp_1-1743668690054.png

 

Astounding
PROC Star

Not the same thing but, it's my favorite "audience bet".  

Playig backgammon, and rolling two dice, rolling "doubles" is advantageious (1:1, 2:2, ... 6:6).

Chances of doubles on a single roll is 1 out of 6.  I claim that my opponent is exceedingly lucky and if he rolls the dice only 5 times, he will roll doubles at some point.

The math is much easier than the question with three colors of balls.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 14 replies
  • 8091 views
  • 22 likes
  • 7 in conversation