SAS/IML Software and Matrix Computations

Statistical programming, matrix languages, and more
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

First of all , it is NOT a question for a help.

 

Last night, I watched an intesting vedio .

A magician hold a black package in which there are 24 balls, 8 is red, 8 is blue, 8 is yellow.

He asked an audience to pull out 12 balls from his black package.

If the color balls conform to 3:4:5 ,then audience should pay him 10 dollars .

If it was others, he pay this audience 10 dollars .

The intesting thing happened, most of audience obtained 3:4:5 ,and this magician earn many money.

 

Why? In general, I would expect that probability should be 4:4:4 ,since it was randomly picked up .

Just to confirm this result , I wrote some IML code to simulate many times to see how the proability  distributed. This code is motivated by  @Rick_SAS   blog(which  conform to Hypergeometric distribution):

https://blogs.sas.com/content/iml/2015/09/30/balls-and-urns1.html

https://blogs.sas.com/content/iml/2015/10/14/sim-2x2-model.html

 

 


proc iml;
call randseed(123);
/* simulate 1000000 times */
n=1000000; 

want=j(n,3,.);
do i=1 to n;
/*24 balls in the black package.8 is red,8 is blue,8 is yellow*/
expect={8 8 8}; sum=sum(expect);  
/*pull out 12 balls*/
want_sum=12;  

 /*get the red balls*/
if want_sum=0 then want[i,1]=0;
else do;
 call randgen(x, "Hyper", sum, expect[1], want_sum);
 want[i,1]= x;
 sum=sum-expect[1];  
 want_sum=want_sum-x;
end;

 /*get the blue balls*/
if want_sum=0 then want[i,2]=0;
else do;
 call randgen(x, "Hyper", sum, expect[2], want_sum);
 want[i,2]= x;
 sum=sum-expect[2];  
 want_sum=want_sum-x;
end;

 /*get the yellow balls*/
if want_sum=0 then want[i,3]=0;
else do;
 call randgen(x, "Hyper", sum, expect[3], want_sum);
 want[i,3]= x;
 sum=sum-expect[3];  
 want_sum=want_sum-x;
end;

end;

create want from want;
append from want;
close;
quit;

data want2;
 set want;
 length flag $ 8;
 call sort(of _numeric_);
 flag=catx(':',of _numeric_);
run;

proc freq data=want2 order=freq;
table flag/plots=freqplot(orient=horizontal scale=percent);
run;

Ksharp_0-1743492247832.png

 

 

After simulating one million times , I got this graph. You can see 3:4:5 occurred almost  50 percent ,that is a big huge hit.

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @Ksharp,

 

I think the key point here is that the outcome "3:4:5" is actually "3:4:5 or 3:5:4 or 4:3:5 or 4:5:3 or 5:3:4 or 5:4:3". While each of these six events is less likely to occur than "4:4:4" -- which is the most probable combination (so your intuition is right) --, the sum of those six probabilities (hence the probability of obtaining a 3:4:5 ratio, regardless of which colors correspond to "3", "4" and "5"), 6*0.0811787...=0.487072..., is much greater than the probability of the single event "4:4:4", 0.126841....

 

The probabilities can be computed easily from the multivariate hypergeometric distribution:

 

%let K1=8;
%let K2=8;
%let K3=8;
%let n=12;

data prob(drop=i j);
length ratio $11;
do k1=0 to &K1;
  do k2=0 to &K2;
    do k3=0 to &K3;
      if sum(of k:)=&n then do;
        i=min(of k:);
        j=max(of k:);
        ratio=catx(':',i,&n-i-j,j);
        p=comb(&K1,k1)*comb(&K2,k2)*comb(&K3,k3)/comb(&K1+&K2+&K3,&n);
        output;
      end;
    end;
  end;
end;
run;

proc summary data=prob nway;
class ratio;
var p;
output out=want(drop=_:) sum=;
run;

proc print data=want;
sum p;
run;

Result:

Obs    ratio       p

  1    0:4:8    0.00016
  2    0:5:7    0.00099
  3    0:6:6    0.00087
  4    1:3:8    0.00099
  5    1:4:7    0.00994
  6    1:5:6    0.02783
  7    2:2:8    0.00087
  8    2:3:7    0.02783
  9    2:4:6    0.12177
 10    2:5:5    0.09741
 11    3:3:6    0.09741
 12    3:4:5    0.48707
 13    4:4:4    0.12684
                =======
                1.00000

 

View solution in original post

14 REPLIES 14
FreelanceReinh
Jade | Level 19

Hello @Ksharp,

 

I think the key point here is that the outcome "3:4:5" is actually "3:4:5 or 3:5:4 or 4:3:5 or 4:5:3 or 5:3:4 or 5:4:3". While each of these six events is less likely to occur than "4:4:4" -- which is the most probable combination (so your intuition is right) --, the sum of those six probabilities (hence the probability of obtaining a 3:4:5 ratio, regardless of which colors correspond to "3", "4" and "5"), 6*0.0811787...=0.487072..., is much greater than the probability of the single event "4:4:4", 0.126841....

 

The probabilities can be computed easily from the multivariate hypergeometric distribution:

 

%let K1=8;
%let K2=8;
%let K3=8;
%let n=12;

data prob(drop=i j);
length ratio $11;
do k1=0 to &K1;
  do k2=0 to &K2;
    do k3=0 to &K3;
      if sum(of k:)=&n then do;
        i=min(of k:);
        j=max(of k:);
        ratio=catx(':',i,&n-i-j,j);
        p=comb(&K1,k1)*comb(&K2,k2)*comb(&K3,k3)/comb(&K1+&K2+&K3,&n);
        output;
      end;
    end;
  end;
end;
run;

proc summary data=prob nway;
class ratio;
var p;
output out=want(drop=_:) sum=;
run;

proc print data=want;
sum p;
run;

Result:

Obs    ratio       p

  1    0:4:8    0.00016
  2    0:5:7    0.00099
  3    0:6:6    0.00087
  4    1:3:8    0.00099
  5    1:4:7    0.00994
  6    1:5:6    0.02783
  7    2:2:8    0.00087
  8    2:3:7    0.02783
  9    2:4:6    0.12177
 10    2:5:5    0.09741
 11    3:3:6    0.09741
 12    3:4:5    0.48707
 13    4:4:4    0.12684
                =======
                1.00000

 

Ksharp
Super User
Great! It is awesome !
yabwon
Onyx | Level 15

@FreelanceReinh 

Looks like we went the same path 😄

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



quickbluefish
Barite | Level 11
This is interesting. I guess it's because each time you pull a ball of color X, you're changing the probability you'll pull another of X. So the 2nd pull is slightly more likely to be Y or Z and the third pull is more likely to be Z? If there were infinite balls, then I guess you would in fact get 4:4:4. One thing, though - it looks like the magician would have lost money since the probability of 3:4:5 is slightly less then 50%.
Ksharp
Super User
Yeah.
As a matter of fact , magician have a list to claim:
3:4:5 get -10
4:4:4 get 0
2:4:6 get 10
...........
yabwon
Onyx | Level 15

Just to be clear: probability is not my super-power but:

 

Ain't that case of multivariate hypergeometric distribution?

https://en.wikipedia.org/wiki/Hypergeometric_distribution#Multivariate_hypergeometric_distribution

and the fact that:

P(3,4,5) is independent from P(3,5,4), etc,

P(3,4,5 or 3,5,4) = P(3,4,5) + P(3,5,4) - P(3,4,5 and 3,5,4) = P(3,4,5) + P(3,5,4)

so probability P("I have 3:4:5 sequence") would be just sum of P(3,4,5) + P(3,5,4) + ...+ P(5,4,3).

 

 


data have;
input grp $ r g b;
cards;
a 4 4 4
b 3 4 5
b 3 5 4
b 4 3 5
b 4 5 3
b 5 3 4
b 5 4 3
;
run;

/*
a = get all 4
b = get 3:4:5
*/

data want;
rn=8;
gn=8;
bn=8;
N=rn+gn+bn;

set have;
k=r+g+b;

p = /* https://en.wikipedia.org/wiki/Hypergeometric_distribution#Multivariate_hypergeometric_distribution */
(comb(rn, r)
*comb(gn, g)
*comb(bn, b)
)
/comb(n, k)
;

put (_ALL_) (=/);

if _N_=2 then sum=0;
sum+p;
run;
proc print;
run;

 

[EDIT:] proc print:

yabwon_0-1743501007026.png

 

 

 

Bart

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Rick_SAS
SAS Super FREQ

The key is that you are pulling WITHOUT REPLACEMENT, so the conditional probabilities change as you pull. The expected ratio is only 4:4:4 is you pull with replacement. You can do a simpler analysis of 6 balls (3R, 3B, 3Y) and pulling 3 balls to see that the ratios are not even. If I did the analysis correctly, in that situation there is a 70% chance that the ratio is 2:1:0 and only a 30% chance that the ratio is 1:1:1.

 

I think you can perform the simulation more concisely if you use the SAMPLE function to pull the samples and use the TABULATE subroutine to analyze the ratio:

proc iml;
call randseed(123);
/* simulate 1000000 times */
n=1000000; 
/* create binary variable equal to 1 if the ratio is 3:4:5 */
is_345 = j(n,1,0);

set = repeat('R',8) // repeat('G',8) // repeat('Y',8);
draws = sample(set, 12 //n, "NoReplace");  /* draw n samples of size 12 */

do i = 1 to n;
   call tabulate(labels, freq, draws[i,]);
   if ncol(freq)=3 then 
      is_345[i] = all(freq={3 4 5}) | all(freq={3 5 4}) | 
                  all(freq={4 3 5}) | all(freq={5 3 4}) | 
                  all(freq={4 5 3}) | all(freq={5 4 3}) ;
end;
/* estimate proportion of draws that result in 3:4:5 */
prop_345 = mean(is_345);
print prop_345;

In this simulation, the result is 0.4867 of the draws have ratio 3:4:5.  Since this is less than 50%, the magician must have been using "magic" (that is, cheating) to win money.

 

yabwon
Onyx | Level 15

If someone doesn't have IML, just for fun, 4GL version (there is a bunch of code commented out from my try-and-errors for optimisation, for those who like to test):  

 

/* arrays simulation */
options mprint;
data _null_;
  array bwb[24] _temporary_ (8*(1 2 3)); /* template box with balls */
  array b[24]   _temporary_;             /* box from which we are selecting */
  array r[3] ;                           /* couter for selected color 1, 2, 3 */
  call streaminit(42);                   /* set seed = 42 */

  array x[0:12,0:12,0:12] _temporary_; /* array for results, 0=color was never selected */

  do iter=0 to 1e6-1; /* iterations */

  /* populate b with balls */
  /*
    do n=1 to 24;
      b[n]=bwb[n];
    end;
  *//*
   call pokelong (
      put (PEEKCLONG (ADDRLONG(bwb[1]), 192), $192.)
    , ADDRLONG(b[1])
    , 192);
  *//**/
    %macro loop(n);
      %do n=1 %to &n.;
        b[&n.]=bwb[&n.];
      %end;
    %mend loop;
    %loop(24)
  /**/

    r1=0;r2=0;r3=0; /* initiate counters */
    /*
    do j=24 to 13 by -1;       
      i = RAND("integer",1,j); 
      r[b[i]]+1;               
      b[i]=b[j];               
    end;
    */
    %macro loop2(n,m);
    %do n=&n. %to %eval(&n.-&m.+1) %by -1;    /* select 12 from 24 with no repetitions */
      i = RAND("integer",1,&n.); /* get random number from 1-24, 1-23, 1-22, etc. */
      r[b[i]]+1;                 /* increase count of selected ball */
      b[i]=b[&n.];               /* fill gap in "i" (after selected ball) with last ball ("n") */
    %end;
    %mend loop2;
    %loop2(24,12)

    call sortn(of r[*]); /* order counters */

    x[r1,r2,r3]+1;       /* increase selected "set" */
  end;

  /* output generated data */
  do r1=0 to 12;
    do r2=0 to 12;
      do r3=0 to 12;
        if x[r1,r2,r3] then /* print only "materialized" cases */
          do;
            n=x[r1,r2,r3]/iter; 
            output;
            put @2 r1= @12 r2= @22 r3= @32 n= best32.20;
          end;
      end;
    end;
  end;
  keep r: n;
run;

Result for seed=42 and 1e9 iterations:

 

/* 1e9

 r1=0      r2=4      r3=8      n=0.000155304
 r1=0      r2=5      r3=7      n=0.000993489
 r1=0      r2=6      r3=6      n=0.000869773
 r1=1      r2=3      r3=8      n=0.000994072
 r1=1      r2=4      r3=7      n=0.009935598
 r1=1      r2=5      r3=6      n=0.027825111
 r1=2      r2=2      r3=8      n=0.000869628
 r1=2      r2=3      r3=7      n=0.027822129
 r1=2      r2=4      r3=6      n=0.121775441
 r1=2      r2=5      r3=5      n=0.097433405
 r1=3      r2=3      r3=6      n=0.097416671
 r1=3      r2=4      r3=5      n=0.487068081
 r1=4      r2=4      r3=4      n=0.126841298
NOTE: The data set WORK.FREQCOUNT has 13 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           11:56.18
      cpu time            11:56.14

*/

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



PaigeMiller
Diamond | Level 26

So looking at the computed probability for 3:4:5 which is 0.487, in the long run the magician should lose money looking for 3:4:5, as the rest of the possibilities add to greater than 0.5 and so are more likely.

--
Paige Miller
Ksharp
Super User
Yeah.
As a matter of fact , magician have a list to claim to make sure he can earn the money . Here I just simplify this story to have some fun.
3:4:5 get -10
4:4:4 get 0
2:4:6 get 10
...........
Ksharp
Super User
Rick,
Good. Using SAMPLE() funtion is more like reality than my RAND().
Actually, magician have a list to claim to make sure he can earn the money. I just want to simplify this story . and can't image 3:4:5 would have so big probability.
Rick_SAS
SAS Super FREQ

Right. You can see this effect even for two classes. and even for sampling with replacement. Flip a fair coin 4 times and look at the ratios of heads to tails.

  • The probability of 0:4 is 2 / 16 because HHHH and TTTT are possible.
  • The probability of 1:3 is very high at 8 / 16. It includes possibilities such a HTTT and THTT.
  • The probability of 2:2 is "only" 6 / 16. It includes possibilities such a HTTH and THHT.

So, even in that simple probability space, the unequal ratio is higher than you might initially think.

Ksharp
Super User

Here I would like to use PROC SURVEYSELECT to mimic SAMPLE() funtion in SAS/IML for sombody was not familiar with IML syntax.

 

data have;
do i=1 to 8;
 ball='Y';output;
 ball='B';output;
 ball='R';output;
end;
drop i;
run;

proc surveyselect data=have out=temp seed=123   sampsize=12 reps=1000000;
run;
proc freq data=temp noprint;
table Replicate*ball/out=temp2 sparse list;
run;
proc sort data=temp2;
by Replicate count;
run;
data want;
length flag $ 8;
do until(last.Replicate);
 set temp2;
 by Replicate;
 flag=catx(':',flag,count);
end;
keep Replicate flag;
run;
proc freq data=want order=freq;
table flag/plots=freqplot(orient=horizontal scale=percent);
run;

Ksharp_0-1743668675799.png

Ksharp_1-1743668690054.png

 

Astounding
PROC Star

Not the same thing but, it's my favorite "audience bet".  

Playig backgammon, and rolling two dice, rolling "doubles" is advantageious (1:1, 2:2, ... 6:6).

Chances of doubles on a single roll is 1 out of 6.  I claim that my opponent is exceedingly lucky and if he rolls the dice only 5 times, he will roll doubles at some point.

The math is much easier than the question with three colors of balls.

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 14 replies
  • 2864 views
  • 22 likes
  • 7 in conversation