DATA Step, Macro, Functions and more

Simulating conditional probabilities in SAS

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 14
Accepted Solution

Simulating conditional probabilities in SAS

Hi,

I would like to simulate two events A and B knowing their conditional probabilities as follow

P(A) =0.2

P(B)=0.3

P(A and B) =0.1

I remember that P(A or B) = P(A) + P(B) - P(A and B)

 

How do I incorporate the probability of the joint event (A and B) in the simulation?

 

Thank you


Accepted Solutions
Solution
‎06-28-2017 09:02 PM
SAS Super FREQ
Posts: 3,488

Re: Simulating conditional probabilities in SAS

Sorry @Astounding, your code is fine. I missed it because I was studying @mkeintz's simulation from the joint probability.

 

My preference would be to modify Astounding's code slightly to use RAND, but the logic is the same:

data Cond;
do i = 1 to 1000;
   A = rand("Bernoulli", 0.2);
   if A then   B = rand("Bernoulli", 0.5);
   else        B = rand("Bernoulli", 0.25);
   output;
end;

proc freq data=Cond; /* check empirical distribution */
table A B A*B;
run;

View solution in original post


All Replies
Super User
Posts: 5,099

Re: Simulating conditional probabilities in SAS

You don't.  You just assign A and B randomly, and the distribution should work itself out.  For example:

 

data want;

set have;

A_flag = (ranuni(12345) < 0.2);

B_flag = (ranuni(67893) < 0.3);

run;

 

Since the flags are assigned randomly, their joint distribution should match your expectations.  And you can change the initial seeds if you need additional simulations.  To check the distributions:

 

proc freq data=want;

tables a_flag * b_flag / list;

run;

Super User
Posts: 9,691

Re: Simulating conditional probabilities in SAS

Your question is ambiguous.

Give an example  to explain your data simulation question .

Valued Guide
Posts: 797

Re: Simulating conditional probabilities in SAS

Your formula is not exactly relevant to the task your describe.  From the infor you provided, you want to determine the probabiliy of (1) A only, (2) B only, (3) neither.

 

Given P(A)=.2, P(B)=.3, and P(A&B)=.1, you know how to fill out a 2*2 crosstab (A=0 or 1 by B=0 or 1).  Let's say you have 100 observations then your table has the following starting points, given the probabilities you cite:

 

 

       |   A=0      A=1   |  total

------------------------------------

B=0    |                  |

B=1    |             10   |    30

-----------------------------------

total  |             20   |   100

 

 

So you can conclude that

   P(A=1,B=0) = P(A) - P(A&B) = .2 - .1 = .1     (N=10)
   P(B=0,B=1) = P(B) - P(A&B) = .3 - .1 = .2     (N=20)

   P(A=0,B=0)  = 1 - P(A&B) - P(A=1,B=0) - P(A=0,B=1) = 1 - .1 - .1 - .2 = .6

 

generating this table with the italicized values calculated per above

 

       |   A=0      A=1   |  total

------------------------------------

B=0    |   60        10   |    70

B=1    |   20        10   |    30

-----------------------------------

total  |   80        20   |   100

 

 

You can now map the range of 0-1 to the above 4 cell, as in

   [0.0-0.6]  ==> A=0,B=0       
   (0.6-0.7]  ==> A=1,B=0

   (0.7-0.9]  ==> A=0,B=1

   (0.9-1.0] ==>  A=1,B=1

 

This program uses that non-independent table to generate 100 observations using the association you describe:

 

data t;
  retain P_AandB .1  P_A .2  P_B .3
         P_Aonly P_Bonly P_neither . ;

  if _N_=1 then do;
    P_Aonly =P_A - P_AandB;
    P_Bonly =P_B - P_AandB;
    P_neither  = 1 - P_AandB - P_Aonly - P_Bonly;	
	put (p_:) (=);
  end;

  samp_size=100;

  do N=1 to samp_size;
    A=0; B=0;
    u=ranuni(10598156);
	if      U <= P_neither then;
	else if U >P_neither + P_Aonly + P_Bonly then do; A=1; B=1; end;
	else if U <= P_neither + P_Aonly then A=1;
	else B=1;
	output;
  end;
run;

proc freq; table A*B;run;

 

 

Super User
Posts: 5,099

Re: Simulating conditional probabilities in SAS

I'm going to amend my earlier answer.  Although much of the problem is still not clear (do you even have a data set to work with?), the basic principles and math apply in any case.

 

If pA=0.2 and pB=0.3, their joint A&B distribution would be 0.06 if selected randomly.  Therefore, to raise the hit rate to 0.1 from 0.06, pB needs to be higher when A=1.  Since pB is fixed at 0.3, however, pB needs to be lower when A=0.  The math is actually pretty easy ... you can see the probabilities that I used here:

 

data want;

set have;

do replication = 1 to 10;

   A_flag = (ranuni(12345) < 0.2);

   if A_flag=1 then B_flag = (ranuni(67893) < 0.5);

   else B_flag = (ranuni(97531) < 0.25);

   output;

end;

run;

 

If you have an original data set (and it's not clear that you do), this gives you 10 versions of the data set with different A and B values, where each version conforms to the conditions you specified.  "Conforms" means on average ... each replication may be slightly off.

PROC Star
Posts: 63

Re: Simulating conditional probabilities in SAS

Astounding,

A comment to your second answer, using 3 different calls of RANUNI with 3 different seeds:

It is only possible to set the seed for the RANUNI (and other functions in the same family) once in each datastep. The seed is set in the first call to RANUNI, and the compiler disregards the parameters for the later calls. You can try for yourself:

data test;
  do _N_=1 to 10;
  a=ranuni(12345);
  b=ranuni(45678);
  output;
  end;
run;
data test2;
  set test;
  a2=ranuni(12345);
  b2=ranuni(4);
run;

With this program you will see that not only are all the A2 values equal to the A values, the B2 values are allso all equal to the B values. On the other hand, if you switch the two lines in the final datastep:

data test2;
  set test;
  b2=ranuni(4);
  a2=ranuni(12345);
run;

You will see that not only are all the B2 values different from the B values, the A2 values differ from the A values as well.

 

Regards,

Søren

 

 

SAS Super FREQ
Posts: 3,488

Re: Simulating conditional probabilities in SAS

For more information about Soren's response. see "Random number seeds: Only the first seed matters."

Super User
Posts: 5,099

Re: Simulating conditional probabilities in SAS

Thanks. You learn something new every day (at least on a good day). 

 

Regarding my solution, it should work regardless ... as long as the random number generated changes each time RANUNI is called.  

 

Regarding the original problem, it's still not clear whether it is a programming problem or an algebra problem (why did I use 0.5 and 0.25, for example).  We'll have to wait to find out what is really needed here.

SAS Super FREQ
Posts: 3,488

Re: Simulating conditional probabilities in SAS

 

The title of this post uses the term "conditional probability," but your question (which needs to be clarified) does not use the conditional probability. If you are interested in conditional probabilities, use 

P(B|A) = P(A and B) / P(A) = 0.1 / 0.2 = 0.5

 

But if the simulation process is "draw A, then draw B conditional on A," you need knowledge of P(B| ^A).

 

You can review the rules of probability at

http://stattrek.com/probability/probability-rules.aspx?Tutorial=AP

and the rules of conditional probability at

https://en.wikipedia.org/wiki/Conditional_probability

 

Super User
Posts: 5,099

Re: Simulating conditional probabilities in SAS

Rick, while I agree, I thought it was reasonable to back into the p(B|^A) by looking at the condition that the overall p(B) was 0.3.  That's how I came up with the 0.25.  If I'm proven wrong, it won't be the first time.

Solution
‎06-28-2017 09:02 PM
SAS Super FREQ
Posts: 3,488

Re: Simulating conditional probabilities in SAS

Sorry @Astounding, your code is fine. I missed it because I was studying @mkeintz's simulation from the joint probability.

 

My preference would be to modify Astounding's code slightly to use RAND, but the logic is the same:

data Cond;
do i = 1 to 1000;
   A = rand("Bernoulli", 0.2);
   if A then   B = rand("Bernoulli", 0.5);
   else        B = rand("Bernoulli", 0.25);
   output;
end;

proc freq data=Cond; /* check empirical distribution */
table A B A*B;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 271 views
  • 11 likes
  • 6 in conversation