BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SimRock
Obsidian | Level 7

Hi,

I would like to simulate two events A and B knowing their conditional probabilities as follow

P(A) =0.2

P(B)=0.3

P(A and B) =0.1

I remember that P(A or B) = P(A) + P(B) - P(A and B)

 

How do I incorporate the probability of the joint event (A and B) in the simulation?

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

Sorry @Astounding, your code is fine. I missed it because I was studying @mkeintz's simulation from the joint probability.

 

My preference would be to modify Astounding's code slightly to use RAND, but the logic is the same:

data Cond;
do i = 1 to 1000;
   A = rand("Bernoulli", 0.2);
   if A then   B = rand("Bernoulli", 0.5);
   else        B = rand("Bernoulli", 0.25);
   output;
end;

proc freq data=Cond; /* check empirical distribution */
table A B A*B;
run;

View solution in original post

10 REPLIES 10
Astounding
PROC Star

You don't.  You just assign A and B randomly, and the distribution should work itself out.  For example:

 

data want;

set have;

A_flag = (ranuni(12345) < 0.2);

B_flag = (ranuni(67893) < 0.3);

run;

 

Since the flags are assigned randomly, their joint distribution should match your expectations.  And you can change the initial seeds if you need additional simulations.  To check the distributions:

 

proc freq data=want;

tables a_flag * b_flag / list;

run;

Ksharp
Super User

Your question is ambiguous.

Give an example  to explain your data simulation question .

mkeintz
PROC Star

Your formula is not exactly relevant to the task your describe.  From the infor you provided, you want to determine the probabiliy of (1) A only, (2) B only, (3) neither.

 

Given P(A)=.2, P(B)=.3, and P(A&B)=.1, you know how to fill out a 2*2 crosstab (A=0 or 1 by B=0 or 1).  Let's say you have 100 observations then your table has the following starting points, given the probabilities you cite:

 

 

       |   A=0      A=1   |  total

------------------------------------

B=0    |                  |

B=1    |             10   |    30

-----------------------------------

total  |             20   |   100

 

 

So you can conclude that

   P(A=1,B=0) = P(A) - P(A&B) = .2 - .1 = .1     (N=10)
   P(B=0,B=1) = P(B) - P(A&B) = .3 - .1 = .2     (N=20)

   P(A=0,B=0)  = 1 - P(A&B) - P(A=1,B=0) - P(A=0,B=1) = 1 - .1 - .1 - .2 = .6

 

generating this table with the italicized values calculated per above

 

       |   A=0      A=1   |  total

------------------------------------

B=0    |   60        10   |    70

B=1    |   20        10   |    30

-----------------------------------

total  |   80        20   |   100

 

 

You can now map the range of 0-1 to the above 4 cell, as in

   [0.0-0.6]  ==> A=0,B=0       
   (0.6-0.7]  ==> A=1,B=0

   (0.7-0.9]  ==> A=0,B=1

   (0.9-1.0] ==>  A=1,B=1

 

This program uses that non-independent table to generate 100 observations using the association you describe:

 

data t;
  retain P_AandB .1  P_A .2  P_B .3
         P_Aonly P_Bonly P_neither . ;

  if _N_=1 then do;
    P_Aonly =P_A - P_AandB;
    P_Bonly =P_B - P_AandB;
    P_neither  = 1 - P_AandB - P_Aonly - P_Bonly;	
	put (p_:) (=);
  end;

  samp_size=100;

  do N=1 to samp_size;
    A=0; B=0;
    u=ranuni(10598156);
	if      U <= P_neither then;
	else if U >P_neither + P_Aonly + P_Bonly then do; A=1; B=1; end;
	else if U <= P_neither + P_Aonly then A=1;
	else B=1;
	output;
  end;
run;

proc freq; table A*B;run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Astounding
PROC Star

I'm going to amend my earlier answer.  Although much of the problem is still not clear (do you even have a data set to work with?), the basic principles and math apply in any case.

 

If pA=0.2 and pB=0.3, their joint A&B distribution would be 0.06 if selected randomly.  Therefore, to raise the hit rate to 0.1 from 0.06, pB needs to be higher when A=1.  Since pB is fixed at 0.3, however, pB needs to be lower when A=0.  The math is actually pretty easy ... you can see the probabilities that I used here:

 

data want;

set have;

do replication = 1 to 10;

   A_flag = (ranuni(12345) < 0.2);

   if A_flag=1 then B_flag = (ranuni(67893) < 0.5);

   else B_flag = (ranuni(97531) < 0.25);

   output;

end;

run;

 

If you have an original data set (and it's not clear that you do), this gives you 10 versions of the data set with different A and B values, where each version conforms to the conditions you specified.  "Conforms" means on average ... each replication may be slightly off.

s_lassen
Meteorite | Level 14

Astounding,

A comment to your second answer, using 3 different calls of RANUNI with 3 different seeds:

It is only possible to set the seed for the RANUNI (and other functions in the same family) once in each datastep. The seed is set in the first call to RANUNI, and the compiler disregards the parameters for the later calls. You can try for yourself:

data test;
  do _N_=1 to 10;
  a=ranuni(12345);
  b=ranuni(45678);
  output;
  end;
run;
data test2;
  set test;
  a2=ranuni(12345);
  b2=ranuni(4);
run;

With this program you will see that not only are all the A2 values equal to the A values, the B2 values are allso all equal to the B values. On the other hand, if you switch the two lines in the final datastep:

data test2;
  set test;
  b2=ranuni(4);
  a2=ranuni(12345);
run;

You will see that not only are all the B2 values different from the B values, the A2 values differ from the A values as well.

 

Regards,

Søren

 

 

Rick_SAS
SAS Super FREQ

For more information about Soren's response. see "Random number seeds: Only the first seed matters."

Astounding
PROC Star

Thanks. You learn something new every day (at least on a good day). 

 

Regarding my solution, it should work regardless ... as long as the random number generated changes each time RANUNI is called.  

 

Regarding the original problem, it's still not clear whether it is a programming problem or an algebra problem (why did I use 0.5 and 0.25, for example).  We'll have to wait to find out what is really needed here.

Rick_SAS
SAS Super FREQ

 

The title of this post uses the term "conditional probability," but your question (which needs to be clarified) does not use the conditional probability. If you are interested in conditional probabilities, use 

P(B|A) = P(A and B) / P(A) = 0.1 / 0.2 = 0.5

 

But if the simulation process is "draw A, then draw B conditional on A," you need knowledge of P(B| ^A).

 

You can review the rules of probability at

http://stattrek.com/probability/probability-rules.aspx?Tutorial=AP

and the rules of conditional probability at

https://en.wikipedia.org/wiki/Conditional_probability

 

Astounding
PROC Star

Rick, while I agree, I thought it was reasonable to back into the p(B|^A) by looking at the condition that the overall p(B) was 0.3.  That's how I came up with the 0.25.  If I'm proven wrong, it won't be the first time.

Rick_SAS
SAS Super FREQ

Sorry @Astounding, your code is fine. I missed it because I was studying @mkeintz's simulation from the joint probability.

 

My preference would be to modify Astounding's code slightly to use RAND, but the logic is the same:

data Cond;
do i = 1 to 1000;
   A = rand("Bernoulli", 0.2);
   if A then   B = rand("Bernoulli", 0.5);
   else        B = rand("Bernoulli", 0.25);
   output;
end;

proc freq data=Cond; /* check empirical distribution */
table A B A*B;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 3518 views
  • 11 likes
  • 6 in conversation