Hi,
I would like to simulate two events A and B knowing their conditional probabilities as follow
P(A) =0.2
P(B)=0.3
P(A and B) =0.1
I remember that P(A or B) = P(A) + P(B) - P(A and B)
How do I incorporate the probability of the joint event (A and B) in the simulation?
Thank you
Sorry @Astounding, your code is fine. I missed it because I was studying @mkeintz's simulation from the joint probability.
My preference would be to modify Astounding's code slightly to use RAND, but the logic is the same:
data Cond;
do i = 1 to 1000;
A = rand("Bernoulli", 0.2);
if A then B = rand("Bernoulli", 0.5);
else B = rand("Bernoulli", 0.25);
output;
end;
proc freq data=Cond; /* check empirical distribution */
table A B A*B;
run;
You don't. You just assign A and B randomly, and the distribution should work itself out. For example:
data want;
set have;
A_flag = (ranuni(12345) < 0.2);
B_flag = (ranuni(67893) < 0.3);
run;
Since the flags are assigned randomly, their joint distribution should match your expectations. And you can change the initial seeds if you need additional simulations. To check the distributions:
proc freq data=want;
tables a_flag * b_flag / list;
run;
Your question is ambiguous.
Give an example to explain your data simulation question .
Your formula is not exactly relevant to the task your describe. From the infor you provided, you want to determine the probabiliy of (1) A only, (2) B only, (3) neither.
Given P(A)=.2, P(B)=.3, and P(A&B)=.1, you know how to fill out a 2*2 crosstab (A=0 or 1 by B=0 or 1). Let's say you have 100 observations then your table has the following starting points, given the probabilities you cite:
| A=0 A=1 | total
------------------------------------
B=0 | |
B=1 | 10 | 30
-----------------------------------
total | 20 | 100
So you can conclude that
P(A=1,B=0) = P(A) - P(A&B) = .2 - .1 = .1 (N=10)
P(B=0,B=1) = P(B) - P(A&B) = .3 - .1 = .2 (N=20)
P(A=0,B=0) = 1 - P(A&B) - P(A=1,B=0) - P(A=0,B=1) = 1 - .1 - .1 - .2 = .6
generating this table with the italicized values calculated per above
| A=0 A=1 | total
------------------------------------
B=0 | 60 10 | 70
B=1 | 20 10 | 30
-----------------------------------
total | 80 20 | 100
You can now map the range of 0-1 to the above 4 cell, as in
[0.0-0.6] ==> A=0,B=0
(0.6-0.7] ==> A=1,B=0
(0.7-0.9] ==> A=0,B=1
(0.9-1.0] ==> A=1,B=1
This program uses that non-independent table to generate 100 observations using the association you describe:
data t;
retain P_AandB .1 P_A .2 P_B .3
P_Aonly P_Bonly P_neither . ;
if _N_=1 then do;
P_Aonly =P_A - P_AandB;
P_Bonly =P_B - P_AandB;
P_neither = 1 - P_AandB - P_Aonly - P_Bonly;
put (p_:) (=);
end;
samp_size=100;
do N=1 to samp_size;
A=0; B=0;
u=ranuni(10598156);
if U <= P_neither then;
else if U >P_neither + P_Aonly + P_Bonly then do; A=1; B=1; end;
else if U <= P_neither + P_Aonly then A=1;
else B=1;
output;
end;
run;
proc freq; table A*B;run;
I'm going to amend my earlier answer. Although much of the problem is still not clear (do you even have a data set to work with?), the basic principles and math apply in any case.
If pA=0.2 and pB=0.3, their joint A&B distribution would be 0.06 if selected randomly. Therefore, to raise the hit rate to 0.1 from 0.06, pB needs to be higher when A=1. Since pB is fixed at 0.3, however, pB needs to be lower when A=0. The math is actually pretty easy ... you can see the probabilities that I used here:
data want;
set have;
do replication = 1 to 10;
A_flag = (ranuni(12345) < 0.2);
if A_flag=1 then B_flag = (ranuni(67893) < 0.5);
else B_flag = (ranuni(97531) < 0.25);
output;
end;
run;
If you have an original data set (and it's not clear that you do), this gives you 10 versions of the data set with different A and B values, where each version conforms to the conditions you specified. "Conforms" means on average ... each replication may be slightly off.
Astounding,
A comment to your second answer, using 3 different calls of RANUNI with 3 different seeds:
It is only possible to set the seed for the RANUNI (and other functions in the same family) once in each datastep. The seed is set in the first call to RANUNI, and the compiler disregards the parameters for the later calls. You can try for yourself:
data test; do _N_=1 to 10; a=ranuni(12345); b=ranuni(45678); output; end; run; data test2; set test; a2=ranuni(12345); b2=ranuni(4); run;
With this program you will see that not only are all the A2 values equal to the A values, the B2 values are allso all equal to the B values. On the other hand, if you switch the two lines in the final datastep:
data test2; set test; b2=ranuni(4); a2=ranuni(12345); run;
You will see that not only are all the B2 values different from the B values, the A2 values differ from the A values as well.
Regards,
Søren
For more information about Soren's response. see "Random number seeds: Only the first seed matters."
Thanks. You learn something new every day (at least on a good day).
Regarding my solution, it should work regardless ... as long as the random number generated changes each time RANUNI is called.
Regarding the original problem, it's still not clear whether it is a programming problem or an algebra problem (why did I use 0.5 and 0.25, for example). We'll have to wait to find out what is really needed here.
The title of this post uses the term "conditional probability," but your question (which needs to be clarified) does not use the conditional probability. If you are interested in conditional probabilities, use
P(B|A) = P(A and B) / P(A) = 0.1 / 0.2 = 0.5
But if the simulation process is "draw A, then draw B conditional on A," you need knowledge of P(B| ^A).
You can review the rules of probability at
http://stattrek.com/probability/probability-rules.aspx?Tutorial=AP
and the rules of conditional probability at
https://en.wikipedia.org/wiki/Conditional_probability
Rick, while I agree, I thought it was reasonable to back into the p(B|^A) by looking at the condition that the overall p(B) was 0.3. That's how I came up with the 0.25. If I'm proven wrong, it won't be the first time.
Sorry @Astounding, your code is fine. I missed it because I was studying @mkeintz's simulation from the joint probability.
My preference would be to modify Astounding's code slightly to use RAND, but the logic is the same:
data Cond;
do i = 1 to 1000;
A = rand("Bernoulli", 0.2);
if A then B = rand("Bernoulli", 0.5);
else B = rand("Bernoulli", 0.25);
output;
end;
proc freq data=Cond; /* check empirical distribution */
table A B A*B;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.