## Joint probabilities

Occasional Contributor
Posts: 16

# Joint probabilities

I was wondering if anyone knows a solution to the following problem.

I've got an example dataset which can be viewed in two ways, it really depends on how best to view it for solving my problem.

The first version is:

PatientID Condition Present
1 Cough 1
1 Vomiting 0
2 Cough 0
2 Vomiting 1
3 Cough 0
3 Vomiting 0
4 Cough 1
4 Vomiting 0

The second version:

1 1 1 0
2 0 1 0
3 0 0 0
4 1 1 0

I would like to find the probability of a) having each of the conditions and b) the joint probability of having say cough and headache. I can find a) fairly simply but I'm having problems figuring out how best to find b) and which version of the example dataset I should use. If I have much larger version of my example datasets, I would like to be able to write a program which can move through each of the conditions to find the joint probability. Any suggestions would be most appreciated.
Regular Contributor
Posts: 171

## Re: Joint probabilities

Just to make sure that I understand, do you want to estimate the joint probability from the given data? In the example you gave, for example, the estimated probability of having a cough and headache would be 2/4=0.5, right?
If I correctly interpreted your problem, I think that the easiest method would use version 2 of your dataset. Assuming that the variables cough and headache are both 0/1 indicator variables and there is exactly one observation for each patient, the following code should work:
[pre]
data test;
input id \$ cough headache vomiting;
datalines;
1 1 1 0
2 0 1 0
3 0 0 0
4 1 1 0
;
run;

proc sql;
from test;
quit;
[/pre]
Regular Contributor
Posts: 171

## Re: Joint probabilities

There might be a better, more efficient way of doing this, but here is my initial shot at a program which moves through all the combinations to find joint probabilities:

[pre]
data test;
input id \$ cough headache vomiting;
datalines;
1 1 1 0
2 0 1 0
3 0 0 0
4 1 1 0
;
run;

proc sql noprint;
select distinct name into: conditions separated by '" "'
from sashelp.vcolumn
where libname = 'WORK' and memname = 'TEST' and upcase(name) ne 'ID';

select count(*) into: num_conditions
from sashelp.vcolumn
where libname = 'WORK' and memname = 'TEST' and upcase(name) ne 'ID';
quit;

data temp(drop=n ncomb j rc);
array condition[&num_conditions] \$20 ("&conditions");
n=dim(condition);
do k=1 to dim(condition);
ncomb=comb(n,k);
do j=1 to ncomb+1;
rc=lexcomb(j, k, of condition
• );
if rc<0 then leave;
output;
end;
end;
run;

data _null_;
length combinations \$ 200;
set temp end=last;
array condition{*} condition:;
do j=1 to dim(condition);
if j>k then condition{j} = ' ';
end;
combinations=catx(' and ', of condition;
i+1;
call symput(cats("condition",i), combinations);
if last then call symput('nobs', i);
run;

data probabilities;
length conditions \$ 200 prob 8;
if 0 then output;;
run;

%macro probabilities;
proc sql;
%do i=1 %to &nobs;
insert into probabilities(conditions, prob)
select "&&&condition&i", sum(&&&condition&i)/count(*)
from test;

%end;
quit;
%mend;
%probabilities

proc print data=probabilities;
run;
[/pre]

Message was edited by: polingjw

Modified array declaration in the data _null_ step. Previously used condition1-condition3.
• Occasional Contributor
Posts: 16

## Re: Joint probabilities

To polingjw, in response to your first reply, you are correct in thinking that the joint probability for cought and headache is 1/2. Many thanks for both your suggestions, I'm going to try both. The latter might be more appropriate as I want to be able to find joint probabilities for much larger datasets and need an efficient way doing this. This worked very well. However, I have some further questions. Ideally I would like to separate the joint probabilites from the probabilities for each condition as I would like to then calculate the conditional probability. I have been trying in vain to adapt the code which polingjw wrote to do this but to no avail. Any advice would be much appreciated.

Message was edited by: den
Regular Contributor
Posts: 171

## Re: Joint probabilities

I don’t think that I completely understand what you’re trying to do. Specifically, what conditional probabilities are you trying to estimate? In general, the conditional events could be placed into the proc sql statement in a where clause. Here is an example of a macro that might be used to estimate some conditional probabilities:

[pre]
data test;
input id \$ cough headache vomiting fever;
datalines;
1 1 0 1 1
2 0 1 0 0
3 0 0 0 1
4 1 1 0 1
;
run;

%macro findprob(event, condition);
proc sql noprint;
select avg(&event) intorob
from test
%if %bquote(&condition) ne %then where &condition;
;
quit;

%if %bquote(&condition) ne %then %let condition = %str( )given &condition;
%let prob=&prob;
%put The probability of &event&condition is &prob..;
%mend;

options nonotes;

%findprob(cough)
%findprob(cough, vomiting)
%findprob(cough and fever, not vomiting)
[/pre]

The following appears on the log when the program is run:

[pre]
396 %findprob(cough)
The probability of cough is 0.5.
397 %findprob(cough, vomiting)
The probability of cough given vomiting is 1.