Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- Programming
- /
- Re: Joint probabilities

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 10-12-2010 07:09 AM
(2172 views)

I was wondering if anyone knows a solution to the following problem.

I've got an example dataset which can be viewed in two ways, it really depends on how best to view it for solving my problem.

The first version is:

PatientID Condition Present

1 Cough 1

1 Headache 1

1 Vomiting 0

2 Cough 0

2 Headache 1

2 Vomiting 1

3 Cough 0

3 Headache 0

3 Vomiting 0

4 Cough 1

4 Headache 1

4 Vomiting 0

The second version:

Patient ID Cough Headache Vomiting

1 1 1 0

2 0 1 0

3 0 0 0

4 1 1 0

I would like to find the probability of a) having each of the conditions and b) the joint probability of having say cough and headache. I can find a) fairly simply but I'm having problems figuring out how best to find b) and which version of the example dataset I should use. If I have much larger version of my example datasets, I would like to be able to write a program which can move through each of the conditions to find the joint probability. Any suggestions would be most appreciated.

I've got an example dataset which can be viewed in two ways, it really depends on how best to view it for solving my problem.

The first version is:

PatientID Condition Present

1 Cough 1

1 Headache 1

1 Vomiting 0

2 Cough 0

2 Headache 1

2 Vomiting 1

3 Cough 0

3 Headache 0

3 Vomiting 0

4 Cough 1

4 Headache 1

4 Vomiting 0

The second version:

Patient ID Cough Headache Vomiting

1 1 1 0

2 0 1 0

3 0 0 0

4 1 1 0

I would like to find the probability of a) having each of the conditions and b) the joint probability of having say cough and headache. I can find a) fairly simply but I'm having problems figuring out how best to find b) and which version of the example dataset I should use. If I have much larger version of my example datasets, I would like to be able to write a program which can move through each of the conditions to find the joint probability. Any suggestions would be most appreciated.

4 REPLIES 4

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Just to make sure that I understand, do you want to estimate the joint probability from the given data? In the example you gave, for example, the estimated probability of having a cough and headache would be 2/4=0.5, right?

If I correctly interpreted your problem, I think that the easiest method would use version 2 of your dataset. Assuming that the variables cough and headache are both 0/1 indicator variables and there is exactly one observation for each patient, the following code should work:

[pre]

data test;

input id $ cough headache vomiting;

datalines;

1 1 1 0

2 0 1 0

3 0 0 0

4 1 1 0

;

run;

proc sql;

select sum(cough and headache)/count(*) as prob_cough_and_headache

from test;

quit;

[/pre]

If I correctly interpreted your problem, I think that the easiest method would use version 2 of your dataset. Assuming that the variables cough and headache are both 0/1 indicator variables and there is exactly one observation for each patient, the following code should work:

[pre]

data test;

input id $ cough headache vomiting;

datalines;

1 1 1 0

2 0 1 0

3 0 0 0

4 1 1 0

;

run;

proc sql;

select sum(cough and headache)/count(*) as prob_cough_and_headache

from test;

quit;

[/pre]

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

There might be a better, more efficient way of doing this, but here is my initial shot at a program which moves through all the combinations to find joint probabilities:

[pre]

data test;

input id $ cough headache vomiting;

datalines;

1 1 1 0

2 0 1 0

3 0 0 0

4 1 1 0

;

run;

proc sql noprint;

select distinct name into: conditions separated by '" "'

from sashelp.vcolumn

where libname = 'WORK' and memname = 'TEST' and upcase(name) ne 'ID';

select count(*) into: num_conditions

from sashelp.vcolumn

where libname = 'WORK' and memname = 'TEST' and upcase(name) ne 'ID';

quit;

data temp(drop=n ncomb j rc);

array condition[&num_conditions] $20 ("&conditions");

n=dim(condition);

do k=1 to dim(condition);

ncomb=comb(n,k);

do j=1 to ncomb+1;

rc=lexcomb(j, k, of condition);

if rc<0 then leave;

output;

end;

end;

run;

data _null_;

length combinations $ 200;

set temp end=last;

array condition{*} condition:;

do j=1 to dim(condition);

if j>k then condition{j} = ' ';

end;

combinations=catx(' and ', of condition:);

i+1;

call symput(cats("condition",i), combinations);

if last then call symput('nobs', i);

run;

data probabilities;

length conditions $ 200 prob 8;

if 0 then output;;

run;

%macro probabilities;

proc sql;

%do i=1 %to &nobs;

insert into probabilities(conditions, prob)

select "&&&condition&i", sum(&&&condition&i)/count(*)

from test;

%end;

quit;

%mend;

%probabilities

proc print data=probabilities;

run;

[/pre]

Message was edited by: polingjw

Modified array declaration in the data _null_ step. Previously used condition1-condition3.

[pre]

data test;

input id $ cough headache vomiting;

datalines;

1 1 1 0

2 0 1 0

3 0 0 0

4 1 1 0

;

run;

proc sql noprint;

select distinct name into: conditions separated by '" "'

from sashelp.vcolumn

where libname = 'WORK' and memname = 'TEST' and upcase(name) ne 'ID';

select count(*) into: num_conditions

from sashelp.vcolumn

where libname = 'WORK' and memname = 'TEST' and upcase(name) ne 'ID';

quit;

data temp(drop=n ncomb j rc);

array condition[&num_conditions] $20 ("&conditions");

n=dim(condition);

do k=1 to dim(condition);

ncomb=comb(n,k);

do j=1 to ncomb+1;

rc=lexcomb(j, k, of condition

if rc<0 then leave;

output;

end;

end;

run;

data _null_;

length combinations $ 200;

set temp end=last;

array condition{*} condition:;

do j=1 to dim(condition);

if j>k then condition{j} = ' ';

end;

combinations=catx(' and ', of condition:);

i+1;

call symput(cats("condition",i), combinations);

if last then call symput('nobs', i);

run;

data probabilities;

length conditions $ 200 prob 8;

if 0 then output;;

run;

%macro probabilities;

proc sql;

%do i=1 %to &nobs;

insert into probabilities(conditions, prob)

select "&&&condition&i", sum(&&&condition&i)/count(*)

from test;

%end;

quit;

%mend;

%probabilities

proc print data=probabilities;

run;

[/pre]

Message was edited by: polingjw

Modified array declaration in the data _null_ step. Previously used condition1-condition3.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

To polingjw, in response to your first reply, you are correct in thinking that the joint probability for cought and headache is 1/2. Many thanks for both your suggestions, I'm going to try both. The latter might be more appropriate as I want to be able to find joint probabilities for much larger datasets and need an efficient way doing this.
This worked very well. However, I have some further questions. Ideally I would like to separate the joint probabilites from the probabilities for each condition as I would like to then calculate the conditional probability. I have been trying in vain to adapt the code which polingjw wrote to do this but to no avail. Any advice would be much appreciated.

Message was edited by: den

Message was edited by: den

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I don’t think that I completely understand what you’re trying to do. Specifically, what conditional probabilities are you trying to estimate? In general, the conditional events could be placed into the proc sql statement in a where clause. Here is an example of a macro that might be used to estimate some conditional probabilities:

[pre]

data test;

input id $ cough headache vomiting fever;

datalines;

1 1 0 1 1

2 0 1 0 0

3 0 0 0 1

4 1 1 0 1

;

run;

%macro findprob(event, condition);

proc sql noprint;

select avg(&event) into:prob

from test

%if %bquote(&condition) ne %then where &condition;

;

quit;

%if %bquote(&condition) ne %then %let condition = %str( )given &condition;

%let prob=&prob;

%put The probability of &event&condition is &prob..;

%mend;

options nonotes;

%findprob(cough)

%findprob(cough, vomiting)

%findprob(cough and headache)

%findprob(cough, headache and fever)

%findprob(cough and fever, not vomiting)

%findprob(fever, cough or headache)

[/pre]

The following appears on the log when the program is run:

[pre]

396 %findprob(cough)

The probability of cough is 0.5.

397 %findprob(cough, vomiting)

The probability of cough given vomiting is 1.

398 %findprob(cough and headache)

The probability of cough and headache is 0.25.

399 %findprob(cough, headache and fever)

The probability of cough given headache and fever is 1.

400 %findprob(cough and fever, not vomiting)

The probability of cough and fever given not vomiting is 0.333333.

401 %findprob(fever, cough or headache)

The probability of fever given cough or headache is 0.666667.

[/pre]

[pre]

data test;

input id $ cough headache vomiting fever;

datalines;

1 1 0 1 1

2 0 1 0 0

3 0 0 0 1

4 1 1 0 1

;

run;

%macro findprob(event, condition);

proc sql noprint;

select avg(&event) into:prob

from test

%if %bquote(&condition) ne %then where &condition;

;

quit;

%if %bquote(&condition) ne %then %let condition = %str( )given &condition;

%let prob=&prob;

%put The probability of &event&condition is &prob..;

%mend;

options nonotes;

%findprob(cough)

%findprob(cough, vomiting)

%findprob(cough and headache)

%findprob(cough, headache and fever)

%findprob(cough and fever, not vomiting)

%findprob(fever, cough or headache)

[/pre]

The following appears on the log when the program is run:

[pre]

396 %findprob(cough)

The probability of cough is 0.5.

397 %findprob(cough, vomiting)

The probability of cough given vomiting is 1.

398 %findprob(cough and headache)

The probability of cough and headache is 0.25.

399 %findprob(cough, headache and fever)

The probability of cough given headache and fever is 1.

400 %findprob(cough and fever, not vomiting)

The probability of cough and fever given not vomiting is 0.333333.

401 %findprob(fever, cough or headache)

The probability of fever given cough or headache is 0.666667.

[/pre]

**Available on demand!**

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.