BookmarkSubscribeRSS Feed
den
Calcite | Level 5 den
Calcite | Level 5
I was wondering if anyone knows a solution to the following problem.

I've got an example dataset which can be viewed in two ways, it really depends on how best to view it for solving my problem.

The first version is:

PatientID Condition Present
1 Cough 1
1 Headache 1
1 Vomiting 0
2 Cough 0
2 Headache 1
2 Vomiting 1
3 Cough 0
3 Headache 0
3 Vomiting 0
4 Cough 1
4 Headache 1
4 Vomiting 0

The second version:

Patient ID Cough Headache Vomiting
1 1 1 0
2 0 1 0
3 0 0 0
4 1 1 0

I would like to find the probability of a) having each of the conditions and b) the joint probability of having say cough and headache. I can find a) fairly simply but I'm having problems figuring out how best to find b) and which version of the example dataset I should use. If I have much larger version of my example datasets, I would like to be able to write a program which can move through each of the conditions to find the joint probability. Any suggestions would be most appreciated.
4 REPLIES 4
polingjw
Quartz | Level 8
Just to make sure that I understand, do you want to estimate the joint probability from the given data? In the example you gave, for example, the estimated probability of having a cough and headache would be 2/4=0.5, right?
If I correctly interpreted your problem, I think that the easiest method would use version 2 of your dataset. Assuming that the variables cough and headache are both 0/1 indicator variables and there is exactly one observation for each patient, the following code should work:
[pre]
data test;
input id $ cough headache vomiting;
datalines;
1 1 1 0
2 0 1 0
3 0 0 0
4 1 1 0
;
run;

proc sql;
select sum(cough and headache)/count(*) as prob_cough_and_headache
from test;
quit;
[/pre]
polingjw
Quartz | Level 8
There might be a better, more efficient way of doing this, but here is my initial shot at a program which moves through all the combinations to find joint probabilities:

[pre]
data test;
input id $ cough headache vomiting;
datalines;
1 1 1 0
2 0 1 0
3 0 0 0
4 1 1 0
;
run;

proc sql noprint;
select distinct name into: conditions separated by '" "'
from sashelp.vcolumn
where libname = 'WORK' and memname = 'TEST' and upcase(name) ne 'ID';

select count(*) into: num_conditions
from sashelp.vcolumn
where libname = 'WORK' and memname = 'TEST' and upcase(name) ne 'ID';
quit;

data temp(drop=n ncomb j rc);
array condition[&num_conditions] $20 ("&conditions");
n=dim(condition);
do k=1 to dim(condition);
ncomb=comb(n,k);
do j=1 to ncomb+1;
rc=lexcomb(j, k, of condition
  • );
    if rc<0 then leave;
    output;
    end;
    end;
    run;

    data _null_;
    length combinations $ 200;
    set temp end=last;
    array condition{*} condition:;
    do j=1 to dim(condition);
    if j>k then condition{j} = ' ';
    end;
    combinations=catx(' and ', of condition:);
    i+1;
    call symput(cats("condition",i), combinations);
    if last then call symput('nobs', i);
    run;

    data probabilities;
    length conditions $ 200 prob 8;
    if 0 then output;;
    run;

    %macro probabilities;
    proc sql;
    %do i=1 %to &nobs;
    insert into probabilities(conditions, prob)
    select "&&&condition&i", sum(&&&condition&i)/count(*)
    from test;

    %end;
    quit;
    %mend;
    %probabilities

    proc print data=probabilities;
    run;
    [/pre]

    Message was edited by: polingjw

    Modified array declaration in the data _null_ step. Previously used condition1-condition3.
  • den
    Calcite | Level 5 den
    Calcite | Level 5
    To polingjw, in response to your first reply, you are correct in thinking that the joint probability for cought and headache is 1/2. Many thanks for both your suggestions, I'm going to try both. The latter might be more appropriate as I want to be able to find joint probabilities for much larger datasets and need an efficient way doing this. This worked very well. However, I have some further questions. Ideally I would like to separate the joint probabilites from the probabilities for each condition as I would like to then calculate the conditional probability. I have been trying in vain to adapt the code which polingjw wrote to do this but to no avail. Any advice would be much appreciated.


    Message was edited by: den
    polingjw
    Quartz | Level 8
    I don’t think that I completely understand what you’re trying to do. Specifically, what conditional probabilities are you trying to estimate? In general, the conditional events could be placed into the proc sql statement in a where clause. Here is an example of a macro that might be used to estimate some conditional probabilities:

    [pre]
    data test;
    input id $ cough headache vomiting fever;
    datalines;
    1 1 0 1 1
    2 0 1 0 0
    3 0 0 0 1
    4 1 1 0 1
    ;
    run;

    %macro findprob(event, condition);
    proc sql noprint;
    select avg(&event) into:prob
    from test
    %if %bquote(&condition) ne %then where &condition;
    ;
    quit;

    %if %bquote(&condition) ne %then %let condition = %str( )given &condition;
    %let prob=&prob;
    %put The probability of &event&condition is &prob..;
    %mend;

    options nonotes;

    %findprob(cough)
    %findprob(cough, vomiting)
    %findprob(cough and headache)
    %findprob(cough, headache and fever)
    %findprob(cough and fever, not vomiting)
    %findprob(fever, cough or headache)
    [/pre]

    The following appears on the log when the program is run:

    [pre]
    396 %findprob(cough)
    The probability of cough is 0.5.
    397 %findprob(cough, vomiting)
    The probability of cough given vomiting is 1.
    398 %findprob(cough and headache)
    The probability of cough and headache is 0.25.
    399 %findprob(cough, headache and fever)
    The probability of cough given headache and fever is 1.
    400 %findprob(cough and fever, not vomiting)
    The probability of cough and fever given not vomiting is 0.333333.
    401 %findprob(fever, cough or headache)
    The probability of fever given cough or headache is 0.666667.
    [/pre]

    sas-innovate-2024.png

    Don't miss out on SAS Innovate - Register now for the FREE Livestream!

    Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

     

    Register now!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 4 replies
    • 2145 views
    • 0 likes
    • 2 in conversation