afarrey11
Calcite | Level 5

Hi, my name is Andy. I'm a Master of Public Health student in an Advanced Epi Methods class that uses math I've never been taught, as well as SAS programming. My assignment is as follows:


"The probability of heart disease.

Use the Evans county data. Feel free to use the material provided in K and K that uses this data set (you will find it in the index).

For each person in the data set, determine the probability of developing coronary heart disease (CHD) as a function of the variables you choose to include in the analysis. Determine the mean probability of CHD as a function of the variables you have chosen.  Compare the mean probability to the cumulative probability.

The available variables are:

AGE   age

CAT   catecholamine status

CHD  coronary heart disease (the outcome)

CHL  cholesterol

DBP  diastolic blood pressure

SBP   systolic blood pressure

ECG  electrocardiogram

HPT  hypertension

SMK  smoking

ID is the identifier for the individual

CC  is the interaction of catecholamine and cholesterol

CH is the interaction of catecholamine and hypertension

(note:  these interactions are on the data set but you are not limited to them)"

I've looked at the code from Kleinbaum and Klein, and it is as follows:

Proc logistic data=epiiii.evans descending;

Model CHD = CAT AGE CHL ECG SMK HPT CH CC / COVB;

run;


PROC GENMOD DATA = epiiii.evans DESCENDING;

MODEL CHD = CAT AGE CHL ECG SMK HPT CH CC/LINK = LOGIT DIST = BINOMIAL;

ESTIMATE "OR (CHL = 220, HPT = 1)" CAT 1 CC 220 CH 1/EXP;

ESTIMATE "OR (CHL = 220, HPT = 0)" CAT 1 CC 220 CH 0/EXP;

CONTRAST "LRT for interaction terms" CH 1, CC 1;

RUN;

I haven't a clue how to, "Compare the mean probability to the cumulative probability."  I emailed my professor and he said:

"If you create a variable to hold the probability of the outcome, you can use the logistic model that you built to calculate the probability of the outcome for each individual.  If you add these, you get the cumulative probability.  If you take the mean, you get the average probability.  You can then simply divide the number of outcomes by the number of subjects to get the cumulative probability as well (the equivalent of the intercept with no variables).  Compare these three numbers and say something about what they mean.  I think this is a whole lot easier than you think it is."

and

"You create a variable, call it Prob or something.  Make it equal to the formula for calculating the probability of the outcome for an individual: 1/(1 + exp(-(a + bx))).  Calculate it for each individual.  Then do a proc means on the variable Prob and ask for the sum and the mean.  That gives you the answer you need for the question at hand.  You first do the logistic regression itself so you can know what to put in the formula."


Anyone have any ideas?  Thank you.

UrvishShah
Fluorite | Level 6

Hi Andy,

Try the following code. It calculates the predicted probability, the cumulative probability, and the average probability.

I prepared the SAS code based on your professor's comments:

*==================================================

ANSWER - 1:

Build the logit model, which holds the

predicted probability for each individual

===================================================;

proc logistic data = epiiii.evans descending noprint outest = betas;

    model CHD = CAT AGE CHL ECG SMK HPT CH CC;

     output out = pred p = phat xbeta = logit;

run;

quit;

proc datasets lib = work nolist;

    modify pred;

     /* phat is a probability; logit (XBETA) is on the log-odds scale, so it is not shown as a percent */
     format phat percent8. logit 8.4;

quit;

*==============================================

ANSWER - 2:

Calculation of Cumulative and Mean Probability

Compare the figures of this output

===============================================;

proc means data = pred noprint;

   var phat;

   output out = mean_VS_cum_prob sum = mean = / autoname;

run;
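As a rough check on the professor's formula, the sketch below rebuilds the probability by hand from the log odds and puts it next to PHAT. It also sums and averages CHD itself, which gives the number of outcomes and the crude proportion (outcomes divided by subjects). It assumes the data set PRED from the step above and that CHD is coded 0/1; the names CHECK and prob_by_hand are made up for illustration.

```sas
/* Sketch: rebuild the probability from the log odds and compare it with PHAT. */
/* Assumes WORK.PRED from the PROC LOGISTIC step above and CHD coded 0/1.      */
data check;                                  /* hypothetical data set name */
    set pred;
    prob_by_hand = 1 / (1 + exp(-logit));    /* inverse logit of XBETA     */
run;

/* SUM of phat = cumulative probability, MEAN of phat = average probability. */
/* SUM of CHD  = number of outcomes,     MEAN of CHD  = crude proportion.    */
proc means data = check n sum mean maxdec = 4;
    var phat prob_by_hand CHD;
run;
```

If the model includes an intercept, the sum of PHAT and the sum of CHD should agree up to rounding, so the mean of PHAT should match the crude proportion.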

Here are a few more findings you could report.

------->   By default, the logit estimates give the change in the log odds for a one-unit change in a predictor

           variable. If you want an interpretation in terms of probability instead of log odds,

           you can use the following approximation:

           B * P * (1 - P)

           where B = the respective parameter of the model (see the data set BETAS)

                 P = an assumed probability at a specified level (normally the overall proportion)

           Do this for each predictor in your model to get the

           expected change in probability for a one-unit change in that predictor.

           This kind of interpretation is similar to what we do in a simple linear regression model.
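A minimal sketch of that calculation, assuming the data sets BETAS and PRED created above and CHD coded 0/1. The names OVERALL, MARGINAL, p_bar, and dp are made up for illustration. Because CH and CC are interaction terms, the values for CAT, CHL, and HPT are only a rough guide.

```sas
/* Overall proportion of CHD, used as the assumed probability P. */
proc means data = pred noprint;
    var CHD;
    output out = overall mean = p_bar;
run;

/* One row per predictor: B and the approximate change in probability B*P*(1-P). */
data marginal;
    if _n_ = 1 then set overall(keep = p_bar);
    set betas(keep = CAT AGE CHL ECG SMK HPT CH CC);
    array b{*} CAT AGE CHL ECG SMK HPT CH CC;
    length predictor $8;
    do i = 1 to dim(b);
        predictor = vname(b{i});           /* name of the predictor   */
        beta      = b{i};                  /* estimate on logit scale */
        dp        = beta * p_bar * (1 - p_bar);
        output;
    end;
    keep predictor beta dp;
run;

proc print data = marginal noobs;
run;
```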

-Urvish

afarrey11
Calcite | Level 5

Hi Urvish,

First of all, thank you for your help.  I greatly appreciate it.  The output from PROC MEANS and the created data set mean_VS_cum_prob are below.  How do I interpret the cumulative probability?  7100% as a number doesn't mean much to me when I'm comparing it to 12%.  Do I need to do anything to those numbers to transform them, or are they accurate as is?  Is there a SAS procedure to transform the reported log odds to probabilities, or do I do that by hand?

The MEANS Procedure

  N         Mean      Std Dev       Minimum      Maximum

609    0.1165846    0.1491489   0.000913976    0.9969705

_TYPE_   _FREQ_   Estimated Probability (sum)   Estimated Probability (mean)

     0      609                         7100%                            12%

UrvishShah
Fluorite | Level 6

Hi Andy,

I never reported a cumulative probability in my own logit model project, but I can say that the cumulative probability is the total probability: the sum of PHAT over all subjects. As your professor's comments suggest, this value will be high. The 7100% is just the sum 71.00 shown with a percent format, so you can read it as about 71 expected CHD cases among the 609 subjects.

If you divide 7100% by 609 (the total probability divided by the number of subjects), you get the average probability, about 12%.

Another thing you can do after building the logit model is to split the data by success and failure (CHD = 1 and CHD = 0), use PROC MEANS to get the cumulative probability for each group, and compare those with the average probability.

Or you can simply say that the cumulative probability is 609 times the average probability, since the mean is the sum divided by the number of subjects. You can also visualize PHAT as a function of the variables you have chosen in your model (plot PHAT on the Y axis against a chosen predictor on the X axis).

You don't need a separate procedure to transform the log odds into probabilities: the P= option on the OUTPUT statement of PROC LOGISTIC already gives the predicted probability (PHAT). If you only have the log odds, apply 1/(1 + exp(-logit)) in a DATA step.

I hope this makes it clear and helps you submit your project.
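The split by outcome and the plot could look something like this. It is only a sketch, assuming the data set PRED from the earlier PROC LOGISTIC step; AGE is just one possible choice for the X axis.

```sas
/* Cumulative (SUM) and average (MEAN) predicted probability by outcome group. */
proc means data = pred n sum mean maxdec = 4;
    class CHD;
    var phat;
run;

/* Predicted probability against one predictor, colored by observed outcome. */
proc sgplot data = pred;
    scatter x = AGE y = phat / group = CHD;
run;
```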

-Urvish

afarrey11
Calcite | Level 5

That makes sense; thank you so much for your help. We talked about it in class and your analysis/code was perfect. I mentioned I had asked for some help and that's completely allowed in our class for how our class is set up.

UrvishShah
Fluorite | Level 6

Happy to discuss with you Andy...

-Urvish
