Hi, my name is Andy. I'm a Master of Public Health student in an Advanced Epi Methods class that uses math I've never been taught, as well as SAS programming. My assignment is as follows:
"The probability of heart disease.
Use the Evans county data. Feel free to use the material provided in K and K that uses this data set (you will find it in the index).
For each person in the data set, determine the probability of developing coronary heart disease (CHD) as a function of the variables you choose to include in the analysis. Determine the mean probability of CHD as a function of the variables you have chosen. Compare the mean probability to the cumulative probability.
The available variables are:
AGE age
CAT catecholamine status
CHD coronary heart disease (the outcome)
CHL cholesterol
DBP diastolic blood pressure
SBP systolic blood pressure
ECG electrocardiogram
HPT hypertension
SMK smoking
ID is the identifier for the individual
CC is the interaction of catecholamine and cholesterol
CH is the interaction of catecholamine and hypertension
(note: these interactions are on the data set but you are not limited to them)"
I've looked at the code from Kleinbaum and Klein, and it is as follows:
Proc logistic data=epiiii.evans descending;
Model CHD = CAT AGE CHL ECG SMK HPT CH CC / COVB;
run;
PROC GENMOD DATA = epiiii.evans DESCENDING;
MODEL CHD = CAT AGE CHL ECG SMK HPT CH CC/LINK = LOGIT DIST = BINOMIAL;
ESTIMATE "OR (CHL = 220, HPT = 1)" CAT 1 CC 220 CH 1/EXP;
ESTIMATE "OR (CHL = 220, HPT = 0)" CAT 1 CC 220 CH 0/EXP;
CONTRAST "LRT for interaction terms" CH 1, CC 1;
RUN;
I haven't a clue how to "Compare the mean probability to the cumulative probability." I emailed my professor and he said:
"If you create a variable to hold the probability of the outcome, you can use the logistic model that you built to calculate the probability of the outcome for each individual. If you add these, you get the cumulative probability. If you take the mean, you get the average probability. You can then simply divide the number of outcomes by the number of subjects to get the cumulative probability as well (the equivalent of the intercept with no variables). Compare these three numbers and say something about what they mean. I think this is a whole lot easier than you think it is."
and
"You create a variable, call it Prob or something. Make it equal to the formula for calculating the probability of the outcome for an individual: 1/(1 + exp(-(a + bx))). Calculate it for each individual. Then do a proc means on the variable Prob and ask for the sum and the mean. That gives you the answer you need for the question at hand. You first do the logistic regression itself so you can know what to put in the formula."
Anyone have any ideas? Thank you.
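The professor's recipe can be sketched roughly as follows. This is only an illustration under assumptions: the data set name work.scored and the variable name lp are made up here, and the model is the Kleinbaum and Klein one quoted above. The XBETA= option stores each person's linear predictor a + bx, and the DATA step applies the inverse logit 1/(1 + exp(-(a + bx))) to it.

```sas
/* Fit the model and keep each person's linear predictor (a + bx). */
/* work.scored and lp are hypothetical names used for illustration. */
proc logistic data=epiiii.evans descending;
  model CHD = CAT AGE CHL ECG SMK HPT CH CC;
  output out=work.scored xbeta=lp;
run;

/* Convert the log odds to a probability with the inverse logit. */
data work.scored;
  set work.scored;
  Prob = 1 / (1 + exp(-lp));
run;

/* SUM is the cumulative probability, MEAN is the average probability. */
proc means data=work.scored n sum mean;
  var Prob;
run;
```

Prob computed this way should match what the P= option of the OUTPUT statement returns, so the hand calculation mainly serves to show where the number comes from.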
Hi Andy,
Try the following code. It calculates the predicted probability, the cumulative probability, and the average probability.
Based on your professor's comments, I have prepared the SAS code as follows:
*==================================================
ANSWER - 1:
Build the logit model and save the
predicted probability for each individual
===================================================;
proc logistic data = epiiii.evans descending noprint outest = betas;
model CHD = CAT AGE CHL ECG SMK HPT CH CC;
output out = pred p = phat xbeta = logit;
run;
quit;
proc datasets lib = work nolist;
modify pred;
/* phat is shown as a percentage; logit stays on the log-odds scale */
format phat percent8. logit 8.4;
quit;
*==============================================
ANSWER - 2:
Calculation of Cumulative and Mean Probability
Compare the figures of this output
===============================================;
proc means data = pred noprint;
var phat;
output out = mean_VS_cum_prob sum = mean = / autoname;
run;
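The professor's third number, outcomes divided by subjects, can be read straight off the raw outcome variable. A minimal sketch, assuming CHD is coded 1 for a case and 0 otherwise (the thread does not state the coding):

```sas
/* Crude cumulative incidence from the raw data.                   */
/* Assumes CHD is a 0/1 indicator: SUM = number of cases,          */
/* N = number of subjects, MEAN = cases / subjects.                */
proc means data=epiiii.evans n sum mean;
  var CHD;
run;
```

If that coding holds, the MEAN shown here is the figure to put next to the mean of the predicted probabilities.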
Here are a few more findings you could also report:
-------> By default, the logit estimates give the change in log odds for a one-unit change in a predictor
variable. But what if someone wants to interpret them in terms of probability instead of log odds?
You simply apply the following formula:
B * P * (1 - P)
where B = the respective parameter of the model (see the data set BETAS)
P = the assumed probability at a specified level (normally the overall proportion)
Do this for each predictor in your model and you will get the approximate
expected change in probability for a one-unit change in that predictor.
This kind of interpretation is similar to what we do in a simple linear regression model.
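The B * P * (1 - P) rule could be applied to every coefficient at once with something like the sketch below. It assumes the BETAS data set from OUTEST= holds one row with a column per predictor, and it uses 0.1166 (the mean predicted probability reported in this thread) as P. The names pbar, marginal, term, estimate, and dP_per_unit are made up for illustration.

```sas
/* Hypothetical macro variable: the overall proportion used as P. */
%let pbar = 0.1166;

/* One output row per predictor with its approximate probability change. */
data marginal;
  set betas;
  array b{*} CAT AGE CHL ECG SMK HPT CH CC;
  length term $32;
  do i = 1 to dim(b);
    term        = vname(b{i});                    /* predictor name       */
    estimate    = b{i};                           /* log-odds coefficient */
    dP_per_unit = estimate * &pbar * (1 - &pbar); /* B * P * (1 - P)      */
    output;
  end;
  keep term estimate dP_per_unit;
run;

proc print data=marginal noobs;
run;
```

Because CC and CH are interaction terms, the values for CAT, CC, and CH are not stand-alone effects and should be read together rather than one at a time.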
-Urvish
Hi Urvish,
First of all, thank you for your help. I greatly appreciate it. The output from the Proc Means step and the created data set mean_VS_cum_prob is below. How do I interpret the cumulative probability? 7100% as a number doesn't mean much to me when I'm comparing it to 12%. Do I need to do anything to transform those numbers, or are they accurate as is? Is there a SAS procedure to transform the reported log odds into probabilities, or do I do that by hand?
The MEANS Procedure
N | Mean | Std Dev | Minimum | Maximum
609 | 0.1165846 | 0.1491489 | 0.000913976 | 0.9969705

_TYPE_ | _FREQ_ | Estimated Probability (sum) | Estimated Probability (mean)
0 | 609 | 7100% | 12%
Hi Andy,
I never reported cumulative probability in my own logit model project, but I can say that cumulative probability is like a total probability, and of course, based on your professor's comments, this value will be high.
If you divide 7100% by 609 (that is, the total probability divided by the number of subjects in the sample), it gives the average probability of about 12%.
What you can do is split the data by success and failure after building the logit model, compute the cumulative probability for each group with PROC MEANS, and then compare those with the average probability.
Or you can simply say the cumulative probability is 609 times the average probability, since the sample has 609 subjects. You can also visualize PHAT as a function of the variables you have chosen in your model (that is, plot PHAT on the Y-axis against each predictor on the X-axis).
There is no separate SAS procedure for transforming log odds into probabilities. The P= option on the OUTPUT statement in the code above already stores the predicted probability as PHAT; otherwise you need to write a little code to do the transformation yourself.
I hope this makes things clear and helps you submit your project.
-Urvish
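One way to make the 7100% easier to read: it is the sum 71.00 displayed with the percent format that PHAT carries, and 71.00 / 609 is the 0.1166 mean. The sum of the predicted probabilities is the expected number of CHD cases among the 609 subjects, and for a logistic model with an intercept fitted by maximum likelihood it should equal the observed number of cases. A small sketch that reports the same statistics as plain numbers, with prob_summary, n_subjects, expected_cases, and mean_prob as made-up names:

```sas
/* Same statistics as before, with explicit names (hypothetical). */
proc means data=pred noprint;
  var phat;
  output out=prob_summary n=n_subjects sum=expected_cases mean=mean_prob;
run;

/* Override the inherited percent format so the sum prints as a count. */
proc print data=prob_summary noobs;
  var n_subjects expected_cases mean_prob;
  format expected_cases 8.2 mean_prob 8.4;
run;
```

Read this way, the comparison is between an expected count of cases (the sum) and a per-person risk (the mean), which differ only by the factor N = 609.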
That makes sense; thank you so much for your help. We talked about it in class and your analysis/code was perfect. I mentioned that I had asked for some help, which is completely allowed given how our class is set up.
Happy to discuss with you Andy...
-Urvish