Need some syntax help please...
10-21-2013 11:49 PM

Hi, my name is Andy. I'm a Master of Public Health student in an Advanced Epi Methods class that uses math I've never been taught, as well as SAS programming. My assignment is as follows:

"The probability of heart disease.

Use the Evans county data. Feel free to use the material provided in K and K that uses this data set (you will find it in the index).

For each person in the data set, determine the probability of developing coronary heart disease (CHD) as a function of the variables you choose to include in the analysis. Determine the mean probability of CHD as a function of the variables you have chosen. Compare the mean probability to the cumulative probability.

The available variables are:

AGE age

CAT catecholamine status

CHD coronary heart disease (the outcome)

CHL cholesterol

DBP diastolic blood pressure

SBP systolic blood pressure

ECG electrocardiogram

HPT hypertension

SMK smoking

ID is the identifier for the individual

CC is the interaction of catecholamine and cholesterol

CH is the interaction of catecholamine and hypertension

(note: these interactions are on the data set but you are not limited to them)"

I've looked at the code from Kleinbaum and Klein, and it is as follows:

Proc logistic data=epiiii.evans descending;

Model CHD = CAT AGE CHL ECG SMK HPT CH CC / COVB;

run;

PROC GENMOD DATA = epiiii.evans DESCENDING;

MODEL CHD = CAT AGE CHL ECG SMK HPT CH CC / LINK = LOGIT DIST = BINOMIAL;

ESTIMATE "OR (CHL = 220, HPT = 1)" CAT 1 CC 220 CH 1 / EXP;

ESTIMATE "OR (CHL = 220, HPT = 0)" CAT 1 CC 220 CH 0 / EXP;

CONTRAST "LRT for interaction terms" CH 1, CC 1;

RUN;

I haven't a clue how to "Compare the mean probability to the cumulative probability." I emailed my professor and he said:

"If you create a variable to hold the probability of the outcome, you can use the logistic model that you built to calculate the probability of the outcome for each individual. If you add these, you get the cumulative probability. If you take the mean, you get the average probability. You can then simply divide the number of outcomes by the number of subjects to get the cumulative probability as well (the equivalent of the intercept with no variables). Compare these three numbers and say something about what they mean. I think this is a whole lot easier than you think it is."

and

"You create a variable, call it Prob or something. Make it equal to the formula for calculating the probability of the outcome for an individual: 1/(1 + exp(-(a + bx))). Calculate it for each individual. Then do a proc means on the variable Prob and ask for the sum and the mean. That gives you the answer you need for the question at hand. You first do the logistic regression itself so you can know what to put in the formula."

Anyone have any ideas? Thank you.
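The professor's by-hand recipe could be sketched roughly as follows. This is only a sketch under assumptions: it reuses the MODEL statement quoted above, it assumes the OUTEST= data set from PROC LOGISTIC stores the intercept in a variable named Intercept and each slope under its predictor's name, and the names betas, prob_by_hand, xb, b0, and the b_ prefixes are made up for illustration.

```sas
/* Step 1: fit the model and save the coefficient estimates (one row) */
proc logistic data = epiiii.evans descending noprint outest = betas;
  model CHD = CAT AGE CHL ECG SMK HPT CH CC;
run;

/* Step 2: attach the coefficients to every subject and apply the formula */
data prob_by_hand;
  /* Read the single row of estimates once; variables read by SET are retained. */
  /* The RENAME= option keeps the coefficients from colliding with the data.   */
  if _n_ = 1 then set betas(keep = Intercept CAT AGE CHL ECG SMK HPT CH CC
      rename = (Intercept = b0 CAT = b_cat AGE = b_age CHL = b_chl ECG = b_ecg
                SMK = b_smk HPT = b_hpt CH = b_ch CC = b_cc));
  set epiiii.evans;
  /* linear predictor a + bx for this subject */
  xb = b0 + b_cat*CAT + b_age*AGE + b_chl*CHL + b_ecg*ECG
          + b_smk*SMK + b_hpt*HPT + b_ch*CH + b_cc*CC;
  /* probability of CHD from the logistic formula */
  Prob = 1 / (1 + exp(-xb));
run;

/* Step 3: sum (cumulative) and mean (average) of the individual probabilities */
proc means data = prob_by_hand n sum mean;
  var Prob;
run;
```

Subjects with a missing value in any predictor would get a missing Prob and drop out of the PROC MEANS summary.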

Accepted Solutions


Posted in reply to afarrey11

10-23-2013 01:45 AM

Hi Andy,

I never reported a cumulative probability in my own logit model project, but I can say that the cumulative probability is like a total probability, and of course, based on your professor's comments, this value will be high.

If you divide 7100% by 609 (that is, the total probability divided by the number of subjects in the sample), you get the average probability of about 12%.

What you can do is split the data by success and failure after building the logit model, compute the cumulative probability for each of the two groups with PROC MEANS, and then compare those with the average probability.

Or you can simply say that the cumulative probability is 609 times the average probability (609 being the sample size). You can also visualize PHAT as a function of the variables you have chosen in your model (that is, graph PHAT on the Y axis against a chosen variable on the X axis).

There is no separate SAS procedure just for transforming log odds into probabilities. The P= option on the OUTPUT statement of PROC LOGISTIC already returns the predicted probability (PHAT); to convert any other log odds you need to write a little code yourself.

Hope this makes it clear and helps you in submitting your project.

-Urvish
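As a rough illustration of converting log odds to a probability, assuming the PRED data set produced by the PROC LOGISTIC step in this thread (with P = phat and XBETA = logit on the OUTPUT statement), a short DATA step is enough. The names pred_check, prob, and diff are hypothetical.

```sas
/* Sketch: turn the saved log odds back into a probability */
data pred_check;
  set pred;
  prob = 1 / (1 + exp(-logit));   /* inverse logit of the linear predictor */
  diff = prob - phat;             /* expected to be essentially zero */
run;

/* Confirm that the hand-computed probability agrees with PHAT */
proc means data = pred_check n min max;
  var diff;
run;
```

If the minimum and maximum of diff are both zero up to rounding, the hand-computed value and the P= value are the same quantity.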

All Replies


Posted in reply to afarrey11

10-22-2013 02:37 AM

Hi Andy,

Try the following code. It calculates the predicted probability, the cumulative probability, and the average probability.

I have prepared the SAS code below based on your professor's comments.

*==================================================

ANSWER - 1:

Build the Logit Model which holds

predicted probability for each individual

===================================================;

proc logistic data = epiiii.evans descending noprint outest = betas;

model CHD = CAT AGE CHL ECG SMK HPT CH CC;

output out = pred p = phat xbeta = logit;

run;

quit;

proc datasets lib = work nolist;

modify pred;

/* phat is a probability, so show it as a percent; logit is a log odds, so keep it as a plain number */

format phat percent8. logit 8.4;

quit;

*==============================================

ANSWER - 2:

Calculation of Cumulative and Mean Probability

Compare the figures of this output

===============================================;

proc means data = pred noprint;

var phat;

output out = mean_VS_cum_prob sum = mean = / autoname;

run;

Here are a few more findings you could also submit.

-------> By default, the logit estimates give the change in log odds for a one-unit change in a predictor variable. But what if someone wants to interpret them in terms of probability instead of log odds?

You simply need to apply the following formula:

B × P × (1 − P)

where B = the respective parameter of the model (see the data set BETAS)

P = the assumed probability at a specified level (normally the overall proportion)

Do this for each predictor in your model and you will get the approximate expected change in probability due to a one-unit change in that predictor.

This kind of interpretation is similar to what we do in a simple linear regression model.

-Urvish
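The B × P × (1 − P) rule is the slope of the logistic curve with respect to one predictor. A short derivation, where $p$ is the predicted probability, $\eta$ the linear predictor, and $\beta_j$ the coefficient of predictor $x_j$ (symbols introduced here only for illustration):

```latex
\begin{align*}
p &= \frac{1}{1 + e^{-\eta}}, \qquad \eta = \alpha + \sum_j \beta_j x_j \\
\frac{dp}{d\eta} &= \frac{e^{-\eta}}{\left(1 + e^{-\eta}\right)^2} = p\,(1 - p) \\
\frac{\partial p}{\partial x_j} &= \frac{dp}{d\eta}\,\frac{\partial \eta}{\partial x_j} = \beta_j\, p\,(1 - p)
\end{align*}
```

Taking the overall proportion of about 0.1166 reported in this thread as $p$, the factor $p(1-p)$ is roughly 0.103, so each coefficient would translate to about 0.103 times its value in probability units. This is a local approximation for small changes, and it treats each predictor on its own, which is less clean for variables that also enter through the interaction terms CH and CC.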


Posted in reply to UrvishShah

10-22-2013 10:36 AM

Hi Urvish,

First of all, thank you for your help. I greatly appreciate it. The output from the PROC MEANS step and the data set it created, mean_VS_cum_prob, is below. How do I interpret the cumulative probability? 7100% as a number doesn't mean much to me when I'm comparing it to 12%. Do I need to transform those numbers in some way, or are they accurate as is? Is there a SAS procedure to transform the reported log odds into probabilities, or do I do that by hand?

The MEANS Procedure

N | Mean | Std Dev | Minimum | Maximum

609 | 0.1165846 | 0.1491489 | 0.000913976 | 0.9969705

_TYPE_ | _FREQ_ | Estimated Probability (sum) | Estimated Probability (mean)

0 | 609 | 7100% | 12%
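One way to read these two figures, offered as an interpretation rather than something stated in the thread: the PERCENT8. format attached to phat carries over to the summary statistics, so 7100% is a sum of about 71.00 and 12% is a mean of about 0.1166. The sum can be checked from the printed mean. In addition, for a logistic model with an intercept fitted by maximum likelihood, the fitted probabilities add up to the observed number of events, which is why the mean should match (number of outcomes)/(number of subjects). Here $\hat p_i$ is the predicted probability and $y_i$ the 0/1 outcome for subject $i$, notation introduced for illustration.

```latex
\begin{align*}
\sum_{i=1}^{609} \hat p_i &= 609 \times 0.1165846 \approx 71.00 \\
\sum_{i=1}^{n} \left( y_i - \hat p_i \right) = 0
  \;&\Longrightarrow\;
  \frac{1}{n} \sum_{i=1}^{n} \hat p_i = \frac{1}{n} \sum_{i=1}^{n} y_i
\end{align*}
```

Read this way, the sum is better described as the expected number of CHD cases among the 609 subjects (about 71) than as a probability, and the mean of about 0.117 is the overall risk.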


Posted in reply to UrvishShah

10-23-2013 02:15 PM

That makes sense; thank you so much for your help. We talked about it in class and your analysis/code was perfect. I mentioned I had asked for some help, and that's completely allowed given how our class is set up.


Posted in reply to afarrey11

10-24-2013 09:33 AM

Happy to discuss it with you, Andy.

-Urvish