☑ This topic is solved.
lone0708
Fluorite | Level 6

Hi all, 

I am using PROC GENMOD in a trend analysis of affected persons by calendar year.

My current output gives me an estimate for every year, but I need one estimate of the trend across all the years together. What should I add to my code?

 

proc genmod data = have descending;
class id year (ref="2001");
model affected = year / dist = bin
                                      link = id
where year ge 2001;
repeated subject = affected; run;
1 ACCEPTED SOLUTION
StatDave
SAS Super FREQ

It's important to note that if you fit the model with Year as continuous and with LINK=ID, you are assuming that the probability of being affected changes linearly across the entire range of years in your data. That may be reasonable for your data, but it cannot hold over a wide enough range of years, because a probability cannot go above 1 or below 0. With the identity link, the model says that the affected probability just keeps going up (or down) without limit as Year increases (or decreases). In general, the curve relating the affected probability to Year is S-shaped, with asymptotes at zero and one. If your range of years covers only the middle part of the true curve, then this model might be a good approximation. But if the range covers the parts of the curve that bend toward zero or one, then this linear probability model will not be accurate. In that case, the change in the affected probability is not constant but differs from year to year. So you might want to plot the observed proportion affected at each year and check that the change is reasonably linear before you adopt this model.
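One way to make that check is sketched below. It assumes AFFECTED is coded 0/1 (so its mean within a year is the observed proportion affected) and that the data set is named HAVE as in the question; PROP_BY_YEAR and PROP_AFFECTED are names made up for this illustration.

```sas
/* Observed proportion affected per year (assumes affected is 0/1) */
proc means data=have noprint nway;
   where year ge 2001;
   class year;
   var affected;
   output out=prop_by_year mean=prop_affected;
run;

/* Plot the proportions to judge whether the trend looks roughly linear */
proc sgplot data=prop_by_year;
   series x=year y=prop_affected / markers;
   yaxis label="Observed proportion affected";
run;
```

If the points bend toward 0 or 1 at either end of the period, that is a sign the identity link is a poor fit.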

 

Ultimately, what you are asking for is the marginal effect of Year on the affected probability, which is the instantaneous change in the probability at a given year. If that change is about constant, then the average marginal effect is what you want. It can be estimated with the Margins macro, which both fits the model and provides an estimate of the average marginal effect (specify dist=binomial, link=logit, effect=year in the macro call). This note discusses and illustrates.
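A call along those lines might look like the sketch below. It assumes the Margins macro has already been downloaded (the %include path is hypothetical), that AFFECTED is coded 0/1, and that the data set is named HAVE as in the question; STUDY is a made-up name. Only dist=, link=, and effect= come from this reply. The data=, response=, roptions=, and model= parameters are recalled from the macro's documentation, so check them against the note before running. The sketch also ignores the repeated measurements on ID.

```sas
/* Hypothetical path: point this at your downloaded copy of the macro */
%include "margins.sas";

/* Restrict to the study period first (STUDY is an assumed name) */
data study;
   set have;
   where year ge 2001;
run;

/* Average marginal effect of YEAR on the probability that affected=1 */
%Margins(data     = study,
         response = affected,
         roptions = event='1',
         model    = year,
         dist     = binomial,
         link     = logit,
         effect   = year)
```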

6 REPLIES
Rick_SAS
SAS Super FREQ

You are getting an estimate for each year because you have included YEAR on the CLASS statement. If you want an overall trend, consider whether it would be appropriate to model YEAR as a continuous variable. If so, remove YEAR from the CLASS statement. If you want to retain YEAR=2001 as the reference year, you can create a new variable

Years_Since_2001 = Year - 2001;

and use that variable in the analysis.
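As a minimal sketch, the new variable could be created in a DATA step before the analysis. The input data set HAVE and the variable YEAR come from the question; the output name HAVE2 is made up for this illustration.

```sas
/* Derive a continuous time variable that equals 0 in the reference year 2001 */
data have2;
   set have;
   where year ge 2001;              /* keep only the study period */
   Years_Since_2001 = Year - 2001;
run;
```

With this coding, the intercept of the model refers to 2001 and the slope is the change per additional year.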

lone0708
Fluorite | Level 6

Hi Rick, 

Thanks for the answer, 

Year can definitely be continuous.
Maybe I should specify a little more clearly what I want to do.
I want to find the percentage change in affected persons in the period from 2001 onwards, so I need one common estimate for the whole period: how much has the number of affected persons increased during the study period?


Is the following the correct way?

 

proc genmod data = have descending;
class id year_since_2001;
model affected = year / dist = bin
                                      link = id
where year ge 2001;
repeated subject = affected; run;

 

Rick_SAS
SAS Super FREQ

Please re-read my earlier response, in which I said, "If so, remove YEAR from the CLASS statement."

lone0708
Fluorite | Level 6

I am sorry, I do not understand. I removed year from the CLASS statement.
Is the following correct, and does it give me one estimate for the change over the whole study period since 2001?

 

proc genmod data = have descending;
class id;
model affected = year_since_2001 / dist = bin
                                      link = id
where year ge 2001;
repeated subject = affected; run;

 

Rick_SAS
SAS Super FREQ

You are the only one who can decide whether the code matches the model you want. 

I suspect you have several mistakes:

1. Need a semicolon at the end of the MODEL statement.

2. I don't know why you are using an identity link function. Typically, analysts use a logit link for binomial data.

3. The SUBJECT= variable should probably be ID.

4. This syntax assumes that the response variable is a binary response (for example, values 0 and 1).

 

The PROC GENMOD documentation has an example of binary data with a REPEATED statement. You might want to study that example to understand the purpose of the syntax in the procedure: SAS Help Center: GEE for Binary Data with Logit Link Function 
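Putting the four points above together, a corrected call might look like the following sketch. It assumes AFFECTED is coded 0/1, that ID identifies the person measured repeatedly, and that YEAR_SINCE_2001 already exists in HAVE. The exchangeable working correlation (TYPE=EXCH) and the ESTIMATE statement are additions for illustration, not choices made in this thread.

```sas
proc genmod data=have descending;      /* DESCENDING models P(affected=1) */
   where year ge 2001;
   class id;                           /* YEAR_SINCE_2001 stays continuous */
   model affected = year_since_2001 / dist=bin link=logit;   /* note the semicolon */
   repeated subject=id / type=exch;    /* repeated measures within ID */
   /* Exponentiate the slope to report an odds ratio per year */
   estimate 'Odds ratio per year' year_since_2001 1 / exp;
run;
```

With the logit link, the YEAR_SINCE_2001 coefficient is the change in log odds per year, so its exponentiated value is an odds ratio per year rather than a change in percentage points.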

 
