KarinGun
Calcite | Level 5

Dear all, 

 

I am running a Poisson regression using PROC GENMOD. The outcome is the number of cancers and the predictors are Ssc (a rheumatic disease), age and sex. The aim is to compare incidence rates of cancer. In the code, logpy is the log of person-years and outc is the cancer count.

 

My main model looks like this:

 

proc genmod data=incidence6 plots=all;
   class sex(ref="2") age(ref="18") ssc(ref="0");
   model outc = ssc age sex / dist=poisson link=log offset=logpy;
   ods output parameterestimates=results;
run;

 

Now I would like to run a stratified model in which I still adjust for age but stratify by age categories (which I have already defined in a previous program). I would also like to run an analysis stratified by sex.

 

I have read about and tried several methods, but I haven't found one that works so far. I tried a BY statement, but it only returned one of my age categories. I also read about the STRATA statement, but as far as I understood I would then need to run an exact logistic regression and could not use the offset variable, which I suppose would be a problem.

 

Do you have any suggestions about coding/options to perform a stratified analysis in proc genmod poisson regression? 

 

Thank you in advance and best regards,

 

Karin Gunnarsson

6 REPLIES
ballardw
Super User

If your current age variable is numeric, with values such as 18 through 99 (or whatever the range is), then create a custom format to define the age groups and associate that format with AGE in the procedure.

You don't mention any specific groups and having them "in another data set" is likely not helpful.

Just for an example.

proc format;
   value myagegroups
      18 - 25   = "18 to 25"
      26 - 50   = "26 to 50"
      51 - high = "51+"
   ;
run;

Then add the statement  format age myagegroups.;  to the procedure code.

In the CLASS statement, the reference value must then be the formatted value you want, for example ref="18 to 25".

 

Groups created by formats are generally usable in analysis procedures.
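One way to confirm that a format groups the ages as intended, before fitting any model, is to tabulate the variable with the format applied. This is only a sketch: it assumes the data set incidence6 and a numeric variable age as in the question, and the cut points are the illustrative ones above rather than the poster's actual categories.

```sas
/* Illustrative age groups; the cut points are assumptions */
proc format;
   value myagegroups
      18 - 25   = "18 to 25"
      26 - 50   = "26 to 50"
      51 - high = "51+";
run;

/* With the format applied, PROC FREQ counts rows per formatted
   group rather than per individual age value */
proc freq data=incidence6;
   tables age;
   format age myagegroups.;
run;
```

Values that no range covers (for example 25.5 with the ranges above) are displayed unformatted, so gaps in the definition show up as extra rows in this table.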



KarinGun
Calcite | Level 5

Dear Ballardw,

Thank you for your answer. I am sorry I was not clear in my question; I am quite new to SAS.

I have now run the Proc format for myagegroups and it worked fine. 

However, where exactly in the PROC code should I add the statement  format age myagegroups.; ?

 

Thank you in advance.

Best,

 

Karin 

 

 

ballardw
Super User

FORMAT is one of the statements that can be used in almost any procedure, so it is not listed in the documentation for each individual procedure.

So add the FORMAT statement and change the CLASS statement, such as:

proc genmod data=incidence6 plots = all;
   class sex(ref="2") age(ref= "18 to 25") ssc(ref = "0");
   model outc = ssc age sex/ dist=poisson link=log offset=logpy;
   ods output parameterestimates=results; 
   format age myagegroups.;
run;

Formats are a very powerful tool in SAS, and it is worth learning at least the basics. There is a lot of code floating around in which people add variables just to run an analysis on slightly different groupings of values. If the groups are based on a single variable, formats are often more flexible. When data sets get large, as in millions of records, adding a variable takes time and uses more disk space, whereas a format is very quick to apply and adds little to the execution time of the analysis procedure. With experience, you can also create a number of formats to use as needed. You just need to make sure the format code is run in each session, or point the FMTSEARCH system option to one or more permanent storage locations.
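As a rough sketch of the permanent-storage option, the format can be written to a catalog in a permanent library and that library added to the format search path. The libref myfmts and the folder path are hypothetical placeholders, and the age groups are only the illustrative ones from this thread.

```sas
/* Hypothetical folder for a permanent format catalog; replace with a real path */
libname myfmts "/path/to/formats";

/* Store the format in the catalog MYFMTS.FORMATS instead of WORK.FORMATS */
proc format library=myfmts;
   value myagegroups
      18 - 25   = "18 to 25"
      26 - 50   = "26 to 50"
      51 - high = "51+";
run;

/* In a later session: assign the libref again, then add it to the
   list of libraries SAS searches for formats */
options fmtsearch=(myfmts work);
```

With this in place, the PROC FORMAT step does not have to be rerun in every session; only the LIBNAME and OPTIONS statements do.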

 

Two key points about format names: they must not end in a number, and character format names start with $. Because of that $ difference, you can have a character format and a numeric format with the same name.

Ranges of values (the - between the ages in the example) do not work well with character values, because the rules used to compare character strings yield odd results. So list character values explicitly.
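A minimal sketch of a character format that lists its values explicitly instead of using ranges is shown here. The variable region and its codes are hypothetical and are not part of the data set discussed in this thread.

```sas
/* Hypothetical character variable REGION with codes "N", "S", "E", "W" */
proc format;
   value $regiongrp            /* character format: name starts with $ */
      "N", "S" = "North/South" /* values listed explicitly, no ranges  */
      "E", "W" = "East/West"
      other    = "Unknown";
run;
```

It would be applied with a statement such as  format region $regiongrp.;  in the same way as the numeric format.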

KarinGun
Calcite | Level 5

Dear Ballardw, 

 

Thank you for your thorough reply! Formats sound like a powerful tool and I will definitely try to learn more about them.

 

I tried to incorporate the myagegroup format in my model. Did you mean that I should put myagegroup in the CLASS statement? If I do, SAS interprets myagegroup as a variable and returns an error message that the variable is not found, even though I include the format in the program as you suggested and it has been run before the CLASS statement. Is there another way to incorporate a format in the CLASS statement?

 

If I run the code below no stratification on myagegroup is done as far as I can see in the output. 

proc genmod data=incidence6 plots = all;
   class sex(ref="2") age(ref= "18 to 25") ssc(ref = "0");
   model outc = ssc age sex/ dist=poisson link=log offset=logpy;
   ods output parameterestimates=results; 
   format age myagegroup.;  
run;

Sorry if I am slow in understanding your suggestions; this is the first model I have run in SAS, so I have little experience.

 

Best regards, 

 

Karin

 

StatDave
SAS Super FREQ
Regarding stratification, it depends on what you want. If you use a BY statement, the result is a completely separate model for each sex (if you specify BY SEX;). In that case, the two models have different parameter estimates for the other predictors (age, ssc), which might be a problem depending on your goal.

If instead you want a single model that estimates the parameters for age and ssc and controls for sex without estimating parameters for sex, you could use a Generalized Estimating Equations (GEE) model with PROC GEE. The GEE model gives you a single model, controlling for sex, with a single set of parameter estimates involving age and ssc.

If you use the STRATA statement in PROC GENMOD to fit a conditional Poisson model, you also have to include the EXACT statement. The presence of an offset is not a problem; this is discussed in the Details section of the GENMOD documentation on exact estimation. But fitting the model with the exact method is generally not feasible unless the data set is quite small, since that method is very computationally and memory intensive.
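The three options described in this reply might look roughly like the sketch below. It reuses the data set and variable names from the question; the sorted data set name incidence6_sorted, the exchangeable working correlation, and the choice of ssc as the exact effect are assumptions made for illustration, not recommendations from the thread.

```sas
/* 1. Separate models per sex. BY processing expects the data to be
      sorted by the BY variable. */
proc sort data=incidence6 out=incidence6_sorted;
   by sex;
run;

proc genmod data=incidence6_sorted;
   by sex;
   class age(ref="18") ssc(ref="0");
   model outc = ssc age / dist=poisson link=log offset=logpy;
run;

/* 2. One GEE model with sex as the clustering (SUBJECT=) effect.
      CLASS uses the default reference levels here; TYPE=EXCH is an
      assumed working correlation. */
proc gee data=incidence6;
   class sex age ssc;
   model outc = ssc age / dist=poisson link=log offset=logpy;
   repeated subject=sex / type=exch;
run;

/* 3. Conditional (exact) Poisson model stratified by sex. Likely
      feasible only for small data sets. */
proc genmod data=incidence6;
   class age(ref="18") ssc(ref="0");
   model outc = ssc age / dist=poisson link=log offset=logpy;
   strata sex;
   exact ssc / estimate;
run;
```

In the BY version, a reference level given in the CLASS statement has to be present in every BY group, so it is worth checking the level counts per sex first.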
KarinGun
Calcite | Level 5

Thanks a lot for this informative reply. 

 

Best regards, 

 

Karin
