Dear all,
I am running a Poisson regression by using proc genmod. The outcome is number of cancers and the predictors are Ssc (a rheumatic disease), age and sex. The aim is to compare different incidence rates of cancer. Logpy = log personyears. Outc = cancer.
My main model looks like this:
proc genmod data=incidence6 plots = all;
class sex(ref="2") age(ref= "18") ssc(ref = "0");
model outc = ssc age sex/ dist=poisson link=log offset=logpy;
ods output parameterestimates=results;
run;
Now I would like to run a stratified model where I still adjust for age but stratify by age categories (which I have already defined in a previous program). I would also like to do a stratified analysis by sex.
I have read about and tried several methods, but I haven't found one that works so far. I tried a BY statement, but it only returned one of my age categories. I read about the STRATA statement, but as far as I understand I would then need to run an exact logistic regression and could not use the offset variable, which I suppose would be a problem.
Do you have any suggestions about coding/options to perform a stratified analysis in proc genmod poisson regression?
Thank you in advance and best regards,
Karin Gunnarsson
If your current age variable is numeric, with values such as 18 through 99 (or whatever the range is), then create a custom format to define the age groups and associate that format with AGE in the procedure.
You don't mention any specific groups and having them "in another data set" is likely not helpful.
Just for an example.
proc format;
value myagegroups
18 - 25 = "18 to 25"
26 - 50 = "26 to 50"
51 - high = "51+";
run;
Then add the statement format age myagegroups.; to the procedure code.
The reference value in the CLASS statement must then be the formatted value you want, for example age(ref="18 to 25") rather than age(ref="18").
Groups created by formats are generally usable in analysis procedures.
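Before fitting the model, it may help to confirm that the format groups the ages as intended. The sketch below assumes the data set incidence6 and the numeric variable age from the question, and reuses the example cut points above, which are illustrative only. PROC FREQ counts by formatted value, so the table should show one row per age group.

```sas
/* Example age groups (illustrative cut points, not the poster's real ones) */
proc format;
  value myagegroups
    18 - 25   = "18 to 25"
    26 - 50   = "26 to 50"
    51 - high = "51+";
run;

/* One row per formatted age group; values outside the ranges print unformatted */
proc freq data=incidence6;
  tables age;
  format age myagegroups.;
run;
```

If raw ages appear as extra rows, they fall outside the ranges defined in the format (for example, non-integer ages between 25 and 26).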
Dear Ballardw,
Thank you for your answer and I am sorry I was not clear in my question, I am quite new to SAS.
I have now run the Proc format for myagegroups and it worked fine.
However, where exactly in the procedure code should I add the statement format age myagegroups.;?
Thank you in advance.
Best,
Karin
FORMAT is one of the statements that can be used in almost any procedure, so it is not repeated in the documentation for each one.
Add the FORMAT statement and change the CLASS reference value, like this:
proc genmod data=incidence6 plots = all;
class sex(ref="2") age(ref= "18 to 25") ssc(ref = "0");
model outc = ssc age sex/ dist=poisson link=log offset=logpy;
ods output parameterestimates=results;
format age myagegroups.;
run;
Formats are a very powerful tool in SAS, and it is worth learning at least the basics. There is a lot of code floating around where people add variables just to analyze slightly different groupings of values. If the groups are based on a single variable, formats are often more flexible. When data sets get large, as in millions of records, adding a variable takes time and uses more disk space, whereas a format is quick to apply and adds little to the procedure's execution time. With experience you can also create a number of formats to use as needed. You just need to make sure the format code is run in each session, or point the FMTSEARCH system option to one or more permanent storage locations.
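As a minimal sketch of the permanent-storage option mentioned above: the libref myfmts and the folder path are hypothetical placeholders, and the age groups are the same illustrative ones used earlier. The LIBRARY= option on PROC FORMAT writes the format to a catalog in that library, and FMTSEARCH tells SAS where to look for formats in later sessions.

```sas
/* Hypothetical permanent library; replace the path with a real folder */
libname myfmts "/path/to/formats";

/* Store the format in the catalog myfmts.formats instead of WORK */
proc format library=myfmts;
  value myagegroups
    18 - 25   = "18 to 25"
    26 - 50   = "26 to 50"
    51 - high = "51+";
run;

/* In later sessions: assign the libref again, then search it for formats */
options fmtsearch=(myfmts work);
```

With this in place, the PROC FORMAT step does not need to be rerun in every session. Only the LIBNAME and OPTIONS statements do.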
Two key pieces with formats: do not end the name in a number and character format names start with $. You can have character and numeric formats with the same name because of that $ difference.
Ranges of values (the - between the ages in the example) do not work well with character values, because the rules used for comparing character strings yield odd results. So list character values explicitly.
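To illustrate the two points above, here is a small sketch of a character format with explicitly listed values. The format name $sexgrp, the variable sexchar, the data set name, and the codes "M" and "F" are all hypothetical and do not come from the data in this thread.

```sas
/* Character format: name starts with $, does not end in a number */
proc format;
  value $sexgrp
    "M"   = "Male"
    "F"   = "Female"
    other = "Unknown";
run;

/* Hypothetical usage with a character variable named sexchar */
proc freq data=mydata;
  tables sexchar;
  format sexchar $sexgrp.;
run;
```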
Dear Ballardw,
Thank you for your thorough reply! Formats sound like a powerful tool and I will definitely try to learn more about them.
I tried to incorporate the myagegroups format in my model. Did you mean that I should put myagegroups in the CLASS statement? If I do, SAS interprets myagegroups as a variable and returns an error saying the variable is not found, even though I included the FORMAT statement as you suggested and had run the PROC FORMAT step beforehand. Is there another way to use a format in the CLASS statement?
If I run the code below, no stratification by age group is done as far as I can see in the output.
proc genmod data=incidence6 plots = all;
class sex(ref="2") age(ref= "18 to 25") ssc(ref = "0");
model outc = ssc age sex/ dist=poisson link=log offset=logpy;
ods output parameterestimates=results;
format age myagegroups.;
run;
Sorry if I am slow in understanding your suggestions. This is the first model I have run in SAS, so I have little experience.
Best regards,
Karin
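For readers with the same question, one common way to get separate estimates per stratum is a BY statement on data that has first been sorted by the stratifying variable. This is only a sketch under assumptions: it reuses the variable names from the thread, the output data set names incidence6_by_sex and results_by_sex are made up, and sex is dropped from the CLASS and MODEL statements because it is constant within each stratum.

```sas
/* BY-group processing requires the data to be sorted by the BY variable */
proc sort data=incidence6 out=incidence6_by_sex;
  by sex;
run;

/* Fits one Poisson model per level of sex, still adjusting for age group */
proc genmod data=incidence6_by_sex;
  by sex;
  class age(ref="18 to 25") ssc(ref="0");
  model outc = ssc age / dist=poisson link=log offset=logpy;
  format age myagegroups.;
  ods output parameterestimates=results_by_sex;
run;
```

The output data set results_by_sex then contains a sex column identifying which stratum each parameter estimate belongs to. An unsorted input data set is one possible reason a BY statement fails or appears to return only some groups.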
Thanks a lot for this informative reply.
Best regards,
Karin