Hello!
I need to conduct a test for trend in percentages to get a p-value. I found the following code on a SAS community website, written for data similar to mine:
data test(keep=year case log_n);
infile datalines;
input year case n;
log_n=log(n);
output test;
datalines;
2000 100 50000
2005 75 60000
2010 50 75000
;
run;
proc genmod data=test;
model case=year / dist=poisson link=log offset=log_n;
run;
A screenshot of the output is attached.
Is it correct to say that there is a significant difference between the proportions (case/n) in the 3 years based on Pr > ChiSq < 0.001? I'm not convinced by this output because the DF for year is 1. Also, I don't understand why one would take the log of the denominator but not of the cases.
Many thanks!
Janet
See this note about testing trend in proportions. As shown there, you don't need to fit a model to get a test: when there are no covariates, non-modeling approaches are available in PROC FREQ and PROC MULTTEST. You can also take a modeling approach with a logistic model in PROC LOGISTIC (as shown in the note) or PROC GENMOD, and your Poisson model with an offset in GENMOD is also reasonable. To fit a logistic model to your summarized data, specify MODEL CASE/N=YEAR; in PROC LOGISTIC. The 1 df result from your GENMOD model is a test of linear trend, because YEAR enters the model as a single continuous slope. If you want to test for differences among the years rather than for trend, add a CLASS statement listing YEAR; that gives a 2 df test for your three years. See this note about testing for differences (not trend) among proportions.
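A minimal sketch of the two suggestions above, assuming the same three rows of data (the data set name TEST2 is only illustrative). With the offset, the Poisson model is log E(case) = log(n) + b0 + b1*year, which is the same as log(E(case)/n) = b0 + b1*year. In other words, it models the log of the rate case/n, which is why only the denominator is logged. Declaring YEAR in a CLASS statement replaces the single slope with a separate effect for each year.

```sas
/* Summarized data: one row per year with the number of cases and the total n */
data test2;
   input year case n;
   log_n = log(n);   /* offset for the Poisson model */
   datalines;
2000 100 50000
2005 75 60000
2010 50 75000
;
run;

/* Logistic model on the summarized data using events/trials syntax;
   the test of the YEAR slope (1 df) is a test of linear trend */
proc logistic data=test2;
   model case/n = year;
run;

/* Poisson model with a log(n) offset and YEAR as a CLASS variable;
   the Type 3 test for YEAR (2 df) tests for any difference among the three years */
proc genmod data=test2;
   class year;
   model case = year / dist=poisson link=log offset=log_n type3;
run;
```

With only three years, the CLASS model is saturated (its deviance is 0), but the 2 df Type 3 likelihood-ratio test for YEAR, which compares it with the intercept-only model, is still valid.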
Hi,
Thanks for your feedback. I've tried using:
data test;
infile datalines;
input year case n;
output test;
datalines;
2000 100 50000
2005 75 60000
2010 50 75000
;
run;
data test;
set test;
perc=round(((case/n)*100),0.01);
run;
proc freq data = test;
table perc*year / exact trend;
run;
The p-value I'm looking for seems to be the one for the Mantel-Haenszel Chi-Square in the output (attached).
I'm curious why the p-value from PROC GENMOD is significant while the p-value from PROC FREQ is not:
data test(keep=year case log_n);
infile datalines;
input year case n;
log_n=log(n);
output test;
datalines;
2000 100 50000
2005 75 60000
2010 50 75000
;
run;
proc genmod data=test;
model case=year / dist=poisson link=log offset=log_n;
run;
SAS output for PROC GENMOD is attached.
Do you have any idea why the p-values are different? I'd like to be sure which one is correct to use and how to interpret it.
Thanks,
Jane
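Your trend analysis in PROC FREQ is not correct because it doesn't use the actual sample sizes. Without a WEIGHT statement, each row of your data set counts as a single observation, so N is just the number of rows rather than the 185,000 people behind them. Since the Mantel-Haenszel chi-square equals (N-1) times the squared correlation between the PERC and YEAR scores, with three rows it can be at most 2, below the 3.84 needed for p < 0.05. The following code uses the observed sample sizes. Because your sample sizes are so large, there is no need for an exact test.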
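data test;
input year case n;
/* expand each summary row into a case record (y=1) and a non-case record (y=0), each weighted by its count */
w=case; y=1; output;
w=n-case; y=0; output;
datalines;
2000 100 50000
2005 75 60000
2010 50 75000
;
proc freq;
weight w;   /* use the observed counts, not one observation per row */
table year*y/trend;
run;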
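For reference, a sketch of the statistics involved, in standard notation assumed here rather than taken from the SAS output: for year i let x_i be the score, n_i the sample size and r_i the number of cases, with N = sum of n_i, R = sum of r_i, overall proportion p-bar = R/N, and r the correlation used by the Mantel-Haenszel statistic.

```latex
\bar{x} = \frac{1}{N}\sum_i n_i x_i, \qquad
Z = \frac{\sum_i r_i\,(x_i - \bar{x})}
         {\sqrt{\bar{p}\,(1-\bar{p})\,\sum_i n_i\,(x_i - \bar{x})^2}}, \qquad
Q_{\mathrm{MH}} = (N-1)\,r^2 = \frac{N-1}{N}\,Z^2
```

Both the case counts r_i and the sample sizes n_i enter Z directly. With N around 185,000, the factor (N-1)/N is essentially 1, so the Cochran-Armitage and Mantel-Haenszel tests agree here; in a table with one observation per row, N is only the number of rows.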
A similar analysis can be done in PROC GENMOD or PROC LOGISTIC, as I mentioned. Note that the square root of the Score chi-square (in the Testing Global Null Hypothesis table from PROC LOGISTIC) equals the absolute value of the Cochran-Armitage statistic from PROC FREQ.
proc logistic;
freq w;
model y(event="1")=year;
run;
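The PROC GENMOD counterpart is not shown above. Here is a minimal sketch, assuming the expanded data set TEST created in the DATA step above: DESCENDING makes GENMOD model the probability that Y=1, and TYPE3 adds a likelihood-ratio test for YEAR. GENMOD does not print the Score test that PROC LOGISTIC reports, but its Wald and likelihood-ratio tests for YEAR are asymptotically equivalent to it.

```sas
/* Binomial (logistic) model in PROC GENMOD on the expanded data:
   FREQ W supplies the counts and DESCENDING models P(Y=1) */
proc genmod data=test descending;
   freq w;
   model y = year / dist=binomial link=logit type3;
run;
```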