About DD6410

sld · ‎12-14-2016

I'll put forth this code suggestion. I'm not 100% confident (Edit: OK, 95% confident) in its validity, but others are welcome to critique. (Edit: I just noticed that you provided the actual sample sizes. You can swap them in the code below.)

/* Make up some data */
data have;
    /* Total sample size for center=0 is 200 */
    mrace="H"; center=0; count=round(0.500*200, 1); output;
    mrace="B"; center=0; count=round(0.263*200, 1); output;
    mrace="W"; center=0; count=round(0.186*200, 1); output;
    mrace="O"; center=0; count=round(0.027*200, 1); output;
    mrace="N"; center=0; count=round(0.024*200, 1); output;
    /* Nominal sample size for center=1 is 240 (the rounded counts sum to 241) */
    mrace="H"; center=1; count=round(0.469*240, 1); output;
    mrace="B"; center=1; count=round(0.398*240, 1); output;
    mrace="W"; center=1; count=round(0.083*240, 1); output;
    mrace="O"; center=1; count=round(0.024*240, 1); output;
    mrace="N"; center=1; count=round(0.026*240, 1); output;
run;

/* Check column totals */
proc means data=have sum;
    var count;
    by center;
run;

/* Create offset variable */
data have;
    set have;
    if center=0 then total=200;
    else if center=1 then total=241; /* 241, not 240: effect of rounding to integers */
    total_log = log(total);
    grand_total_log = log(200+241);
run;

/* Chi-square test of homogeneity of proportions */
proc freq data=have;
    table mrace*center / chisq;
    weight count;
run;

/* Log-linear model approach */
/* Define proportions on column totals using offset */
proc genmod data=have;
    class center mrace;
    model count = center mrace center*mrace / dist=poisson type3 offset=total_log;
    lsmeans center*mrace / ilink;
    /* Coefficients follow the center*mrace cell order: center 0 (B H N O W), then center 1 (B H N O W) */
    lsmestimate center*mrace
        "B diff" 1 0 0 0 0 -1 0 0 0 0,
        "H diff" 0 1 0 0 0 0 -1 0 0 0,
        "N diff" 0 0 1 0 0 0 0 -1 0 0,
        "O diff" 0 0 0 1 0 0 0 0 -1 0,
        "W diff" 0 0 0 0 1 0 0 0 0 -1
        / adjust=simulate(seed=12345);
run;

An alternative approach, which I think is what @PGStats had in mind, is to create a subset of the data with one level of mother_race versus all others combined.
You'd then need to consider some form of Type I error control for the family of 5 tests, which you could do with the MULTTEST procedure. For example, for mrace=B:

/* Create a variable with levels (B=1, notB=0) */
data dsB;
    set have;
    B = (mrace="B");
run;

proc sort data=dsB;
    by center B;
run;

/* Collapse to one row per center*B cell */
proc means data=dsB noprint;
    by center B;
    var count;
    output out=testB sum=count;
run;

data testB;
    set testB;
    if center=0 then total=200;
    else if center=1 then total=241;
    total_log = log(total);
    grand_total_log = log(200+241);
run;

/* Chi-square test of homogeneity of proportions */
proc freq data=testB;
    table B*center / chisq;
    weight count;
run;

/* Log-linear model approach */
/* Define proportions on column totals */
proc genmod data=testB;
    class center B;
    model count = center B center*B / dist=poisson type3 offset=total_log;
    lsmeans center*B / ilink;
    /* Cell order is (center=0,B=0) (center=0,B=1) (center=1,B=0) (center=1,B=1) */
    lsmestimate center*B "B diff" 0 1 0 -1; /* p value here is based on z-test, not chi-sq */
run;

Ideally in this case, the test for the center*B interaction (generated by the MODEL statement) would match the test of equal B proportions between center levels (generated by the LSMESTIMATE statement). The two tests tell the same story (yay!), but the test statistics are not the same, hence the p-values are not the same. I don't know if there's a way to get LSMESTIMATE to produce a chi-square test; I was not successful with what I tried. But the z-test might be good enough, especially if the sample sizes are large.
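
For anyone wondering what the offset is doing in the GENMOD models above, here is a short sketch of the algebra as I read it. The notation is my own, not from the SAS output: n_i is the column total for center i (the variable total), mu_ij is the expected count for race level j in center i, p_ij = mu_ij / n_i is the corresponding proportion, and eta_ij is the linear predictor built from center, mrace and center*mrace. It assumes the LSMESTIMATE rows are computed on the linear predictor without the offset, which I believe is the case but have not verified.

```latex
\begin{align*}
\log \mu_{ij} &= \log n_i + \eta_{ij}
  && \text{(Poisson model, log link, offset } \log n_i \text{)} \\
\log p_{ij} &= \log \frac{\mu_{ij}}{n_i} = \eta_{ij}
  && \text{(the linear predictor is a log proportion)} \\
\eta_{0B} - \eta_{1B} &= \log \frac{p_{0B}}{p_{1B}}
  && \text{(the quantity in the ``B diff'' row)}
\end{align*}
```

Under that reading, each "diff" row tests whether the ratio of the two center proportions equals 1 (that is, whether the proportions are equal), and exponentiating the estimate would give the ratio of proportions rather than their difference.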


Re: Test difference in proportions for each level of categorical varia...

Test difference in proportions for each level of categorical variable ...
