Geoghegan
Obsidian | Level 7

I'm using SAS Studio and trying to compare the proportions of people in seven age groups within two populations (the location variable is coded 1/0) to see whether there is a significant difference in distribution. Currently my code is:

proc freq data=demographics;
table agegroup*location/chisq nocum norow nopercent;
run;

 

Am I correct in thinking that this will show me if there is a statistically significant difference in the proportions in the age groups comparing the two location options (1 vs 0)?

 

Also, is there a reasonable way to compare each age group individually? For example, to see if there is a significant difference between the proportion in the first age group in the 1 location and the proportion in the first age group in the 0 location, and so on for each age group.

 

Thank you!!

sbxkoenk
SAS Super FREQ


You might need to correct for multiple testing (inflation of the type I error rate, i.e. the false positive rate).

 

Koen

Geoghegan
Obsidian | Level 7

Thank you, I'll look through those!

ballardw
Super User

@Geoghegan wrote:

 

Am I correct in thinking that this will show me if there is a statistically significant difference in the proportions in the age groups comparing the two location options (1 vs 0)?

 


The chi square tests for differences of distribution, i.e. all age groups at once. 

 

To test specific age groups against each other, restrict the data to the two groups of interest. A WHERE statement is easy to add, something like:

where agegroup in (1, 3);

This assumes your age groups are coded with numbers like that and that you are only interested in a couple of them.
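To show where the WHERE statement would sit, here is a minimal sketch of the original PROC FREQ step restricted to two age groups. The codes 1 and 3 are placeholders carried over from the suggestion above; the actual coding of agegroup in the demographics data set is not known from the thread.

```sas
/* Sketch: chi-square test on a 2x2 subtable (two age groups by location).
   Assumes agegroup is numeric and that 1 and 3 are valid codes. */
proc freq data=demographics;
  where agegroup in (1, 3);
  tables agegroup*location / chisq norow nopercent;
run;
```

Repeating this for several pairs of age groups means running several tests, so the multiple-testing caution mentioned earlier in the thread applies.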

 

OR

proc logistic data=demographics;
class location/ param=ref;
model agegroup= location/ link=glogit;
run;

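The tests of the Location parameter estimates compare the groups (locations) at each level of age group. With the generalized logit link, each test asks whether the odds of being in that age group rather than the reference age group (by default the last level) differ between the two locations.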

Geoghegan
Obsidian | Level 7

oh, thank you! I think the proc logistic may be very helpful!

StatDave
SAS Super FREQ

The chi-square test for your table tests whether the age groups all have the same proportion of location=1 (or of location=0). Equivalently, it tests if the distributions across the age groups are the same for the two locations. If what you then want is to do pairwise comparisons among the age groups on the proportion of location=1 (or 0), then it is easiest to use logistic regression as below. The LSMEANS differences table gives the tests comparing the proportion in each pair of age groups.

proc logistic data=demographics;
class agegroup/param=glm;
model location(event='1')=agegroup;
lsmeans agegroup / ilink cl diff;
run;
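With seven age groups the DIFF option produces 21 pairwise comparisons, so the multiple-testing concern raised earlier in the thread applies here. One possible way to handle it is the ADJUST= option of the LSMEANS statement. The sketch below uses a Bonferroni adjustment, which is an illustrative choice and not something the reply above specified.

```sas
/* Sketch: same model as above, with adjusted p-values for the pairwise tests.
   adjust=bon is an assumption for illustration; adjust=sidak or adjust=tukey
   are alternatives. */
proc logistic data=demographics;
  class agegroup / param=glm;
  model location(event='1') = agegroup;
  lsmeans agegroup / ilink cl diff adjust=bon;
run;
```

The differences table should then show adjusted p-values alongside the unadjusted ones.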

But your wording of the question is odd. It sounds like you want to test whether the proportions of the two locations are the same in a given age group. That is equivalent to testing whether the proportion of location=1 in a given age group equals 0.5 since the row proportions must add to 1. If you really want to do that for each age group, you would use a binomial test that the proportion of location=1 (or 0) equals 0.5 in each age group. You would do that like this:

proc sort data=demographics;
by agegroup;
run;
/* Within each age group, test H0: proportion of one location = 0.5 */
proc freq data=demographics;
by agegroup;
table location/binomial;
run;
Geoghegan
Obsidian | Level 7

thank you for the explanation, I think I may be thinking about this wrong. I wanted to compare the distribution of ages between one location and the other, not between age groups within one location. So it seems like I already have that (just comparing all of the age group distribution in one to all of the age group distribution in the other). 

 

One last question - does the chi-squared test comparing these just compare the percentages in each age group between the two locations or does it take into account whether or not the sample size in each age group is enough to determine if the age distribution really was different? For example, if one location only had 2 people in one age group, that could make it difficult to be certain if the age distribution really was different. I'm sorry if that doesn't make sense!

ballardw
Super User

The procedure will report a warning when expected cell counts are small, and the interpretation needs to be considered carefully if an expected count is less than 5.

Whether that makes the conclusion unreliable is part of the art of analysis.
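If small cells are a concern, one option is to print the expected counts and request an exact test next to the chi-square. The sketch below assumes the same demographics data set and variables as the original post. EXPECTED and FISHER are standard TABLES options in PROC FREQ, but whether an exact test is the right choice for this analysis is a judgment call.

```sas
/* Sketch: show expected cell counts and add Fisher's exact test.
   For a 7x2 table with a large total sample, the exact computation
   can take a long time. */
proc freq data=demographics;
  tables agegroup*location / chisq expected fisher norow nopercent;
run;
```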

 

 

StatDave
SAS Super FREQ

>you say: "I wanted to compare the distribution of ages between one location and the other, not between age groups within one location."

That sounds to me like it is consistent with my previous statement of one of the ways to interpret the chi-square from PROC FREQ: "it tests if the distributions across the age groups are the same for the two locations." Sample sizes are involved in the test, so if you had the same proportions but very different sample sizes, the results would differ.
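To illustrate the point about sample size, here is a small sketch with made-up counts (not from the poster's data) and only two age groups for brevity. Both tables have identical proportions, but the second has ten times as many people in every cell.

```sas
/* Hypothetical counts: same cell proportions, sample sizes of 20 and 200. */
data demo_counts;
  input sample $ agegroup location count;
  datalines;
small 1 0 6
small 1 1 4
small 2 0 4
small 2 1 6
large 1 0 60
large 1 1 40
large 2 0 40
large 2 1 60
;
run;

proc sort data=demo_counts;
  by sample;
run;

/* WEIGHT tells PROC FREQ that each row represents count observations. */
proc freq data=demo_counts;
  by sample;
  weight count;
  tables agegroup*location / chisq norow nopercent;
run;
```

For these counts the Pearson chi-square works out to 0.8 for the small table and 8.0 for the large one, so the same percentages lead to very different conclusions depending on how many people they are based on.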
