Solved: comparing % female to population % female and weighting

saoirse872 · Posted 03-14-2022 07:23 PM

**Note: the data are hypothetical.**

I would like to compare the percentage of women in my data to the percentage of women in a population dataset, to see whether the percentage of women in my data is high or low. For example:

	my data	population
Female	30%	25%
Male	70%	75%

However, the two datasets have different mixes of occupations and age groups, so we don't expect the percentage of women to be the same, and we have to adjust for these differences. The table below shows, for the population, the percentage of women in each combination of occupation and age group, along with the confidence intervals (CIs) around each percentage.

POPULATION DATA:
Occupation	Age group	% of women	CI for proportion
Professional	Younger	50%	(45%, 55%)
Professional	Older	70%	(64%, 76%)
Not professional	Older	20%	(15%, 25%)
Not professional	Younger	15%	(10%, 20%)

In my dataset, I have 50% who are not professional and older, 20% who are not professional and younger, and 30% who are professional and younger.

Therefore, to get the expected proportion of women, I weight each population percentage by the share of my data in that group: 0.5*0.20 + 0.2*0.15 + 0.3*0.50 = 0.28
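The weighted calculation above can be reproduced in a short DATA step. This is only a sketch of the arithmetic: the dataset name `strata` and the variable names `stratum`, `sample_share`, `pop_pct_female`, and `weighted` are made up for illustration, and the values are the hypothetical ones from the post.

```sas
/* Hypothetical illustration: one row per occupation-age group present in the sample */
data strata;
   input stratum :$20. sample_share pop_pct_female;
   /* contribution of this group to the expected proportion of women */
   weighted = sample_share * pop_pct_female;
   datalines;
NotProf_Older   0.50 0.20
NotProf_Younger 0.20 0.15
Prof_Younger    0.30 0.50
;
run;

/* The sum of the contributions is the expected proportion of women (0.28 here) */
proc means data=strata sum;
   var weighted;
run;
```

Note that this treats the population percentages as fixed numbers, so it does not reflect the uncertainty expressed by their CIs.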

However, the population numbers have CIs around the estimates. Should those be ignored or taken into account here?

Thanks in advance for any thoughts.

StatDave · Posted 03-14-2022 11:42 PM

One approach is to combine the two datasets, aggregate them into cell counts, and fit a log-linear model (a Poisson model on the aggregated counts) with gender, source (sample or population), and their interaction as the predictors of primary interest, and occupation and age (and anything else needed) as covariates. You can then compare the sources adjusted for the covariates. For example, assuming that all of the above variables are categorical, or can be made categorical, the SLICE statement will provide the means in each gender-source combination and test for differences between the sources for each gender.

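/* Aggregate the combined data to one row per cell; OUT= stores the cell frequency in COUNT */
proc freq;
   table source*gender*occupation*age_group / noprint out=aggdata;
run;
/* Log-linear (Poisson) model on the cell counts */
proc genmod data=aggdata;
   class gender source occupation age_group;
   model count = gender|source occupation age_group / dist=poisson;
   /* Within each gender, estimate the mean for each source and test the difference between sources */
   slice gender*source / sliceby=gender ilink means diff cl;
run;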

saoirse872 · Posted 03-15-2022 11:07 AM

Thank you!! This is a fantastic suggestion. A follow-up question: I'm curious about the rationale for a log-linear model here, as opposed to, say, a logistic regression with gender as the dependent variable and age, source, and occupation as the independent variables?

StatDave · Posted 03-15-2022 11:48 AM

That was my initial thought, but I rejected it for a reason that, on reconsideration, doesn't seem problematic. So yes, I believe that would work, and it would not require aggregating the data. In PROC LOGISTIC, just be sure to use the EVENT= option after the response variable in the MODEL statement to select the female level as the event level to model. You can then specify SOURCE in the LSMEANS statement (used here in place of the SLICE statement), with the ILINK option, to see the estimated, adjusted probabilities for the two sources.
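A minimal sketch of what that PROC LOGISTIC call might look like is shown here. It reuses the variable names from the PROC GENMOD example; the dataset name `combined` and the response value 'Female' are assumptions and should be changed to match the actual data. The PARAM=GLM option is included because PROC LOGISTIC requires GLM parameterization of the CLASS variables in order to use LSMEANS.

```sas
/* Sketch only: "combined" is an assumed name for the stacked sample + population data */
proc logistic data=combined;
   /* LSMEANS needs GLM parameterization of the CLASS variables */
   class source occupation age_group / param=glm;
   /* EVENT= selects the response level being modeled (assumed to be coded 'Female') */
   model gender(event='Female') = source occupation age_group;
   /* ILINK reports the adjusted estimates on the probability scale;
      DIFF and CL add the source comparison and confidence limits */
   lsmeans source / ilink diff cl;
run;
```

The ILINK columns of the LSMEANS table then give the adjusted probability of being female for each source, and the DIFF output tests whether the two sources differ after adjusting for occupation and age group.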

saoirse872 · Posted 03-15-2022 12:30 PM

Wonderful. Thank you for the excellent advice!!!
