louisar
Calcite | Level 5

Hello, I have data over a 6-year time period and am looking at changes in gender. My variable has 3 categories: female, male, unknown. Question: what can I say, if anything, about the growth in the % male and % female over time, given the presence of the unknown category? Below I show some of the scenarios I am facing. Note: data is MADE UP for illustration.

 

situation 1: from 2004-2010, the % male increased from 7% to 11%, % of females increased from 74% to 83%, and unknown decreased from 19% to 6%. conclusion=> can't make any conclusions.

 

situation 2: % females decreased from 59% to 65%, % of males increased from 18% to 25%, and unknown decreased from 23% to 10%. conclusion: growth in male category.

 

situation 3: % of females decreased from 73% to 65%, % of males increased from 8% to 14%, and unknown increased from 19% to 21%. conclusion=> ? possibilities?

 

situation 4: % of females stayed the same at 6%, males increased from 70% to 80%, and unknown decreased from 24% to 14%.  conclusion: ?

 

Thank you for any assistance.

2 REPLIES
ballardw
Super User

With summaries like that I would not draw any conclusion other than what you state.

For one thing what if data from 2004 had 15 records in the sample and the 2010 had 25,000?
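To put rough numbers on the sample-size concern, here is a minimal sketch. It assumes each year's records are a simple random sample, which the thread does not establish, and it uses the made-up 7% male figure from situation 1 together with the hypothetical 15 and 25,000 record counts. The function name is illustrative only.

```python
import math

def proportion_standard_error(p, n):
    """Standard error of a sample proportion p based on n records.

    Assumes a simple random sample (an assumption, not stated in the thread).
    """
    return math.sqrt(p * (1 - p) / n)

# 0.07 is the made-up 2004 % male; 15 and 25,000 are the hypothetical
# record counts from the reply.
for n in (15, 25_000):
    se = proportion_standard_error(0.07, n)
    print(f"n={n}: standard error about {se * 100:.1f} percentage points")
```

With 15 records the standard error is larger than several of the year-to-year differences being discussed, while with 25,000 records it is a small fraction of a percentage point.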

 

Pick any start/end pair of points. I can create data that matches the end points and would cast doubt on any stated conclusion.

Let's look at your first one, for example. In 2004 you have 7% of responses male; in 2010 you have 11% male. What conclusion would you draw if the data for 2005 through 2009 were 75% or more male each year? That would make me think something funny went on in 2004 and 2010.

 

Would there be any natural reason because of your sample methodology that would bias the male/female ratio? If you pick on a particular industry/occupation/hobby activity or similar approach there might be a known bias. The question would be does your data match that bias.

 

What might be appropriate is either weighting to a known population ratio or imputing gender for the "unknown".

 

You want to carefully read some of your own statements (or provide actual data sets). 2004 to 2010 is 7 years, just as 2004 to 2005 is 2: the end point counts. Minor point.

 

"situation 2: % females decreased from 59% to 65%, % of males increased from 18% to 25%,"

Looks like an increase.

 

And a real underlying question. Why are they "Unknown"? No answer recorded? Some other answer like refused to answer?

 

louisar
Calcite | Level 5

Thank you for the excellent feedback. That's a good point that I have to look at the years in between as well. Typically there is somewhat of an upward or downward trend.

The "unknown" category is individuals who are explicitly SELECTING an option stating that they do not wish to provide that information. Does that make a difference in your interpretation?

It is difficult to know of any direction of bias, as it could go either way.  Imputing is not an option in our case.  One possibility is removing the "unknown" and then calculating the % male and % female on those who are known, but I think that has problems too.
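For illustration, the known-only calculation can be compared with worst-case bounds on the true % male. This is a rough sketch using the made-up situation 1 figures; the function names are hypothetical, and the bounds simply assume that anywhere from none to all of the unknowns could be male.

```python
def known_only_share(male, female):
    """Percent male among respondents whose gender is known."""
    return 100 * male / (male + female)

def male_bounds(male, unknown):
    """Worst-case range for the true % male.

    Lower bound: no unknowns are male. Upper bound: all unknowns are male.
    """
    return (male, male + unknown)

# Made-up figures from situation 1: (male %, female %, unknown %)
y2004 = (7, 74, 19)
y2010 = (11, 83, 6)

for label, (m, f, u) in (("2004", y2004), ("2010", y2010)):
    lo, hi = male_bounds(m, u)
    print(f"{label}: known-only male share {known_only_share(m, f):.1f}%, "
          f"true male share between {lo}% and {hi}%")
```

The known-only share moves from about 8.6% to about 11.7%, but the worst-case ranges (7% to 26% in 2004, 11% to 17% in 2010) overlap, so without some assumption about who selects "unknown" the direction of change in the true male share is not pinned down.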
