mjkop56
Obsidian | Level 7
Hi all, I have data on a large number of people who applied to receive an award. Everyone who applied had to fill out a survey and specify their gender (although they could choose "prefer not to say"). My understanding is that it is NOT appropriate to put "error bars" on this data, given that the survey covered the whole population rather than a sample of it.
 
However, say there is missing data (some people did not report their gender), and one argues that error bars are needed to account for it. Hypothetical example: submissions are 40% female, 50% male, and 10% did not say. Suppose the success rates are 25% for females, 26% for males, and 27% for unknowns (so the conclusion depends on who is in the unknown group); one could conceivably create error bars showing what the result would be if ALL the unknowns were women vs. ALL the unknowns were men.
 
My understanding is that error bars are used to reflect sampling error, NOT missing data. With missing data one would typically impute. Is this correct? Is there any value in making up error bars for missing data?
 

(Note: cross-posted to Stack Exchange, but no responses there)


PaigeMiller
Diamond | Level 26

However, say there is missing data (some people did not report their gender) and one argues that error bars are needed to account for the missing data.

 

How would you compute error bars in this case? They would not be based on the standard error of the estimate that is usually computed for sampled data. They would be some other type of error bars ... and that's fine with me, as long as you explain that they are not based on sampling variability. But really, how do you compute the error bars in this case?

 

Imputing is another way to handle this.

--
Paige Miller
mjkop56
Obsidian | Level 7

Thank you! Hypothetical example. Here is the original data:

 

              Male   Female   PNA
# Applied      900     1800   300
# Selected     330      570   100
Success Rate   37%      32%   33%

 

2 extremes:

1) all the PNA are male - success rate for male= 36%, female = 32% (note, F does not change)

2) all the PNA are female - success rates for male = 37% (M does not change), female = 32%

 

 I guess it's more that the data would be presented as a range: Male success rate ranges from 36% - 37%

and  Female success rate ranges from 32%-32% 

 

(Note this example is made up, real data would show more of a difference)
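The "all PNA are one gender" bounds described above can be sketched in a few lines of Python. This is a minimal illustration using the hypothetical counts from this example; the variable names are mine, not from the thread:

```python
# Hypothetical counts from the example above.
applied = {"male": 900, "female": 1800, "pna": 300}
selected = {"male": 330, "female": 570, "pna": 100}

def rate(sel, app):
    """Success rate as a percentage."""
    return 100 * sel / app

# Observed rates, treating PNA as its own category
male_obs = rate(selected["male"], applied["male"])        # ~36.7%
female_obs = rate(selected["female"], applied["female"])  # ~31.7%

# Extreme 1: every PNA respondent is assumed male
male_if_all_pna_male = rate(selected["male"] + selected["pna"],
                            applied["male"] + applied["pna"])        # ~35.8%

# Extreme 2: every PNA respondent is assumed female
female_if_all_pna_female = rate(selected["female"] + selected["pna"],
                                applied["female"] + applied["pna"])  # ~31.9%

print(f"Male:   {male_if_all_pna_male:.1f}% to {male_obs:.1f}%")
print(f"Female: {female_if_all_pna_female:.1f}% to {female_obs:.1f}%")
```

Rounded to whole percentages, this reproduces the 36%-37% male range and the essentially unchanged 32% female range quoted above.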

PaigeMiller
Diamond | Level 26

@mjkop56 wrote:

Thank you! Hypothetical example. Here is the original data:

 

              Male   Female   PNA
# Applied      900     1800   300
# Selected     330      570   100
Success Rate   37%      32%   33%

 

2 extremes:

1) all the PNA are male - success rate for male= 36%, female = 32% (note, F does not change)

2) all the PNA are female - success rates for male = 37% (M does not change), female = 32%

 

 I guess it's more that the data would be presented as a range: Male success rate ranges from 36% - 37%

and  Female success rate ranges from 32%-32% 

 

(Note this example is made up, real data would show more of a difference)


Okay, you can certainly do this, but these are not error bars in any standard sense, and you would be well advised to make that abundantly clear. As pointed out by @Reeza, also be prepared to have your assumptions challenged.

--
Paige Miller
mjkop56
Obsidian | Level 7

thank you - your responses have been very helpful.

Quentin
Super User

Missing data are hard. There's a whole literature about handling missing data. I think what you're describing is some sort of 'sensitivity analysis.' 

 

Note, in your example case I think the extremes would be the:

 

  • The PNA category has 100 males who were selected and 200 females who were rejected. That would give you a 43% acceptance rate for males and 28.5% for females.
  • The PNA category has 100 females who were selected and 200 males who were rejected. That would give you a 30% acceptance rate for males and 35% for females.

Of course it's very unlikely that either of those worst-case extremes occurred, which is how you end up in the land of multiple imputation etc.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
mjkop56
Obsidian | Level 7

thank you! I'm a bit unclear about how you came up with the extremes - could you please explain?

Quentin
Super User

@mjkop56 wrote:

thank you! I'm a bit unclear about how you came up with the extremes - could you please explain?


Hi,

 

Your original data is:

              Male   Female   PNA
# Applied      900     1800   300
# Selected     330      570   100
Success Rate   37%      32%   33%

 

The extremes are not "all the PNA are male" or "all the PNA are female." The extremes are (I think) that gender is perfectly associated with selection among the PNA. That would give you the most extreme estimates you could get for the acceptance rates of males and females.

 

So if all males among the PNA are selected and no females among the PNA are selected, your table becomes:

 

 

              Male            Female           PNA
# Applied     900+100=1000    1800+200=2000    --
# Selected    330+100=430     570              --
Success Rate  43%             28.5%            --

 

The other extreme is that all females among the PNA are selected and no males among the PNA are selected:

 

              Male            Female           PNA
# Applied     900+200=1100    1800+100=1900    --
# Selected    330             570+100=670      --
Success Rate  30%             35%              --
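The perfect-association extremes above can be verified with a short Python sketch. The counts are the hypothetical figures from this thread; the variable names are mine:

```python
# Among the 300 PNA applicants, 100 were selected and 200 were rejected.
# Each extreme assumes selection is perfectly tied to gender within the PNA group.
applied = {"male": 900, "female": 1800}
selected = {"male": 330, "female": 570}
pna_selected, pna_rejected = 100, 200

def rate(sel, app):
    """Success rate as a percentage."""
    return 100 * sel / app

# Extreme 1: all 100 selected PNA are male, all 200 rejected PNA are female
male_hi = rate(selected["male"] + pna_selected,
               applied["male"] + pna_selected)      # 43.0%
female_lo = rate(selected["female"],
                 applied["female"] + pna_rejected)  # 28.5%

# Extreme 2: all 100 selected PNA are female, all 200 rejected PNA are male
male_lo = rate(selected["male"],
               applied["male"] + pna_rejected)      # 30.0%
female_hi = rate(selected["female"] + pna_selected,
                 applied["female"] + pna_selected)  # ~35.3%

print(f"Male:   {male_lo:.1f}% to {male_hi:.1f}%")
print(f"Female: {female_lo:.1f}% to {female_hi:.1f}%")
```

Note that these bounds (30%-43% for males, 28.5%-35.3% for females) are much wider than the "all PNA are one gender" bounds earlier in the thread, because they also let selection vary within the PNA group.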

 

Reeza
Super User
Prefer not to say is not missing; it's a category on its own. And from my somewhat biased experience, it's most likely women/non-binary people who choose that option. However, having that option and then attempting imputation seems problematic. If you want to analyze by subpopulation, things get a bit different, but I wouldn't impute or combine it.
mjkop56
Obsidian | Level 7

I agree imputation is problematic, and who is in the PNA is unknown (I think it could go either way).

Reeza
Super User
As long as the sample size is large enough, I wouldn't recommend consolidating these categories. PNA could include non-binary people, who would not fit in either group.
mjkop56
Obsidian | Level 7

"Prefer not to say is not missing, it's a category on its own." => Good point, since someone actually chose this option rather than skipping the question. But I'm wondering what the practical difference is between those two situations. In the case of "prefer not to answer," we don't know where they fit, so it seems to be the same thing as missing.

Reeza
Super User

If you're going to assume a gender after someone has said they don't want to specify one, what was the point of including that option?

 

 

mjkop56
Obsidian | Level 7

The point of including it is that providing gender must be optional; it can't be required. However, if a sizeable portion did not want to answer, that affects the conclusions you can draw.

