Novice_user
Calcite | Level 5

I have a dataset (say N=200) of which a subset (N=150) has undergone a certain assay. I want to be able to say that the N=150 subgroup is not significantly different from the entire population (N=200). Is there a way to compare the two, e.g. in terms of baseline characteristics? I tried comparing the N=150 who have the assay with the N=50 who do not (e.g. with t-tests and chi-square tests). For one variable (blood pressure), the difference between the N=150 and N=50 groups was significant; however, on visual inspection the value for the N=150 did not look very different from the value for the N=200. Is there a way to compare the N=150 subgroup to the N=200 dataset, rather than comparing the N=150 and N=50 subgroups of the N=200? I hope that makes sense.

2 REPLIES
Modeller
Fluorite | Level 6

Comparing the full population (N=200) with the subset (N=150) would be misleading, because the N=150 is part of the N=200: the full population includes these 150 data points, which make up 75% of it. It does not make logical sense to compare a population with 75% of itself, since the comparison is mostly between the same data points. The two will therefore be expected to look similar, and a test of statistical significance between the N=200 and the N=150 would be meaningless.

You noted that for blood pressure the difference between the N=150 and N=50 groups was significant, yet on visual inspection the N=150 value did not look very different from the N=200 value. That makes sense: the N=200 is mostly (75%) made up of data points from the N=150, so you are largely comparing the same thing with itself.
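To put numbers on this, here is a minimal Python sketch of the arithmetic. The blood pressure means are invented purely for illustration (they are not from the poster's data), and `pooled_mean` is a hypothetical helper name. It shows that the full-group mean is a size-weighted average of the two subgroup means, so the gap between the N=150 and the N=200 is only a quarter of the gap between the N=150 and the N=50.

```python
def pooled_mean(mean_a, n_a, mean_b, n_b):
    """Mean of the combined group as a size-weighted average of subgroup means."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)

# Hypothetical mean systolic blood pressure (mmHg), made up for illustration.
mean_assay, n_assay = 130.0, 150        # subgroup with the assay
mean_no_assay, n_no_assay = 138.0, 50   # subgroup without the assay

mean_all = pooled_mean(mean_assay, n_assay, mean_no_assay, n_no_assay)

gap_subgroups = mean_no_assay - mean_assay  # N=50 vs N=150
gap_vs_full = mean_all - mean_assay         # N=200 vs N=150

print(f"mean of all 200:       {mean_all}")
print(f"gap N=50 vs N=150:     {gap_subgroups}")
print(f"gap N=200 vs N=150:    {gap_vs_full}")
# The second gap is n_no_assay / (n_assay + n_no_assay) = 25% of the first.
```

So a difference that is clear between the two independent subgroups is automatically diluted when the N=150 is set against the N=200.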

The meaningful comparison is between subsets made up of data points that are independent of each other, i.e. the N=150 with the assay versus the N=50 without it.
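As a rough illustration of such an independent-groups comparison, here is a small Python sketch of Welch's unequal-variance t statistic using only the standard library. This is an assumption about which test fits, not necessarily the one the poster ran; `welch_t` is a hypothetical helper, and the two short lists are toy values, not real measurements. In SAS, PROC TTEST reports a Satterthwaite (unequal-variance) result alongside the pooled one.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples a and b (each with at least 2 values)."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Toy values only; in practice a would hold the N=150 with the assay
# and b the N=50 without it.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]

t_stat, dof = welch_t(a, b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The p-value would then come from a t distribution with `dof` degrees of freedom, which needs a statistics package rather than the standard library.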

Novice_user
Calcite | Level 5

Thanks Modeller, that makes sense. I guess what I'd like to be able to say is that conclusions from testing the N=150 population can be generalised to the N=200 population (because, unfortunately, not everyone in the N=200 had the assay). Is that still possible, even if the N=50 is significantly different from the N=150 in some ways?
