sara_a
Calcite | Level 5

Hi,

I have two separate datasets that I would like to compare. I concatenated the datasets so that I could run t-tests and chi-square tests on them, but I'm not sure how to split the new dataset into two groups. Neither group has any distinguishing features, only different ID numbers for each observation.

Reeza
Super User

So what differentiates the data? The source data sets? If so, use the INDSNAME= option on the SET statement to identify the source when appending.

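data want;
  /* source temporarily holds the name of the data set each row came from */
  set data1 data2 indsname=source;
  /* the INDSNAME= variable is not written to the output, so copy it */
  indata = source;
run;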

sara_a
Calcite | Level 5

Hi Reeza,

So, basically there was a larger dataset initially, and a random sample was taken from it. This random sample has 70 people. I want to compare features of these 70 people with features of the observations that weren't randomly selected (n=472) to assess representativeness. Does that make more sense?

Thanks.

Reeza
Super User

That's a standard comparison - sample is similar to 'population'.

The method above will identify the source of each observation, and you can then use that variable as the class variable for the comparison.

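data want;
  /* stack both data sets and record which one each row came from */
  set pop sample indsname=source;
  datain = source;
run;

/* chi-square test of the source group against a categorical variable */
proc freq data=want;
  table datain*<variable of interest>/chisq;
run;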
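
For the t-test side of the question, the same source variable can go on a CLASS statement in PROC TTEST. Below is a minimal sketch, not a definitive solution: the data values are made up and the variable name age is only an assumption standing in for whatever continuous variable you want to compare.

```sas
/* Hypothetical toy data: AGE and all values are invented for illustration */
data pop;
  input id age;
  datalines;
1 34
2 51
3 47
4 29
;
run;

data sample;
  input id age;
  datalines;
5 42
6 38
7 55
;
run;

/* Stack the two data sets and record the source of each row */
data want;
  set pop sample indsname=source;
  datain = source;  /* e.g. WORK.POP or WORK.SAMPLE */
run;

/* Two-sample t-test of AGE between the two source groups */
proc ttest data=want;
  class datain;  /* must have exactly two levels */
  var age;
run;
```

Because datain takes exactly two values here, PROC TTEST treats the two source data sets as the two groups being compared.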

ballardw
Super User

One would hope that the original datasets, or source files to recreate the data sets, still exist. If the original data sets before concatenation no longer exist it may be that re-reading the source data files would be the best option.
