Hi,
I have two separate datasets that I would like to compare. I concatenated them so that I can run t-tests and chi-square tests, but I'm not sure how to split the new dataset into two groups. Neither group has any distinguishing features; each observation only has a different ID number.
So what differentiates the data? The source data sets? If so, use the INDSNAME= option to identify the source when appending.
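data want;
    /* INDSNAME= stores the name of the data set each row was read from */
    set data1 data2 indsname=source;
    /* source is a temporary variable, so copy it to one that is kept */
    indata = source;
run;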
Hi Reeza,
So, basically, there was a larger dataset initially, and a random sample was taken from it. This random sample has 70 people. I want to compare features of these 70 people with features of the observations that weren't randomly selected (n=472) to assess representativeness. Does that make more sense?
Thanks.
That's a standard comparison: checking whether the sample is similar to the 'population'.
The method above will work to identify the source of each observation, and you can then use that variable as the class variable for the comparison.
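data want;
    /* source holds the originating data set name, e.g. WORK.POP or WORK.SAMPLE */
    set pop sample indsname=source;
    /* keep it as a permanent grouping variable */
    datain = source;
run;

/* Cross-tabulate group membership against a categorical variable */
proc freq data=want;
    table datain*<variable of interest> / chisq;
run;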
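Since the original question also mentioned t-tests, here is a minimal sketch of the same idea for a continuous variable. It assumes the WANT data set built above, with DATAIN taking exactly two values (one per source data set). The variable name AGE is hypothetical and only stands in for whatever continuous measure you want to compare.

```sas
/* Sketch only: AGE is a hypothetical continuous variable, not from this thread */
proc ttest data=want;
    class datain;   /* grouping variable; must have exactly two levels */
    var age;        /* replace with your own analysis variable */
run;
```

PROC TTEST reports both the pooled and the Satterthwaite results along with a test for equality of variances, so you can choose the appropriate row depending on whether the two groups have similar variances.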
One would hope that the original data sets, or the source files needed to recreate them, still exist. If the original data sets from before the concatenation no longer exist, re-reading the source data files may be the best option.