klongway
Calcite | Level 5

Hi,

 

I have a dataset that contains the full sample (2,500 participants), of which a subsample (1,000 participants) is what I'm using for analysis. I am trying to see whether there are statistically significant differences between the whole sample and my subsample. I am using t tests for my interval variables, but I also have a lot of categorical variables, so I am trying to use chi-square tests for those.

 

To differentiate between the whole sample and the subsample I have a new variable called "sample" and if the value in the "sample" column is 1 then that participant is part of the large sample but not the subsample, and if the value in the "sample" column is 2 then the participant is part of the subsample. 

 

For t tests I used the class statement saying "class sample; var age" which seems to have worked.

 

Is there a similar way to do this for a chi-square test? I want to compare employment (employed or not) for sample 1 and sample 2.

 

Thanks!

 

 

5 REPLIES 5
FreelanceReinh
Jade | Level 19

Hi @klongway,

 

So you always compare the 1000 observations with sample=2 to the remaining 1500 with sample=1. The example below shows how to do this comparison for a categorical variable:

/* Create test data for demonstration */

data have;
set sashelp.heart(obs=2500 rename=(status=employment));
sample=1+(_n_<=1000);
run;

/* Perform chi-square test */

proc freq data=have;
tables sample*employment / chisq;
run;
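
If several categorical variables need the same comparison, a single TABLES statement can request all of the two-way tables at once by grouping the variables in parentheses. The following is only a minimal sketch: it rebuilds the same test data as above, and `sex` and `smoking_status` are simply variables that exist in sashelp.heart, standing in here for your own categorical variables.

```sas
/* Same demonstration data as above (assumption: not your real data) */
data have;
set sashelp.heart(obs=2500 rename=(status=employment));
sample=1+(_n_<=1000);
run;

/* One chi-square test per variable in the parentheses, each crossed with SAMPLE */
proc freq data=have;
tables sample*(employment sex smoking_status) / chisq;
run;
```

Each variable inside the parentheses gets its own sample-by-variable table and its own chi-square statistics, so the result should match running three separate TABLES statements.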
klongway
Calcite | Level 5

This is so helpful!!! Thank you!!! I'm running into problems with some of my variables when doing this: some are working, others are not.

 

I have 

data file2;
set file1;
if dep=. then sample=2;
if dep=1 then sample=1;
if dep=2 then sample=1;
run;

I then run a table

proc freq data=file2;
tables sample;
run;

And the table shows all of my 2500 samples into sample 1 or sample 2.

When I try to run the chi square for employment, though, sample 2 comes up blank.

 

I did:

proc freq data=file2;
tables sample*emp / chisq;
run;

 

And the table comes up with sample 2 empty.

 

I checked file2 and I have plenty of people who answered the employment question in sample 2, so it isn't that there isn't any data.

 

Any ideas?

 

Thanks!!!

FreelanceReinh
Jade | Level 19

To get a quick overview of several categorical variables I often use PROC FREQ with the MISSING and LIST options in the TABLES statement.

So I would run

proc freq data=file2;
tables dep*sample*emp / missing list;
run;

and examine the resulting output. What does it look like for your file2?
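
One possible reason for an empty row in a crosstab is worth ruling out: by default PROC FREQ leaves out any observation that has a missing value in one of the table variables, while the MISSING option treats missing values as a level of their own. The sketch below illustrates this with a small made-up data set (`toy` and its values are hypothetical, not taken from your file2).

```sas
/* Hypothetical toy data: EMP is missing for everyone with SAMPLE=1 */
data toy;
input sample emp $;
datalines;
1 .
1 .
2 full
2 part
2 none
;
run;

/* Default: observations with missing EMP are excluded from the table,
   so no SAMPLE=1 row appears; they are only counted as "Frequency Missing" */
proc freq data=toy;
tables sample*emp;
run;

/* MISSING option: missing EMP becomes a level, so the SAMPLE=1 row shows up */
proc freq data=toy;
tables sample*emp / missing;
run;
```

If one group disappears from the default table but reappears with MISSING, then that group has no nonmissing values for the variable, and a chi-square comparison between the two groups is not possible for it.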

klongway
Calcite | Level 5

Thank you so much! When I run that I get: a chart with 

Sample 2-full time-400

Sample 2-parttime-400

Sample 2- not working-700

Sample 1-missing=1500

 

So it is pulling all of sample 2 but only has the "missing" in sample 1. But the "missing" in sample 2 adds up to the total number in the dataset in sample 2....!!! Is it possible everyone in Sample 1 did not answer this question?! Ahh!

 

FreelanceReinh
Jade | Level 19

@klongway wrote:

... When I run that I get: a chart with 

Sample 2-full time-400

Sample 2-parttime-400

Sample 2- not working-700

Sample 1-missing=1500


Assuming that "Sample 2" in the above PROC FREQ output refers to your "subsample" consisting of 1000 participants, I would be wondering why the corresponding frequencies, 400, 400 and 700, add up to 1500, not 1000.
