Weird Chi Square requests

Posted 06-07-2021 02:32 PM (456 views)

Hi,

I have a dataset containing the full sample (2,500 participants), within which is a subsample that I'm using for analysis (1,000 participants). I am trying to see whether there are statistically significant differences between the whole sample and my subsample. I am using t-tests for my interval variables, but I have a lot of categorical variables, so I am trying to use chi-square tests to check for significant differences.

To differentiate between the whole sample and the subsample, I have a new variable called "sample". If its value is 1, the participant is part of the large sample but not the subsample; if its value is 2, the participant is part of the subsample.

For the t-tests I used the CLASS statement (`class sample; var age;`), which seems to have worked.
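For reference, the t-test described above would look something like the following PROC TTEST step. This is a sketch: the dataset name `file2` is borrowed from later in the thread, and `age` stands in for whichever interval variable is being tested.

```
/* Two-sample t-test of AGE between SAMPLE=1 and SAMPLE=2.         */
/* The CLASS variable must have exactly two levels for PROC TTEST. */
proc ttest data=file2;
  class sample;
  var age;
run;
```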

Is there a similar way to do this for a chi-square test? I want to compare employment (employed or not) between sample 1 and sample 2.

Thanks!

5 replies

Hi @klongway,

So you always compare the 1000 observations with sample=2 to the remaining 1500 with sample=1. The example below shows how to do this comparison for a categorical variable:

```
/* Create test data for demonstration */
data have;
set sashelp.heart(obs=2500 rename=(status=employment));
sample=1+(_n_<=1000);
run;
/* Perform chi-square test */
proc freq data=have;
tables sample*employment / chisq;
run;
```
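Because the original question mentions many categorical variables, it may help to know that one TABLES statement can request a separate chi-square test for each of several variables. A minimal sketch using the demonstration data `have` from above, where `chol_status` and `bp_status` are simply two more categorical variables from SASHELP.HEART standing in for the real ones:

```
/* SAMPLE*(a b c) expands to the separate tables SAMPLE*a, SAMPLE*b, */
/* SAMPLE*c; CHISQ requests a chi-square test for each of them.      */
proc freq data=have;
  tables sample*(employment chol_status bp_status) / chisq;
run;
```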

This is so helpful!!! Thank you!!! I'm running into problems with some of my variables when doing this: some are working, others are not.

I have:

```
data file2;
  set file1;
  if dep=. then sample=2;
  if dep=1 then sample=1;
  if dep=2 then sample=1;
run;
```

I then run a table:

```
proc freq data=file2;
  tables sample;
run;
```

The table shows all 2,500 participants assigned to either sample 1 or sample 2.

When I try to run the chi-square test for employment, though, sample 2 comes up blank.

I did:

```
proc freq data=file2;
  tables sample*emp / chisq;
run;
```

The resulting table shows sample 2 as empty.

I checked file2, and plenty of people in sample 2 answered the employment question, so it isn't that there is no data.

Any ideas?

Thanks!!!

To get a quick overview of several categorical variables I often use PROC FREQ with the MISSING and LIST options in the TABLES statement.

So I would run

```
proc freq data=file2;
tables dep*sample*emp / missing list;
run;
```

and examine the resulting output. What does it look like for your file2?

Thank you so much! When I run that, I get a chart with:

| sample | emp | Frequency |
|---|---|---|
| 2 | full time | 400 |
| 2 | part time | 400 |
| 2 | not working | 700 |
| 1 | missing | 1500 |

So it is pulling all of sample 2, but sample 1 only has "missing". But the "missing" in sample 2 adds up to the total number in the dataset in sample 2!!! Is it possible that everyone in sample 1 did not answer this question?! Ahh!
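One way to answer that question directly is to count the non-missing (N) and missing (NMISS) values of `emp` within each `sample` group. A minimal sketch, assuming `emp` is stored as a numeric variable (PROC MEANS only analyzes numeric VAR variables):

```
/* N = nonmissing EMP values, NMISS = missing EMP values, per SAMPLE. */
/* MISSING on the CLASS statement also reports rows where SAMPLE is   */
/* itself missing, instead of silently dropping them.                 */
proc means data=file2 n nmiss;
  class sample / missing;
  var emp;
run;
```

If N is 0 for a group, none of its participants has a non-missing `emp`, and PROC FREQ, which excludes missing values from its tables unless the MISSING option is used, has nothing to tabulate for that group.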

@klongway wrote:

> ... When I run that I get: a chart with
>
> Sample 2-full time-400
>
> Sample 2-parttime-400
>
> Sample 2- not working-700
>
> Sample 1-missing=1500

Assuming that "Sample 2" in the above PROC FREQ output refers to your subsample of 1,000 participants, I wonder why the corresponding frequencies (400, 400 and 700) add up to 1,500, not 1,000.
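One quick follow-up is to compare the number of rows with the number of distinct participants in each `sample` group, since duplicate records are one possible reason for a count that is larger than expected. A minimal sketch in PROC SQL; `id` is a hypothetical participant identifier, to be replaced by whatever variable uniquely identifies a participant in `file2`:

```
/* Rows vs. distinct participants per SAMPLE group.                */
/* ID is hypothetical: substitute the real participant identifier. */
proc sql;
  select sample,
         count(*)           as n_rows,
         count(distinct id) as n_participants
    from file2
    group by sample;
quit;
```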
