CatPaws
Calcite | Level 5
I need to do a chi-square test of homogeneity. I know how to do the chi-square test of independence; my issue is that I have two data sets: one for the first population and another (with the same variables) for the second population. Do these need to be in the same data set to do the test?
Rick_SAS
SAS Super FREQ

Use the population data to obtain the proportions of the population in each category. For example, your population might have 10% Asian, 40% black, and 50% white.

 

Then use those numbers on the TESTP= option on the TABLES statement in PROC FREQ:

proc sort data=Have out=Sample;
by Race;   /* assume values are Asian, Black, and White */
run;

proc freq data = Sample order=data;
   tables Race / nocum chisq
   /* If population proportions are for Asian, Black, and White */
                  testp=(0.1  0.40  0.5);
run;
CatPaws
Calcite | Level 5

Hi Rick,

I think I understand what you are saying, but when I look at that document, it shows two chi-square tests (one for each region). I only want to do one chi-square test, based on two different groups. I have attached a document that shows the frequency of sex (Females and Males) in the Case group, as well as the frequency of sex3 (Females and Males) in the Case3 group. Since they are in two different data sets, how would I code this?

 

*I will exclude the Unknown values before analysis, but right now I am more interested in how to set up the code.

SAS_Rob
SAS Employee

The two groups will need to be in the same data set, with an indicator variable that defines which group each observation belongs to. That group variable would then appear on the TABLES statement.

So for example, if you wanted to test if the distribution of race across the two groups was the same, then you would have:

tables group*race/chisq;
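
As a rough sketch of that setup: the data set names Case and Case3 and the variable names Sex and Sex3 below are guessed from the post above, and Both and Group are hypothetical names, so adjust all of them to your data. The idea is to stack the two data sets and use the IN= data set option to record which one each row came from:

```sas
/* Assumed names: Case and Case3 are the two input data sets;     */
/* Sex (in Case) and Sex3 (in Case3) hold the same information.   */
data Both;
   length Group $5;
   set Case (in=inCase)              /* inCase = 1 for rows read from Case */
       Case3(rename=(Sex3=Sex));     /* give the variable a common name    */
   if inCase then Group = 'Case';
   else Group = 'Case3';
run;

/* One two-way table, one chi-square test across the two groups */
proc freq data=Both;
   tables Group*Sex / chisq;
run;
```

The Pearson chi-square reported by CHISQ is computed the same way for a test of homogeneity as for a test of independence; only the interpretation differs.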

 

 

CatPaws
Calcite | Level 5
Now we are getting somewhere lol. Since my observation IDs are just numbers that follow no specific order, rhyme, or reason, it would be difficult to create a dummy variable from them. However, what separates my two populations is a date range. My next question is: how do you create an indicator variable using a date range? For example, group 1 is defined as Nov 1999-Jan 2000, and group 2 is defined as Feb 2000-April 2000. How would that look coded?
SteveDenham
Jade | Level 19

What is the response variable here? I must have missed that. Anyhow, if there is a date on every record, you can code a flag based on the cutoff date. I think you can ignore observation ID in the PROC FREQ analysis. If you become interested in a more complex/complete approach, you might be able to use PROC GENMOD with a GEE model to test for equality/homogeneity across the time segments.
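
A minimal sketch of such a flag, using the date ranges from the question. It assumes the data set is called Have, the date is stored as a numeric SAS date in a variable called VisitDate, and the ranges cover whole calendar months; all of those names and details are placeholders, not anything confirmed in the thread:

```sas
/* Assumed names: Have (input data set), VisitDate (numeric SAS date), */
/* Sex (response). Flagged and Group are hypothetical names.           */
data Flagged;
   set Have;
   length Group $7;
   if '01NOV1999'd <= VisitDate <= '31JAN2000'd then Group = 'Period1';
   else if '01FEB2000'd <= VisitDate <= '30APR2000'd then Group = 'Period2';
   else delete;   /* drops rows outside both windows, including missing dates */
run;

proc freq data=Flagged;
   tables Group*Sex / chisq;
run;
```

If the date is stored as a character string rather than a SAS date, it would need to be converted first (for example with the INPUT function and a suitable informat) before the comparisons above make sense.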

 

SteveDenham

CatPaws
Calcite | Level 5

Hi Steve,

 Thank you. Do you have any references or suggestions on how I code it based on the cutoff date?

SteveDenham
Jade | Level 19

There are probably 50 threads on how to do that, especially in the Programming forum.  Search and you will find.

 

SteveDenham
