Kelly_K
Fluorite | Level 6

Hi, SAS Community.

 

I'm trying to perform a chi-square test using Proc SurveyFreq when one of my cells has a zero.  This issue has been raised and answered before in the following post:  https://communities.sas.com/t5/Statistical-Procedures/Trying-to-perform-a-chi-squared-test-using-Sur... 

 

As a new SAS user, I am stumped as to how to implement that solution.  Can anyone provide more detailed instructions for implementing that solution for a newbie such as myself? Here is the solution that was provided:  

 

"Construct a dataset that has one observation for each missing table cell, and assign a small weight (relative to the real survey weights) to each of these observations. Merge that dataset with the analysis dataset. Due to the small weights, these additional observations should not affect the estimates of proportions and totals -- but will give the formerly missing cells one observation each, and standard errors of zero, and SurveyFreq will compute the chi-squares."

 

Many thanks.

 

Kelly

 

ballardw
Super User

First, to reiterate: a chi-square test with a zero-count cell really is inappropriate.

Typically my first step when this occurs is to consider the data. Are the zero counts possibly the result of a variable having a relatively large number of values compared to the number of observations? Consider a variable such as a person's age. If you only have 50 records in your data and the ages range from 20 to 80, some of the cells must have a 0 count. If you have only 150 records, it is likely that some of the cells have very small or zero counts. If this is what is going on with your data (note that you have provided no description of your actual data), then grouping the data into 5-year or 10-year age groups might be an appropriate approach. You would report not on individual ages but on the age groups.

One extremely nice thing about this is that you need to do nothing to your data. Create a custom format to create the groups and use the format in the procedure code with the variable(s) involved.
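
A minimal sketch of the format approach, since no actual data was described. The data set name (mysurvey), the variable names (age, outcome, samplingweight) and the 10-year bands are all assumptions for illustration; substitute your own.

```sas
/* Hypothetical format: collapses single years of age into 10-year groups */
proc format;
   value agegrp
      20 - 29 = '20-29'
      30 - 39 = '30-39'
      40 - 49 = '40-49'
      50 - 59 = '50-59'
      60 - 69 = '60-69'
      70 - 80 = '70-80';
run;

/* The data set itself is unchanged; the FORMAT statement makes the
   procedure build the table from the formatted (grouped) values */
proc surveyfreq data=mysurvey;        /* assumed data set name */
   weight samplingweight;             /* assumed weight variable */
   tables age * outcome / chisq;      /* assumed table variables */
   format age agegrp.;
run;
```

If the grouped table no longer has empty cells, the workaround with tiny weights may not be needed at all.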

 

With that in mind let's deconstruct the instructions:

"Construct a dataset that has one observation for each missing table cell, and assign a small weight". This is a list of variable values that are generating 0 counts. These would typically be pairs of variable values in your table. Basically write a data step to use those variables and assign the problem values. Assuming that the variables are named X and Y an example might look like:

/* One row per empty table cell, each with a very small weight */
data tomerge;
   input x y weight;
datalines;
1   27   1E-10
2   18   1E-10
;
run;

The variable names, lengths and types need to match YOUR data. If you have other design variables, such as strata or cluster variables, you will need to include them in the data set as well. For stratified or clustered data, you would need to make sure that each stratum/cluster is represented. Any other domain variables you use need to be in this set too if you expect to do domain analysis, so that the cells are not empty for any of the domains. That is why the message you quote provided a description rather than code. Note that the original post included a BY variable, which is generally a bad idea; it should have been on the TABLES statement.
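
One way to avoid typing the empty cells by hand, and to get matching names and types for the table variables automatically, is to let PROC FREQ list them. This is only a sketch: it assumes the table variables are x and y, that the real weight variable is named weight, and it does not add any strata, cluster or domain variables, which you would still have to supply yourself.

```sas
/* Unweighted cross-tabulation of the analysis data. The SPARSE option
   writes every combination of x and y levels to the OUT= data set,
   including combinations with a count of zero. */
proc freq data=yourdatasetnamegoeshere noprint;
   tables x * y / sparse out=cellcounts;
run;

/* Keep only the empty cells and give each one a tiny weight */
data tomerge;
   set cellcounts;
   where count = 0;
   weight = 1E-10;      /* assumed name; must match the real weight variable */
   keep x y weight;
run;
```

Check the resulting tomerge data set before using it; with many levels the number of empty cells can be large, which is itself a sign that grouping values may be the better route.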

 

 

Note that 1E-10 is scientific notation, which SAS reads as a number. It is one way to provide a "very small" weight without typing lots of zeroes.

 

Combine the data (a merge shouldn't be needed):

data forchisquare;
   set yourdatasetnamegoeshere
        tomerge
   ;
run;

Use this set ONLY for the chi-square.
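
As a rough sketch of what that looks like in practice, assuming the table variables are x and y and the weight variable is named weight (add your own STRATA and CLUSTER statements if your design has them):

```sas
/* Chi-square only: the data with the tiny-weight filler rows appended */
proc surveyfreq data=forchisquare;
   weight weight;                 /* assumed weight variable name */
   tables x * y / chisq;
run;

/* Proportions, totals and standard errors: the original data */
proc surveyfreq data=yourdatasetnamegoeshere;
   weight weight;
   tables x * y;
run;
```

Keeping the two runs separate means the filler observations never enter the estimates you report.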

 

Large economy sized hint: Always include the code you attempted so we have some concrete things to work with, such as data set and variable names at least. Copy the code, open a text box on the forum using the </> icon above the message window, and paste the code or log there. The main message windows on this forum reformat text, which can make things hard to read, remove characters, or replace them with characters that are not valid in code.

 

 
