lj653
Fluorite | Level 6

Hi,

 

I am currently trying to figure out how to run two different Z-tests on a data set that is split into 4 subsets based on which of 4 diseases each subject was diagnosed with. I am trying to run Z-tests comparing gender and ethnicity. However, I can't seem to find any sample code on how to run Z-tests on character-based variables like ethnicity. My ethnicity variable is split into 29 different categories, ranging from African American to Caucasian to Mixed (Asian and Caucasian, Caucasian Latino, etc.), and gender is split into male and female.

 

I'm not familiar with the concept of running Z-Tests, and I was wondering, how would you run a Z-Test on the dataset given? If for example, I wanted to run a Z-Test on gender, should I just run one big Z-Test on the entire dataset comparing female to male numbers (I'm not sure if it would be numbers or proportions) and disregard the four different diagnoses, or should I run multiple Z-Tests comparing the four subcategories to each other? Would the rules that apply for gender also apply for ethnicity? 

 

Also, would you please include some sample code or a template showing how to run Z-tests?

 

Thanks!

8 REPLIES 8
Reeza
Super User

1. You don't run z tests on categorical data.

2. You rarely run z-tests. You usually use t-tests because your data is a sample that represents a population; the exception is when your data covers the full population, which is incredibly rare.

3. For comparing distributions of categorical variables use Chi Square tests.

 

http://support.sas.com/training/tutorial/

 

Your specific topic sounds like 'Table Analysis'

http://support.sas.com/training/tutorial/studio/table-analysis.html

 

If you're coding, you can use PROC FREQ

http://support.sas.com/documentation/cdl/en/procstat/66703/HTML/default/viewer.htm#procstat_freq_syn...

The examples are good so look through them as well to see any that are close to your question.

ballardw
Super User

Running any statistical test should follow the formulation of a question. The question(s) will give clues as to which types of tests are likely to be appropriate.

 

Z-tests (and t-tests) generally answer questions about whether the mean of a continuous variable differs between two groups, or between one group and a reference value. Examples: Is the mean height (the continuous variable) different for men and women? Is the mean content of our soda cans at least the labeled 12 ounces?
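As a rough illustration of the two example questions, here is a minimal PROC TTEST sketch. The data set names (people, cans) and variable names (sex, height, ounces) are hypothetical placeholders, not anything from the poster's data.

```sas
/* Two-sample t-test: is mean height different for men and women?
   Assumes a hypothetical data set PEOPLE with SEX (two levels) and HEIGHT. */
proc ttest data=people;
   class sex;
   var height;
run;

/* One-sample, one-sided t-test against the labeled 12 ounces.
   H0: mean = 12; SIDES=L looks for evidence that the mean is below 12.
   Assumes a hypothetical data set CANS with a numeric variable OUNCES. */
proc ttest data=cans h0=12 sides=L;
   var ounces;
run;
```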

 

So, what are the questions you have been asked to answer about your data?

lj653
Fluorite | Level 6

I was asked to make sure that the ratio of men to women in each group was 50-50, and that the data was equally spread between the different ethnicities. I want to verify that each of my four groups have an equal distribution of men and women, and an equal distribution among the ethnicities.

Reeza
Super User

Then see the One Way Analysis in the link I sent above with the videos/tutorials.
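For the 50-50 and equal-spread questions, a one-way analysis in PROC FREQ might look like the sketch below. It assumes a data set named have with variables group, sex, and race; these names are placeholders and would need to match the actual data.

```sas
/* BY-group processing needs the data sorted by the BY variable. */
proc sort data=have;
   by group;
run;

proc freq data=have;
   by group;
   /* Chi-square goodness-of-fit test of a 50/50 split between the two sexes. */
   tables sex / chisq testp=(50 50);
   /* With CHISQ and no TESTP=, the null hypothesis is equal proportions
      across all levels of RACE. */
   tables race / chisq;
run;
```

For a two-level variable such as sex, replacing the options with binomial(p=0.5) should report an asymptotic Z statistic for the proportion, which is the closest match to the Z-test originally asked about.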

lj653
Fluorite | Level 6

Are you referring to the very first link you attached in the message?

Reeza
Super User

@lj653 wrote:

Are you referring to the very first link you attached in the message?


Yes.

http://support.sas.com/training/tutorial/studio/frequency-analysis.html

 

Here's another good reference:

http://www.ats.ucla.edu/stat/sas/whatstat/whatstat.htm

 

 

I'm not a huge fan of doing statistical analysis blind, so I highly recommend you spend some time reading up on the theory of what you're trying to do and why. In the first link there's a link to a free SAS e-course on doing statistical analysis with SAS that covers both some theory and the programming components in one. If you're going to be using SAS and doing analysis I highly recommend the course. Any time spent taking the course will be repaid in less googling later on and in avoiding costly mistakes.

 

 

ballardw
Super User

@lj653 wrote:

I was asked to make sure that the ratio of men to women in each group was 50-50, and that the data was equally spread between the different ethnicities. I want to verify that each of my four groups have an equal distribution of men and women, and an equal distribution among the ethnicities.


If your data set has a single variable for sex with two levels and a single variable for race, then something like:

 

proc freq data=have;
   by group; /* requires the data to be sorted by group before PROC FREQ */
   tables race*sex;
run;

 

Will generate a table for each "group" that you can eyeball for percentages of gender within race.

You can add options for chi-square tests if you need better than an eyeball estimate of similarity of distributions.

NOTE: if the number of records within a group is small (and when I am considering Race with 5 levels, as I typically do, small means fewer than 200 or so), you are likely to have problems getting reliable results from a chi-square test, as American Indian and Native Hawaiian/Other Pacific Islanders may not be well represented. In some parts of the country Asian or Black might have small cell counts as well.
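When cell counts are small, one option is to request an exact test alongside the chi-square. A minimal sketch, assuming a data set named have with group and race variables; the Monte Carlo settings (n=, seed=) are arbitrary illustrative values, not recommendations.

```sas
proc freq data=have;
   /* With CHISQ, PROC FREQ warns when a large share of cells have
      expected counts below 5. */
   tables group*race / chisq;
   /* Monte Carlo estimate of the exact p-value; a full exact computation
      can be very slow for a table with many race categories. */
   exact fisher / mc n=10000 seed=12345;
run;
```

Collapsing sparse race categories before testing is another common approach, though whether that is acceptable depends on the question being asked.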

Reeza
Super User

Shouldn't it be:

table group * (sex race)/chisq;

