03-25-2015 08:46 PM
Hi! I'm a bit far removed from the chi square test and controlling for a variable (which I once must've learned way back when?).
I'm trying to see if facility type is a confounding factor in a chi-square analysis where I use:
proc freq data=keep;
tables type*category*missing_doses/chisq;
I was initially trying to see if facilities within a certain category were more likely to have kids missing doses.
So I used the above code but without the 'type*', and I got a significant chi-square value (p<.0001).
But, I wanted to make sure that the facility type wasn't causing the significant outcome.
So, I added the 'type*' in the above code and when I look at both facility types, the chisq value isn't significant.
Does that mean there really is no significant difference and that type is a confounder?
I'm so far removed from this that I just don't remember what it means to control for a variable (I only remember that if you don't control for a confounding variable, it can create a false association).
And what does it mean when you get a significant result looking at all facility types together, but the significant result goes away when you control for facility type?
Any help you can give is greatly appreciated!
03-26-2015 09:16 AM
You'd use the CMH option when you are processing a contingency table with more than two dimensions.
proc freq data=keep;
tables type*category*missing_doses/chisq cmh ;
"So, I added the 'type*' in the above code and when I look at both facility types, the chisq value isn't significant.
Does that mean there really is no significant difference and that type is a confounder?"
I am afraid so. If you add one more variable, the result can be totally different.
You'd better check the PROC FREQ documentation, especially its examples; there is one almost like yours (about student gender, internship, and enrollment).
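A minimal sketch of how the pooled and the type-adjusted analyses could be run side by side. It assumes that KEEP has one row per child and that the variable names are the ones in the TABLES statement above; adjust both if your data differ.

```sas
/* Sketch only: assumes one row per child in KEEP and the variable
   names used earlier in this thread. */
proc freq data=keep;
   /* Unadjusted: category by missing_doses, pooled over facility type */
   tables category*missing_doses / chisq;
   /* Adjusted: one category-by-missing_doses table per facility type
      (CHISQ is reported separately for each type), plus CMH statistics
      that combine the evidence across the types */
   tables type*category*missing_doses / chisq cmh;
run;
```

The per-type chi-square tests each use only part of the sample, so they can lose significance simply because each table is smaller. The CMH statistics test the category by missing_doses association while holding type fixed, which is the "controlling for type" comparison. If each per-type table is 2 x 2, the CMH output also includes a Breslow-Day test of whether the odds ratios are similar across types.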
03-26-2015 11:37 AM
Appreciate your help. In the SAS documentation examples I only found one about hair color. Where is the example you're referring to?
I guess I cannot control for a variable using chisq but I can using cmh?
When I controlled for type using chisq (the above code without cmh) it wasn't significant. But when I used cmh, p<.0001. Thank you!
03-27-2015 09:08 AM
At the start of the PROC FREQ documentation, there is an example like yours. Here it is:
The FREQ procedure provides easy access to statistics for testing for association in a crosstabulation.
In this example, high school students applied for courses in a summer enrichment program; these
courses included journalism, art history, statistics, graphic arts, and computer programming. The
students accepted were randomly assigned to classes with and without internships in local companies.
Table 3.1 contains counts of the students who enrolled in the summer program by gender and whether
they were assigned an internship slot.
Table 3.1 Summer Enrichment Data
                      Enrollment
Gender  Internship    Yes    No    Total
boys    yes            35    29       64
boys    no             14    27       41
girls   yes            32    10       42
girls   no             53    23       76
The SAS data set SummerSchool is created by inputting the summer enrichment data as cell count
data, or providing the frequency count for each combination of variable values. The following DATA
step statements create the SAS data set SummerSchool:
data SummerSchool;
   input Gender $ Internship $ Enrollment $ Count @@;
   datalines;
boys  yes yes 35   boys  yes no 29
boys  no  yes 14   boys  no  no 27
girls yes yes 32   girls yes no 10
girls no  yes 53   girls no  no 23
;
The variable Gender takes the values ‘boys’ or ‘girls,’ the variable Internship takes the values ‘yes’
and ‘no,’ and the variable Enrollment takes the values ‘yes’ and ‘no.’ The variable Count contains the
number of students that correspond to each combination of data values. The double at sign (@@)
indicates that more than one observation is included on a single data line. In this DATA step, two
observations are included on each line.
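As a quick check on the data entry (a sketch added here, not part of the quoted documentation), the cell counts can be listed back and compared with Table 3.1. The WEIGHT statement tells PROC FREQ to treat Count as the number of students in each cell rather than counting rows.

```sas
/* Sketch: list the eight Gender x Internship x Enrollment cells.
   WEIGHT Count makes each row contribute Count students;
   ORDER=DATA keeps the levels in the order they were entered;
   LIST prints one line per cell instead of stacked two-way tables. */
proc freq data=SummerSchool order=data;
   tables Gender*Internship*Enrollment / list;
   weight Count;
run;
```

The listed frequencies should match the Yes and No columns of Table 3.1 for each gender and internship combination.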
Researchers are interested in whether there is an association between internship status and summer
program enrollment. The Pearson chi-square statistic is an appropriate statistic to assess the
association in the corresponding 2 × 2 table. The following PROC FREQ statements specify this