
Controlling for a variable using ChiSq

Regular Contributor
Posts: 194

Controlling for a variable using ChiSq

Hi! I'm a bit far removed from the chi-square test and controlling for a variable (which I must have learned once, way back when).

I'm trying to see if facility type is a confounding factor in a chi-square analysis where I use:

proc freq data=keep;
   tables type*category*missing_doses / chisq;
run;

I was initially trying to see if facilities within a certain category were more likely to have kids missing doses.

So I used the above code but without the 'type*', and I got a significant chi-square value (p<.0001).

But I wanted to make sure that the facility type wasn't causing the significant outcome.

So, I added the 'type*' in the above code and when I look at both facility types, the chisq value isn't significant.

Does that mean there really is no significant difference and that type is a confounder?

I'm so far removed from this that I just don't remember what it means to control for a variable (I only remember that if you don't control for a confounding variable, it can create a false association).

And what does it mean that you get a significant result when looking at all facility types, but when you control for facility type, the significant result goes away?

Any help you can give is greatly appreciated!

Super User
Posts: 10,048

Re: Controlling for a variable using ChiSq

Use the CMH option when you are processing a contingency table with more than two dimensions.

proc freq data=keep;
   tables type*category*missing_doses / chisq cmh;
run;

"So, I added the 'type*' in the above code and when I look at both facility types, the chisq value isn't significant.

Does that mean there really is no significant difference and that type is a confounder?"

I am afraid so. If you add one more variable, the result can be totally different.

You should check the PROC FREQ documentation, especially its examples; there is one almost like yours (about student gender, enrollment, and internships).

Xia Keshan

Regular Contributor
Posts: 194

Re: Controlling for a variable using ChiSq

Appreciate your help. In the SAS documentation examples I only found one about hair color. Where is the example you're referring to?

I guess I cannot control for a variable using chisq but I can using cmh?

When I controlled for type using chisq (the above code without cmh), it wasn't significant. But when I used cmh, p<.0001. Thank you!
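For reference, the two requests answer different questions. On a three-way TABLES request, the CHISQ option reports a separate chi-square test within each level of type, while the CMH option reports a single test of the category by missing_doses association adjusted for type. A minimal sketch of the three analyses side by side, assuming the data set and variable names from the first post (nothing here has been run against that data):

```sas
/* Sketch only: keep, type, category, and missing_doses are the names
   used in the first post; the data itself is not shown in the thread. */

/* 1. Crude association, ignoring facility type */
proc freq data=keep;
   tables category*missing_doses / chisq;
run;

/* 2. CHISQ: one chi-square test per level of type (the first variable
      in a three-way request defines the strata)
   3. CMH: a single test of category by missing_doses, adjusted for type */
proc freq data=keep;
   tables type*category*missing_doses / chisq cmh;
run;
```

Each per-type table holds fewer observations than the full table, so a per-type test can miss an association that the pooled CMH test still detects. A nonsignificant result within each type is therefore not enough, on its own, to conclude that type is a confounder.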

Super User
Posts: 10,048

Re: Controlling for a variable using ChiSq

Hi.

At the start of the PROC FREQ documentation, there is an example like yours. You can find it here.

The FREQ procedure provides easy access to statistics for testing for association in a crosstabulation table.

In this example, high school students applied for courses in a summer enrichment program; these courses included journalism, art history, statistics, graphic arts, and computer programming. The students accepted were randomly assigned to classes with and without internships in local companies. Table 3.1 contains counts of the students who enrolled in the summer program by gender and whether they were assigned an internship slot.

Table 3.1 Summer Enrichment Data

                        Enrollment
Gender   Internship   Yes   No   Total
boys     yes           35   29      64
boys     no            14   27      41
girls    yes           32   10      42
girls    no            53   23      76

The SAS data set SummerSchool is created by inputting the summer enrichment data as cell count data, or providing the frequency count for each combination of variable values. The following DATA step statements create the SAS data set SummerSchool:

data SummerSchool;
   input Gender $ Internship $ Enrollment $ Count @@;
   datalines;
boys  yes yes 35   boys  yes no 29
boys  no  yes 14   boys  no  no 27
girls yes yes 32   girls yes no 10
girls no  yes 53   girls no  no 23
;

The variable Gender takes the values 'boys' or 'girls,' the variable Internship takes the values 'yes' and 'no,' and the variable Enrollment takes the values 'yes' and 'no.' The variable Count contains the number of students that correspond to each combination of data values. The double at sign (@@) indicates that more than one observation is included on a single data line. In this DATA step, two observations are included on each line.

Researchers are interested in whether there is an association between internship status and summer program enrollment. The Pearson chi-square statistic is an appropriate statistic to assess the association in the corresponding 2 × 2 table. The following PROC FREQ statements specify this analysis.
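The excerpt stops before the statements it refers to. Going by the description above, they would look roughly like the sketch below, which is a reconstruction and not a verbatim quote of the documentation. The WEIGHT statement is there because each row of SummerSchool is a cell count, not an individual student. The second step is an assumed extension that mirrors the question in this thread by adding Gender as the stratifying variable.

```sas
/* Crude 2 x 2 analysis: Internship by Enrollment, ignoring Gender.
   ORDER=DATA keeps the levels in the order they appear in the data set. */
proc freq data=SummerSchool order=data;
   tables Internship*Enrollment / chisq;
   weight Count;   /* treat Count as the number of students per cell */
run;

/* Stratified analysis (assumed extension): the same association,
   controlling for Gender. CHISQ tests each gender separately;
   CMH gives one test adjusted for Gender. */
proc freq data=SummerSchool;
   tables Gender*Internship*Enrollment / chisq cmh;
   weight Count;
run;
```

The same pattern carries over to the facility data: put the controlling variable (type) first in the TABLES request, where Gender sits here.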

Regular Contributor
Posts: 194

Re: Controlling for a variable using ChiSq

Thank you!!
