jcis7
Pyrite | Level 9

    Hi!  I'm a bit far removed from the chi square test and controlling for a variable (which I once must've learned way back when?).

I'm trying to see if facility type is a confounding factor in a chi-square analysis where I use:

proc freq data=keep;

tables type*category*missing_doses/chisq;

run;

I was initially trying to see if facilities within a certain category were more likely to have kids missing doses.

So I used the above code but without the 'type*', and I got a significant chi-square value (p<.0001).

But,  I wanted to make sure that the facility type wasn't causing the significant outcome.

So, I added the 'type*' in the above code and when I look at both facility types, the chisq value isn't significant.

Does that mean there really is no significant difference and that type is a confounder?

I'm so far removed from this that I just don't remember what it means to control for a variable. (I only remember that if you don't control for a confounding variable, it can create a false association.)

And what does it mean that you get a significant result looking at all facility types, but when you control for facility type, the significant result goes away?

Any help you can give is greatly appreciated!

4 REPLIES 4
Ksharp
Super User

You'd use the CMH option when you are processing a contingency table with more than two dimensions.

proc freq data=keep;

tables type*category*missing_doses/chisq cmh ;

run;

"So, I added the 'type*' in the above code and when I look at both facility types, the chisq value isn't significant.

Does that mean there really is no significant difference and that type is a confounder?"

I am afraid so. If you add one more variable, the result could be totally different.

You'd better check the PROC FREQ documentation, especially its examples; there is one almost like yours (about student gender, internship, and enrollment).
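To make the difference between the two requests concrete, here is a minimal sketch using the data set and variable names from the question (keep, type, category, missing_doses). It assumes one row per child; if the data were cell counts instead, a WEIGHT statement would be needed. With a three-way TABLES request, CHISQ is computed separately within each level of the first variable, so each test uses only part of the sample, while CMH gives a single test of the category by missing_doses association that adjusts for type.

```sas
proc freq data=keep;
   /* Unadjusted: ignores facility type altogether */
   tables category*missing_doses / chisq;

   /* Stratified by type: CHISQ gives one chi-square test per level of type, */
   /* while CMH gives one overall test pooled across the type strata         */
   tables type*category*missing_doses / chisq cmh;
run;
```

Comparing the unadjusted test, the per-stratum tests, and the CMH test side by side is usually more informative than any one of them alone. Non-significant per-stratum tests can simply reflect the smaller sample size within each stratum, so they do not by themselves show that type is a confounder.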

Xia Keshan

jcis7
Pyrite | Level 9

Appreciate your help. In the SAS documentation examples I only found one about hair color. Where is the example you're referring to?

I guess I cannot control for a variable using chisq but I can using cmh?

When I controlled for type using chisq (the above code without cmh), it wasn't significant. But when I used cmh, p<.0001. Thank you!

Ksharp
Super User

Hi.

At the start of the PROC FREQ documentation, there is an example like yours. You can find it here:

The FREQ procedure provides easy access to statistics for testing for association in a crosstabulation table.

In this example, high school students applied for courses in a summer enrichment program; these courses included journalism, art history, statistics, graphic arts, and computer programming. The students accepted were randomly assigned to classes with and without internships in local companies. Table 3.1 contains counts of the students who enrolled in the summer program by gender and whether they were assigned an internship slot.

Table 3.1 Summer Enrichment Data

                        Enrollment
Gender   Internship    Yes    No    Total
boys     yes            35    29       64
boys     no             14    27       41
girls    yes            32    10       42
girls    no             53    23       76

The SAS data set SummerSchool is created by inputting the summer enrichment data as cell count data, or providing the frequency count for each combination of variable values. The following DATA step statements create the SAS data set SummerSchool:

data SummerSchool;
   input Gender $ Internship $ Enrollment $ Count @@;
   datalines;
boys  yes yes 35   boys  yes no 29
boys  no  yes 14   boys  no  no 27
girls yes yes 32   girls yes no 10
girls no  yes 53   girls no  no 23
;

The variable Gender takes the values ‘boys’ or ‘girls,’ the variable Internship takes the values ‘yes’ and ‘no,’ and the variable Enrollment takes the values ‘yes’ and ‘no.’ The variable Count contains the number of students that correspond to each combination of data values. The double at sign (@@) indicates that more than one observation is included on a single data line. In this DATA step, two observations are included on each line.

Researchers are interested in whether there is an association between internship status and summer program enrollment. The Pearson chi-square statistic is an appropriate statistic to assess the association in the corresponding 2×2 table. The following PROC FREQ statements specify this analysis.
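The quoted excerpt stops before the statements themselves. The sketch below is a reconstruction of what they would look like for the SummerSchool data set above, not a verbatim copy from the documentation. The WEIGHT statement is needed because each row holds a cell count rather than a single student. The second TABLES request is an added assumption showing how Gender could be used as the stratifying variable, which parallels 'type' in the original question.

```sas
proc freq data=SummerSchool order=data;
   /* Each observation is a cell count, so weight by Count */
   weight Count;

   /* 2x2 table of Internship by Enrollment, ignoring Gender */
   tables Internship*Enrollment / chisq;

   /* Same association, stratified by Gender: per-gender chi-square */
   /* tests plus a CMH test that adjusts for Gender                 */
   tables Gender*Internship*Enrollment / chisq cmh;
run;
```

ORDER=DATA only controls the order in which the levels appear in the tables; it does not change the test statistics.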

jcis7
Pyrite | Level 9

Thank you!!
