supp
Pyrite | Level 9

Hello SAS community, I have a categorical dependent variable (Y) and a categorical independent variable (X). What is the best way to determine whether there is an association between X and Y?

 

Sample data would look like this

data have;
 infile datalines;
  input subject $ X Y;
datalines;
1 1 0
2 1 1
3 0 0
4 1 0
5 0 1
6 1 0
7 0 0
8 0 0
9 1 0
10 0 0
11 1 0
12 0 0
13 0 0
14 1 1
15 1 0
16 0 0
17 0 0
18 0 0
19 1 0
20 1 0
;
run;

I would like to know: if X is 1, how does the likelihood that Y is 1 change? The null hypothesis would be that X has no impact on Y. What is the best way to test this hypothesis?

 

PaigeMiller
Diamond | Level 26

In one sentence you talk of "association," and in another you talk about how a change in X changes the likelihood of Y. These are not the same. PROC LOGISTIC will predict the probability that Y is 1, given a value for X. Association would most likely be tested via the chi-squared test in PROC FREQ.
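As a rough sketch of the two approaches, assuming the `have` data set from the question (the `ref=` and `event=` choices are illustrative assumptions, not something stated in the thread):

```sas
/* Association: chi-square test of independence on the X-by-Y table. */
/* For a 2x2 table, CHISQ also prints Fisher's exact test.           */
proc freq data=have;
  tables X*Y / chisq;
run;

/* Prediction: model the probability that Y=1 given the value of X.  */
proc logistic data=have;
  class X (ref='0') / param=ref;
  model Y(event='1') = X;
run;
```

With only 20 rows in the sample data, some expected cell counts are small, so the Fisher's exact test result is probably the safer one to read there.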

--
Paige Miller
PGStats
Opal | Level 21

Run proc FREQ with the MEASURES option in the TABLES statement.
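A minimal sketch of that call, assuming the `have` data set from the question (adding `cl` for confidence limits is an assumption, not part of the suggestion):

```sas
/* MEASURES requests measures of association (gamma, tau-b, Somers' D, ...). */
/* CL adds confidence limits for those measures.                             */
proc freq data=have;
  tables X*Y / measures cl;
run;
```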

 

Check the Details section of the online documentation for a description of the statistics. In particular:

 

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_freq_details68.htm&docsetVersion=... 

PG
supp
Pyrite | Level 9

Thanks @PaigeMiller and @PGStats for the direction. I spent some time getting familiar with the chi-square test for independence and the odds ratio. I think both could be relevant for my scenario.

 

My understanding is that the chi-square test for independence can reject (or fail to reject) my null hypothesis at a chosen significance level, whereas the odds ratio can quantify the strength of the relationship between X and Y. I will try running both tests on my data.
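For reference, the two quantities can be written as follows for a 2x2 table. The notation is an assumption introduced here: $n_{xy}$ is the number of subjects with X = x and Y = y, $O_{xy} = n_{xy}$ is the observed count, $R_x$ and $C_y$ are the row and column totals, and $N$ is the overall total.

```latex
\[
E_{xy} = \frac{R_x \, C_y}{N},
\qquad
\chi^2 = \sum_{x \in \{0,1\}} \sum_{y \in \{0,1\}} \frac{(O_{xy} - E_{xy})^2}{E_{xy}},
\qquad
\mathrm{OR} = \frac{n_{11} / n_{10}}{n_{01} / n_{00}} = \frac{n_{11}\, n_{00}}{n_{10}\, n_{01}}
\]
```

The chi-square statistic measures how far the observed counts are from what independence would predict, while the odds ratio compares the odds of Y = 1 when X = 1 against the odds of Y = 1 when X = 0.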

PaigeMiller
Diamond | Level 26

@supp wrote:

My understanding is the chi-square test for independence can reject (or not) my null hypothesis

 

But you haven't specified a null hypothesis. As I said, if the null hypothesis is about association, you use chi-squared. If the null hypothesis is about X predicting Y, you use logistic regression. These two are not the same. Your original message isn't completely clear on which one you mean; it seems you are using both interchangeably.

--
Paige Miller
supp
Pyrite | Level 9

@PaigeMiller, you are right that I have not clearly stated the null hypothesis. So let's assume my null hypothesis is that there is no relationship between X and Y, tested at the 95% confidence level. I get the following results:

 

 

Observed

          Y = 0        Y = 1        Total
X = 0     1,360,524    1,073,073    2,433,597
X = 1           194            7          201
Total     1,360,718    1,073,080    2,433,798

         

Expected

          Y = 0        Y = 1        Total
X = 0     1,360,606    1,072,991    2,433,597
X = 1           112           89          201
Total     1,360,718    1,073,080    2,433,798

 

  • Chi-Square = 134.47 (P < .0001)
  • Odds Ratio = 0.0457 (0.0215 - 0.0972)
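One way these results might be reproduced directly from the four observed cell counts is sketched below. The data set name `counts` and the variable name `count` are assumptions made for illustration; the thread does not show the code that was actually run.

```sas
/* One row per cell of the observed 2x2 table. */
data counts;
  input X Y count;
datalines;
0 0 1360524
0 1 1073073
1 0 194
1 1 7
;
run;

/* WEIGHT treats each row as 'count' observations.              */
/* CHISQ: chi-square tests; EXPECTED: expected cell counts;     */
/* RELRISK: odds ratio and relative risks with confidence limits. */
proc freq data=counts;
  weight count;
  tables X*Y / chisq expected relrisk;
run;
```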

 

I interpret this to mean I can reject the null hypothesis, meaning there is a relationship between X and Y. Further, the odds ratio suggests a negative association between X and Y (this is a surprising result): if a subject is in program X (X=1), then the outcome Y=1 is less likely.
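As a check on that reading, the odds ratio can be worked out by hand from the observed table (values rounded to four decimal places):

```latex
\[
\mathrm{OR}
= \frac{\text{odds}(Y{=}1 \mid X{=}1)}{\text{odds}(Y{=}1 \mid X{=}0)}
= \frac{7 / 194}{1{,}073{,}073 / 1{,}360{,}524}
\approx \frac{0.0361}{0.7887}
\approx 0.0457
\]
\[
P(Y{=}1 \mid X{=}1) = \frac{7}{201} \approx 0.035,
\qquad
P(Y{=}1 \mid X{=}0) = \frac{1{,}073{,}073}{2{,}433{,}597} \approx 0.441
\]
```

So in this table about 3.5% of the X=1 subjects have Y=1, against about 44.1% of the X=0 subjects, which is the direction an odds ratio well below 1 indicates.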

 

Here are some more measures:

Statistic                            Value      ASE
Gamma                                -0.9125    0.0322
Kendall's Tau-b                      -0.0074    0.0004
Stuart's Tau-c                       -0.0001    0.0000
Somers' D C|R                        -0.4061    0.0129
Somers' D R|C                        -0.0001    0.0000
Pearson Correlation                  -0.0074    0.0004
Spearman Correlation                 -0.0074    0.0004
Lambda Asymmetric C|R                 0.0000    0.0000
Lambda Asymmetric R|C                 0.0000    0.0000
Lambda Symmetric                      0.0000    0.0000
Uncertainty Coefficient C|R           0.0001    0.0000
Uncertainty Coefficient R|C           0.0422    0.0038
Uncertainty Coefficient Symmetric     0.0001    0.0000

PaigeMiller
Diamond | Level 26

@supp wrote:

@PaigeMiller, you are right that I have not clearly stated the null hypothesis. So let's assume my null hypothesis is that there is no relationship between X and Y, tested at the 95% confidence level. I get the following results:


Are you talking about "association" or "predictability" when you say "relationship"? These terms have clear statistical meanings and they are different, but perhaps you are not aware of the difference.

 

Predictability uses the value of X to predict Y. Association only asks whether X and Y tend to move together or not.
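For the predictability side, a minimal sketch on the same aggregated counts might look like this. The data set `counts` and variable `count` are assumed names, and modeling Y=1 as the event is an assumption about the intended outcome.

```sas
/* One row per cell of the observed 2x2 table (assumed layout). */
data counts;
  input X Y count;
datalines;
0 0 1360524
0 1 1073073
1 0 194
1 1 7
;
run;

/* FREQ treats each row as 'count' observations.                      */
/* The model estimates P(Y=1) as a function of X; for a 0/1 predictor */
/* the exponentiated X coefficient is the odds ratio.                 */
proc logistic data=counts;
  freq count;
  model Y(event='1') = X;
run;
```

With a single binary predictor, the odds ratio estimate reported by PROC LOGISTIC should agree with the one from PROC FREQ, so here the two views differ more in the question they answer than in the numbers they produce.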

--
Paige Miller
supp
Pyrite | Level 9
Right now I am just trying to understand whether X and Y are associated or, to state it another way (I think), whether X and Y are independent of each other. It looks like they are not independent.
