Hello SAS community, I have a categorical dependent variable (Y) and a categorical independent variable (X). What is the best way to determine if there is an association between X and Y?
Sample data would look like this:
data have;
infile datalines;
input subject $ X Y;
datalines;
1 1 0
2 1 1
3 0 0
4 1 0
5 0 1
6 1 0
7 0 0
8 0 0
9 1 0
10 0 0
11 1 0
12 0 0
13 0 0
14 1 1
15 1 0
16 0 0
17 0 0
18 0 0
19 1 0
20 1 0
;
run;
I would like to know: if X is 1, what is the change in the likelihood that Y will be 1? The null hypothesis would be that X has no impact on Y. What is the best way to prove or disprove this?
In one sentence you talk of "association" and in another sentence you talk about how a change in X changes the likelihood of Y. These are not the same. PROC LOGISTIC will predict the probability that Y is 1, given a value for X. Association would most likely be tested via the chi-squared test in PROC FREQ.
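To make the distinction concrete, here is a minimal sketch of both approaches run against the sample data set `have` from the question. The option choices (`chisq`, `param=ref`, the reference level `'0'`, and the event level `'1'`) are assumptions made for illustration, not something specified in the thread.

```sas
/* Association: chi-squared test of independence between X and Y.   */
/* For a 2x2 table, the CHISQ option also reports Fisher's exact     */
/* test, which is useful when cell counts are small.                 */
proc freq data=have;
  tables X*Y / chisq;
run;

/* Prediction: model the probability that Y=1 as a function of X.    */
/* X=0 is taken as the reference level (an assumption).              */
proc logistic data=have;
  class X (ref='0') / param=ref;
  model Y(event='1') = X;
run;
```

With only 20 subjects and few Y=1 outcomes in the sample, the exact test from PROC FREQ is probably more trustworthy than the asymptotic chi-squared p-value.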
Run PROC FREQ with the MEASURES option in the TABLES statement.
Check the Details section of the online documentation for a description of the statistics. In particular,
Thanks @PaigeMiller and @PGStats for the direction. I spent some time getting familiar with the chi-square test for independence and the odds ratio. I think both could be relevant for my scenario.
My understanding is the chi-square test for independence can reject (or not) my null hypothesis with a certain level of confidence, whereas the odds ratio can quantify the strength of the relationship between X and Y. I will try running both tests on my data.
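Both pieces can be requested in a single PROC FREQ step. The sketch below assumes the sample data set `have`; the combination of options is one reasonable choice, not the only one.

```sas
/* CHISQ    : chi-square test of independence (plus Fisher's exact   */
/*            test for a 2x2 table)                                  */
/* RELRISK  : odds ratio and relative risks with confidence limits   */
/* MEASURES : Gamma, Kendall's Tau-b, Somers' D, and related         */
/*            measures of association                                */
/* EXPECTED : expected cell counts under independence                */
proc freq data=have;
  tables X*Y / chisq relrisk measures expected;
run;
```

The odds ratio that PROC FREQ reports depends on how the rows and columns are ordered, so it is worth checking which cell is treated as row 1, column 1 before interpreting a value above or below 1.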
My understanding is the chi-square test for independence can reject (or not) my null hypothesis
But you haven't specified a null hypothesis. As I said, if the null hypothesis is about association, you use the chi-squared test. If the null hypothesis is about X predicting Y, you use logistic regression. These two are not the same. Your original message isn't completely clear on which one you mean; you seem to be talking about both interchangeably.
@PaigeMiller, you are right that I have not clearly stated the null hypothesis. So let's assume my null hypothesis is that there is no relationship between X and Y, tested at the 95% confidence level. I get the following results:
Observed

            Y=0          Y=1        Total
X=0   1,360,524    1,073,073    2,433,597
X=1         194            7          201
Total 1,360,718    1,073,080    2,433,798

Expected

            Y=0          Y=1        Total
X=0   1,360,606    1,072,991    2,433,597
X=1         112           89          201
Total 1,360,718    1,073,080    2,433,798
I interpret this to mean I can reject the null hypothesis, meaning there is a relationship between X and Y. Further, the odds ratio suggests a negative association between X and Y (this is a surprising result): if a subject is in program X, then an outcome of Y (Y=1) is less likely.
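As a check on the direction of the effect, the odds ratio and the two conditional proportions can be worked out by hand from the observed counts. The notation $n_{xy}$ for the count in cell (X=x, Y=y) is introduced here only for illustration.

```latex
\[
\mathrm{OR}
  = \frac{n_{11}\, n_{00}}{n_{10}\, n_{01}}
  = \frac{7 \times 1{,}360{,}524}{194 \times 1{,}073{,}073}
  \approx 0.046
\]
\[
P(Y=1 \mid X=1) = \frac{7}{201} \approx 0.035,
\qquad
P(Y=1 \mid X=0) = \frac{1{,}073{,}073}{2{,}433{,}597} \approx 0.441
\]
```

An odds ratio well below 1 agrees with the reading above: Y=1 is much less common among subjects with X=1, although that group contains only 201 of the roughly 2.4 million subjects.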
Here are some more measures:
Statistic                             Value      ASE
Gamma                               -0.9125   0.0322
Kendall's Tau-b                     -0.0074   0.0004
Stuart's Tau-c                      -0.0001   0.0000
Somers' D C|R                       -0.4061   0.0129
Somers' D R|C                       -0.0001   0.0000
Pearson Correlation                 -0.0074   0.0004
Spearman Correlation                -0.0074   0.0004
Lambda Asymmetric C|R                0.0000   0.0000
Lambda Asymmetric R|C                0.0000   0.0000
Lambda Symmetric                     0.0000   0.0000
Uncertainty Coefficient C|R          0.0001   0.0000
Uncertainty Coefficient R|C          0.0422   0.0038
Uncertainty Coefficient Symmetric    0.0001   0.0000
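For anyone who wants to reproduce these tables without the subject-level data, a sketch using only the four observed cell counts is shown below. The data set name `counts` and the variable name `count` are hypothetical; the WEIGHT statement tells PROC FREQ to treat each row as that many observations.

```sas
/* One row per cell of the observed 2x2 table */
data counts;
  input X Y count;
  datalines;
0 0 1360524
0 1 1073073
1 0 194
1 1 7
;
run;

/* WEIGHT expands each row to `count` observations, so the output   */
/* should match an analysis of the full subject-level data.         */
proc freq data=counts;
  weight count;
  tables X*Y / chisq relrisk measures expected;
run;
```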
@supp wrote:
@PaigeMiller, you are right that I have not clearly stated the null hypothesis. So let's assume my null hypothesis is that there is no relationship between X and Y, tested at the 95% confidence level. I get the following results:
Are you talking about "association" or "predictability" when you say "relationship"? These terms have clear statistical meanings, and they are different, but perhaps you are not aware of the difference.
Predictability uses the value of X to predict Y. Association just wants to test whether X and Y tend to move together or not.