TomHsiung
Pyrite | Level 9

This is from Wikipedia: the probability for a matched pair when Yi1 = 1 and Yi2 = 0. Note the denominator, where P(Yi1 = 1 & Yi2 = 0) and P(Yi1 = 0 & Yi2 = 1) are each written as the product of the individual probabilities. So my question is: are Yi1 and Yi2 independent or dependent? I look forward to your opinions.

 

The joint probability of a pair:

[Screenshot of the formula from the Wikipedia article]

StatDave
SAS Super FREQ

If observations are matched then they are not independent. Study data can consist of matched sets of any size, not just pairs, with any number of events observed within each set if the response is binary. There are various analytical methods that can be used for binary response data of this type including conditional logistic regression which is available with the STRATA statement in PROC LOGISTIC. Other possibilities include the Generalized Estimating Equations and Alternating Logistic Regressions using the REPEATED statement in PROC GEE or PROC GENMOD and random effect models using the RANDOM statement in PROC GLIMMIX. A non-model-based approach using stratification is available with the CMH option in PROC FREQ when a multi-way table is specified. See the examples in the documentation for each of these procedures.

TomHsiung
Pyrite | Level 9
Hello, Dave

Thank you for your feedback. Say we have the events A = 1 and B = 0. According to the matching condition, we have A + B = 1.

The conditional probability P(A = 1 & B = 0 | A + B = 1) is calculated as P(A = 1 & B = 0) / [P(A = 1 & B = 0) + P(A = 0 & B = 1)].
Please notice the last step, where they have:

P(A = 1 & B = 0) = P(A = 1) * P(B = 0)
P(A = 0 & B = 1) = P(A = 0) * P(B = 1)

I think the last two formulas mean that A = 1 and B = 0, as well as A = 0 and B = 1, are independent, respectively.

That's why I'm confused about the independence between A and B.
FreelanceReinh
Jade | Level 19

@TomHsiung wrote:
I think the last two formulas mean that A = 1 and B = 0, as well as A = 0 and B = 1, are independent, respectively.

That's why I'm confused about the independence between A and B.

Hello @TomHsiung,

 

I see your point.

 

My understanding is that the random variables describing the individual responses are independent, hence the products of probabilities in the Wikipedia article you have mentioned. Yet, as a rule, matched pairs are correlated in the following sense: If X, Y denote the responses of a randomly selected matched pair, these random variables X and Y are usually dependent because they tend to have similar response probabilities due to the matching.
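To make the product form concrete, here is a minimal Python sketch for a single fixed pair, assuming its two responses A and B are independent Bernoulli variables. The function names and the probabilities 0.7 and 0.4 are made up for illustration; they do not come from the Wikipedia article.

```python
from itertools import product

def cond_prob_discordant(p_a, p_b):
    """P(A=1, B=0 | A+B=1), assuming A ~ B(1, p_a) and B ~ B(1, p_b) are independent."""
    num = p_a * (1 - p_b)        # P(A=1, B=0) = P(A=1) * P(B=0)
    den = num + (1 - p_a) * p_b  # add P(A=0, B=1) = P(A=0) * P(B=1)
    return num / den

def cond_prob_by_enumeration(p_a, p_b):
    """Same quantity, obtained by conditioning the full joint table on A + B = 1."""
    joint = {
        (a, b): (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        for a, b in product((0, 1), repeat=2)
    }
    p_cond = sum(p for (a, b), p in joint.items() if a + b == 1)
    return joint[(1, 0)] / p_cond

# Hypothetical response probabilities for one pair (assumed values)
p_a, p_b = 0.7, 0.4
print(cond_prob_discordant(p_a, p_b))
print(cond_prob_by_enumeration(p_a, p_b))
```

Both functions return the same value, because the product form is exactly what independence within a fixed pair implies.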

 

Here is a small example with only two matched pairs: (X11, X12) and (X21, X22), each representing, say, (case, control).

Assume independent Bernoulli distributions X11~B(1, 0.94), X12~B(1, 0.91), X21~B(1, 0.22), X22~B(1, 0.29).

 

Define X := X_U1, Y := X_U2 (that is, X = X11 and Y = X12 if U = 1, and X = X21 and Y = X22 if U = 2), where U is a random variable, independent of the Xij, that describes the selection of a pair: P(U=1) = P(U=2) = 0.5.

 

Then we obtain the joint distribution of (X, Y) from calculations like P(X=1, Y=1) = P(U=1)P(X11=1)P(X12=1)+P(U=2)P(X21=1)P(X22=1)=0.4596:

 

P        X=0      X=1      Total
Y=0      0.2796   0.1204   0.4
Y=1      0.1404   0.4596   0.6
Total    0.42     0.58     1

 

Now we see that X and Y are correlated, hence dependent:
Their correlation coefficient is r(X,Y) = (0.4596 - 0.58*0.6) / sqrt(0.58*0.42*0.6*0.4) = 0.46155...
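The table and the correlation coefficient can be reproduced with a short Python sketch. The probabilities are the ones from the example; the helper `bern` and the variable names are my own and are only meant as an illustration.

```python
from itertools import product
from math import sqrt

# Success probabilities from the example: pair -> (P(Xi1 = 1), P(Xi2 = 1))
pairs = {1: (0.94, 0.91), 2: (0.22, 0.29)}
p_u = {1: 0.5, 2: 0.5}  # P(U = u): probability that pair u is selected

def bern(p, x):
    """P(B = x) for B ~ B(1, p)."""
    return p if x == 1 else 1.0 - p

# Joint distribution of (X, Y): mix the within-pair products over U
joint = {
    (x, y): sum(p_u[u] * bern(p1, x) * bern(p2, y) for u, (p1, p2) in pairs.items())
    for x, y in product((0, 1), repeat=2)
}

p_x1 = joint[(1, 0)] + joint[(1, 1)]  # marginal P(X = 1)
p_y1 = joint[(0, 1)] + joint[(1, 1)]  # marginal P(Y = 1)
cov = joint[(1, 1)] - p_x1 * p_y1
corr = cov / sqrt(p_x1 * (1 - p_x1) * p_y1 * (1 - p_y1))

for (x, y), p in sorted(joint.items()):
    print(f"P(X={x}, Y={y}) = {p:.4f}")
print(f"corr(X, Y) = {corr:.5f}")
```

Within each pair the responses are independent by construction; the correlation only appears after mixing over the random pair selection U.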
