I have 2 datasets. Dataset X looks like this:
Jmp_o Nws_r
1 0
0 1
1 1
0 1
0 1
1 0
1 0
... ...
I calculate the conditional probability P(jmp_o=1|nws_r=1). There is another dataset Y which is like:
Jmp_o Nws_r
1 0
0 0
1 0
0 0
0 0
1 0
1 0
... ...
From dataset Y I calculate unconditional probability P(jmp_o=1).
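To make the two quantities concrete, here is a minimal Python sketch (not SAS) that computes them from only the seven rows of each dataset shown above. The rows elided by "..." are not included, so the numbers are purely illustrative, and the names rows_x, rows_y, p_cond and p_uncond are made up for this example.

```python
# Only the rows visible in the post; the elided "..." rows are NOT included.
# Each tuple is (jmp_o, nws_r).
rows_x = [(1, 0), (0, 1), (1, 1), (0, 1), (0, 1), (1, 0), (1, 0)]
rows_y = [(1, 0), (0, 0), (1, 0), (0, 0), (0, 0), (1, 0), (1, 0)]

# Conditional probability P(jmp_o=1 | nws_r=1) from dataset X:
# restrict to rows with nws_r == 1, then take the share with jmp_o == 1.
jmp_given_nws = [jmp for jmp, nws in rows_x if nws == 1]
p_cond = sum(jmp_given_nws) / len(jmp_given_nws)

# Unconditional probability P(jmp_o=1) from dataset Y: share of all rows.
p_uncond = sum(jmp for jmp, _ in rows_y) / len(rows_y)

print(p_cond)    # 1 of the 4 rows with nws_r == 1 has jmp_o == 1
print(p_uncond)  # 4 of the 7 rows have jmp_o == 1
```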
I want to test whether these 2 probabilities are statistically different (by means of a p-value).
What test should I perform?
Many thanks.
So, all you need is dataset X. Run Fisher's exact test (proc freq) between jmp_o and nws_r. This will tell you whether the two variables are related in your sample.
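For readers who want to see what the test computes, here is a rough Python sketch (not SAS, and not PROC FREQ's own implementation) of a two-sided Fisher's exact test on the 2x2 table of jmp_o by nws_r. It uses only the seven rows of dataset X shown in the question, so the result is illustrative; the function name fisher_exact_two_sided and the variable names are hypothetical. The two-sided p-value here sums the probabilities of all tables that are no more likely than the observed one, with the margins held fixed.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return Fraction(comb(row1, k) * comb(row2, col1 - k), total)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table at most as likely as the observed one.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs)

# Only the rows of dataset X visible in the post, as (jmp_o, nws_r).
rows_x = [(1, 0), (0, 1), (1, 1), (0, 1), (0, 1), (1, 0), (1, 0)]

a = sum(1 for jmp, nws in rows_x if jmp == 1 and nws == 1)
b = sum(1 for jmp, nws in rows_x if jmp == 1 and nws == 0)
c = sum(1 for jmp, nws in rows_x if jmp == 0 and nws == 1)
d = sum(1 for jmp, nws in rows_x if jmp == 0 and nws == 0)

p_value = fisher_exact_two_sided(a, b, c, d)
print(p_value)  # exact fraction; use float(p_value) for a decimal
```

Exact fractions are used so that the comparison against the observed table's probability is not affected by floating-point rounding.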
This isn't a case where statistical testing is appropriate. The formulas used are different, so the results are mathematically different.
Statistics would be used only if sampling differences caused different results.
Yes, the two datasets are 2 different samples.
@d6k5d3 wrote:
Yes, the two datasets are 2 different samples.
This is not clear to me based upon your original explanation.
Please explain further.
I stick with my previous statement that this is not a case where statistical testing is appropriate.
If you have a sample of people and you measure their heights in inches, and then you take an independent sample and measure their heights in centimeters, you would not do a statistical test to determine whether the average height in inches differs from the average height in centimeters. You would just assume they are different because a different measurement was used.