## Comparison of 2 proportions

I'm sure this is an easy question, but I would greatly appreciate the help.

I want to compare the proportion of women through two different phases. Let's say I have the following dataset:

```
data test;
input wom second_phase wom_2 wtps;
datalines;
1 1 1 3
1 0 . 2
1 1 0 2
0 1 0 2
0 0 . 5
1 1 1 3
1 1 0 4
1 1 . 4
1 0 . 5
0 1 0 1
0 0 . 3
1 1 . 2
1 1 1 2
1 1 0 2
;
run;
```

Using the weights, I can calculate that the weighted proportion of women before the second phase is 72.5%, whereas the weighted proportion of women among those who qualified for the second phase (second_phase = 1) is 42.1%.
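
For anyone who wants to reproduce those two figures, here is a minimal sketch, assuming the `test` dataset above. PROC MEANS with a WEIGHT statement returns the weighted mean of a 0/1 variable, which is the weighted proportion. The choice of PROC MEANS and the SUMWGT statistic are mine, not part of the original post.

```sas
/* Weighted proportions as weighted means of the 0/1 indicators.      */
/* Missing values are dropped per variable, so wom uses all 14 rows   */
/* and wom_2 uses only the 8 rows where it is observed.               */
proc means data = test mean sumwgt;
    var wom wom_2;
    weight wtps;
run;
/* wom:   29 / 40 = 0.725  (sum of weights = 40) */
/* wom_2:  8 / 19 = 0.421  (sum of weights = 19) */
```

Note that two rows have second_phase = 1 but a missing wom_2, so the 42.1% is computed over the rows where wom_2 is actually observed.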

What procedure could I use to be able to conclude that the differences between the weighted proportions of women across the two steps are statistically significant?

Thank you for your quick response.

The problem I have is that the missing values found in the variable wom_2 are excluding the observations in proc freq for both wom and wom_2.

Here is the code I use for the proc freq

```
proc freq data = test;
tables wom * wom_2 / chisq;
weight wtps;
run;
```

And here is the output

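Table of wom by wom_2

| wom | Statistic | wom_2 = 0 | wom_2 = 1 | Total |
|---|---|---|---|---|
| 0 | Frequency | 3 | 0 | 3 |
| 0 | Percent | 15.79 | 0 | 15.79 |
| 0 | Row Pct | 100 | 0 | |
| 0 | Col Pct | 27.27 | 0 | |
| 1 | Frequency | 8 | 8 | 16 |
| 1 | Percent | 42.11 | 42.11 | 84.21 |
| 1 | Row Pct | 50 | 50 | |
| 1 | Col Pct | 72.73 | 100 | |
| Total | Frequency | 11 | 8 | 19 |
| Total | Percent | 57.89 | 42.11 | 100 |

Frequency Missing = 21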

Thank you

Yes, data with missing values in one of the two category variables cannot be used in this analysis.
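
To see where the missing weight of 21 goes, a small sketch may help, assuming the same `test` dataset. PROC FREQ drops missing values table by table, so one-way tables let wom use every row while wom_2 uses only its observed rows. The MISSING option on the TABLES statement keeps the missing level of wom_2 in the crosstab. This only displays the counts; it is not a test of the difference.

```sas
proc freq data = test;
    /* One-way tables: each variable is tabulated on its own non-missing rows */
    tables wom wom_2;
    /* MISSING treats the missing value of wom_2 as a valid level of the crosstab */
    tables wom * wom_2 / missing;
    weight wtps;
run;
```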

Are there any other alternatives I could use?

@Shawn08 wrote:
Are there any other alternatives I could use?

What do you mean? If you don't have the value for that observation, how are you going to account for it?

This is what I'm looking for. I need a method that compares the proportions of a population in phase 1 and phase 2 when part of the sample is dropped at the second step. Would you have any idea?

Thanks
This precludes paired testing, but I would run two t-tests as normal. In one situation leave all records in each group and test the difference, in the second, only include observations where you have measurements for both start and finish. If the results are not similar then you have an issue. Without knowing how many are missing or why they're missing (systematic vs random) we can't advise on imputation methods.
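
One possible reading of that suggestion, as a rough sketch only: stack the phase 1 and phase 2 indicators into one long dataset and run PROC TTEST twice, once with all records and once restricted to rows where wom_2 is observed. The names `stacked`, `phase`, `female`, and `complete` are hypothetical, and the two groups are treated as independent samples here, which is an assumption the data may not support.

```sas
/* Build one row per person per phase in which they were measured. */
data stacked;
    set test;
    complete = not missing(wom_2);   /* 1 if measured in both phases */
    phase = 1; female = wom; output;
    if complete then do;
        phase = 2; female = wom_2; output;
    end;
    keep phase female wtps complete;
run;

/* Analysis 1: all phase 1 records vs. all observed phase 2 records */
proc ttest data = stacked;
    class phase;
    var female;
    weight wtps;
run;

/* Analysis 2: only people with measurements in both phases */
proc ttest data = stacked;
    where complete = 1;
    class phase;
    var female;
    weight wtps;
run;
```

If the two analyses disagree noticeably, that points to the dropped records mattering.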
I tried it and it works. However, since the second group only consists of those who passed the first step, our two samples aren't independent, and therefore I don't think I can use a t-test. Am I right?
I don't know how you're using independent here...and I don't think it's quite right. The assumptions of a t-test are met, IMO, but you have a biased sample if that's what you're thinking.
PROC FREQ or SURVEYFREQ using the binomial option. I think there's an example in the documentation.
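
A minimal sketch of the PROC FREQ version, assuming the `test` dataset from the question and treating the phase 1 proportion of 72.5% as a fixed null value (a simplification, since it is itself an estimate from the same sample):

```sas
/* Tests H0: proportion of wom_2 = 1 equals 0.725.                 */
/* PROC FREQ treats wtps as frequency counts, so the test is based */
/* on a weighted total of 19, not on the 8 observed rows.          */
proc freq data = test;
    tables wom_2 / binomial(level = '1' p = 0.725);
    weight wtps;
run;
```

If wtps are sampling weights rather than counts, the standard errors from PROC FREQ will likely be too small, and a survey procedure with the appropriate design information would be the safer choice.
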
Please see my other message for additional explanation of my problem. Thank you
