Comparison of 2 proportions

Shawn08 · Posted 04-11-2019 11:01 AM

Hi,

I'm sure this is an easy question, but I would greatly appreciate the help.

I want to compare the proportion of women through two different phases. Let's say I have the following dataset:

data test;
	input wom second_phase wom_2 wtps;
	datalines;
	1 1 1 3
	1 0 . 2
	1 1 0 2
	0 1 0 2
	0 0 . 5
	1 1 1 3
	1 1 0 4
	1 1 . 4
	1 0 . 5
	0 1 0 1
	0 0 . 3
	1 1 . 2
	1 1 1 2
	1 1 0 2
	;
run;

Using the weights, I can calculate that the weighted proportion of women before the second phase is 72.5%, whereas the weighted proportion of women among those who qualified for the second phase (second_phase = 1) is 42.1%.
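As a minimal sketch, the two figures can be reproduced from the test dataset with one-way PROC FREQ tables. This assumes that 72.5% is the weighted share of wom = 1 over all records (29 of a total weight of 40) and that 42.1% is the weighted share of wom_2 = 1 over the records where wom_2 is not missing (8 of a total weight of 19); that reading is an interpretation of the numbers, not something stated explicitly.

```sas
/* Weighted share of wom = 1 over all 14 records (total weight 40) */
proc freq data = test;
	tables wom;
	weight wtps;
run;

/* Weighted share of wom_2 = 1; records with missing wom_2 are
   dropped by default, leaving a total weight of 19 */
proc freq data = test;
	tables wom_2;
	weight wtps;
run;
```

Note that PROC FREQ treats wtps as frequency counts here, which is fine for the point estimates but not necessarily for standard errors if wtps are sampling weights.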

What procedure could I use to be able to conclude that the differences between the weighted proportions of women across the two steps are statistically significant?

Thank you in advance!

PaigeMiller · Posted 04-11-2019 11:11 AM

Example using PROC FREQ

https://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=statug&docsetTarget=statug_fr...

--
Paige Miller

Shawn08 · Posted 04-11-2019 11:24 AM

Thank you for your quick response.

The problem I have is that the missing values found in the variable wom_2 are excluding the observations in proc freq for both wom and wom_2.

Here is the code I use for the PROC FREQ:

proc freq data = test;
	tables wom * wom_2 / chisq;
	weight wtps;
run;

And here is the output:

Table of wom by wom_2

wom     Statistic    wom_2 = 0    wom_2 = 1    Total
0       Frequency            3            0        3
        Percent          15.79            0    15.79
        Row Pct            100            0
        Col Pct          27.27            0
1       Frequency            8            8       16
        Percent          42.11        42.11    84.21
        Row Pct             50           50
        Col Pct          72.73          100
Total   Frequency           11            8       19
        Percent          57.89        42.11      100

Frequency Missing = 21

Thank you

PaigeMiller · Posted 04-11-2019 11:35 AM

Yes, data with missing values in one of the two category variables cannot be used in this analysis.

--
Paige Miller

Shawn08 · Posted 04-11-2019 11:38 AM

Are there any other alternatives I could use?

Reeza · Posted 04-11-2019 03:26 PM

@Shawn08 wrote:
Are there any other alternatives I could use?

What do you mean? If you don't have the value for that observation, how are you going to account for that?

Shawn08 · Posted 04-15-2019 03:08 PM

This is what I'm looking for. I need to find a method that compares the proportions of the population in phase 1 and phase 2 when some of the sample is dropped in the second step. Would you have any idea?

Thanks

Reeza · Posted 04-15-2019 04:57 PM

This precludes paired testing, but I would run two t-tests as normal. In one situation leave all records in each group and test the difference, in the second, only include observations where you have measurements for both start and finish. If the results are not similar then you have an issue. Without knowing how many are missing or why they're missing (systematic vs random) we can't advise on imputation methods.
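A rough sketch of the two runs described above, assuming wom is the phase 1 measurement and wom_2 the phase 2 measurement. The dataset test_long and the variables phase and female are hypothetical names introduced only for this illustration.

```sas
/* Stack the two phases into one long dataset (test_long is a made-up name) */
data test_long;
	set test;
	phase = 1; female = wom; output;      /* every record has a phase 1 value */
	if wom_2 ne . then do;
		phase = 2; female = wom_2; output; /* only records measured in phase 2 */
	end;
	keep phase female wom_2 wtps;
run;

/* Run 1: all available records in each group */
proc ttest data = test_long;
	class phase;
	var female;
	weight wtps;
run;

/* Run 2: only records that have both measurements */
proc ttest data = test_long;
	where wom_2 ne .;
	class phase;
	var female;
	weight wtps;
run;
```

Both runs treat the phase 1 and phase 2 rows as two separate groups even though the same respondents appear in each, so the comparison of the two results is a sensitivity check rather than a definitive test.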

Shawn08 · Posted 04-16-2019 11:14 AM

I tried it and it works. However, since the second group only consists of those who passed the first step, our two samples aren't independent, and therefore I don't think I can use a t-test. Am I right?

Reeza · Posted 04-16-2019 11:21 AM

I don't know how you're using independent here...and I don't think it's quite right. The assumptions of a t-test are met, IMO, but you have a biased sample if that's what you're thinking.

Reeza · Posted 04-11-2019 11:13 AM

PROC FREQ or SURVEYFREQ using the binomial option. I think there's an example in the documentation.
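A minimal sketch of the PROC FREQ route, under the assumption that the phase 1 proportion (0.725) is treated as a fixed reference value and the phase 2 proportion is tested against it. That one-sample framing is a simplification chosen for the illustration, not necessarily the right design for this data.

```sas
/* One-sample binomial test: is the weighted share of wom_2 = 1
   different from the assumed reference value 0.725? */
proc freq data = test;
	tables wom_2 / binomial(level = '1' p = 0.725);
	weight wtps;
run;

/* If wtps are sampling weights, PROC SURVEYFREQ gives design-based
   confidence limits for the same proportion */
proc surveyfreq data = test;
	tables wom_2 / cl;
	weight wtps;
run;
```

In PROC FREQ the weights are treated as frequency counts, so the test behaves as if there were 19 observations with non-missing wom_2 rather than 8.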

Shawn08 · Posted 04-11-2019 11:27 AM

Please see my other message for additional explanation of my problem. Thank you
