Shawn08
Obsidian | Level 7

Hi,

 

I'm sure this is an easy question, but I would greatly appreciate the help.

 

I want to compare the proportion of women across two different phases. Let's say I have the following dataset:

data test;
	input wom second_phase wom_2 wtps;
	datalines;
	1 1 1 3
	1 0 . 2
	1 1 0 2
	0 1 0 2
	0 0 . 5
	1 1 1 3
	1 1 0 4
	1 1 . 4
	1 0 . 5
	0 1 0 1
	0 0 . 3
	1 1 . 2
	1 1 1 2
	1 1 0 2
	;
run;

Using the weights, I calculate that the weighted proportion of women before the second phase is 72.5%, whereas the weighted proportion of women among those who qualified for the second phase (second_phase = 1) is 42.1%.
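
For reference, here is a minimal sketch of how these two figures can be reproduced with PROC FREQ. It assumes that wtps can be passed to the WEIGHT statement, which PROC FREQ treats as a frequency count rather than a survey weight.

```sas
/* Phase 1: weighted share of women over all 14 records.        */
/* Weighted counts are 29 of 40, so wom = 1 shows 72.50%.       */
proc freq data=test;
	tables wom;
	weight wtps;
run;

/* Phase 2: weighted share of women among second_phase = 1.     */
/* Records with missing wom_2 are dropped by default, leaving   */
/* weighted counts of 8 of 19, so wom_2 = 1 shows 42.11%.       */
proc freq data=test;
	where second_phase = 1;
	tables wom_2;
	weight wtps;
run;
```

Note that the 42.1% is computed only over second-phase records with a non-missing wom_2.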

 

What procedure could I use to test whether the difference between the weighted proportions of women across the two phases is statistically significant?

 

Thank you in advance!

11 REPLIES
Shawn08
Obsidian | Level 7

Thank you for your quick response.

 

The problem is that PROC FREQ drops every observation with a missing value in wom_2, so those observations are excluded from the counts for both wom and wom_2.

 

Here is the PROC FREQ code I use:

proc freq data = test;
	tables wom * wom_2 / chisq;
	weight wtps;
run;

 

And here is the output

Table of wom by wom_2

wom     Statistic    wom_2 = 0    wom_2 = 1     Total
0       Frequency            3            0         3
        Percent          15.79            0     15.79
        Row Pct            100            0
        Col Pct          27.27            0
1       Frequency            8            8        16
        Percent          42.11        42.11     84.21
        Row Pct             50           50
        Col Pct          72.73          100
Total   Frequency           11            8        19
        Percent          57.89        42.11       100

Frequency Missing = 21

Thank you

PaigeMiller
Diamond | Level 26

Yes, data with missing values in one of the two category variables cannot be used in this analysis.

--
Paige Miller
Shawn08
Obsidian | Level 7
Are there any other alternatives I could use?
Reeza
Super User

@Shawn08 wrote:
Are there any other alternatives I could use?

What do you mean? If you don't have the value for that observation, how are you going to account for that?

Shawn08
Obsidian | Level 7
This is what I'm looking for. I need a method that compares the population proportions in phase 1 and phase 2 when part of the sample is dropped at the second step. Would you have any idea?

Thanks
Reeza
Super User
This precludes paired testing, but I would run two t-tests as normal. In the first, leave all records in each group and test the difference; in the second, include only observations where you have measurements for both start and finish. If the results are not similar, then you have an issue. Without knowing how many values are missing or why they're missing (systematic vs. random), we can't advise on imputation methods.
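
One way to read this suggestion, as a rough sketch rather than a definitive implementation: stack the phase 1 and phase 2 indicators into one long dataset and run PROC TTEST twice. The dataset names long_all and long_complete and the variables phase and woman are hypothetical, introduced only for illustration. The sketch also assumes wtps is acceptable in the WEIGHT statement of PROC TTEST.

```sas
/* Build two stacked datasets (names are illustrative):            */
/*   long_all      - every phase 1 record plus every observed      */
/*                   phase 2 value                                 */
/*   long_complete - only records with both wom and wom_2 observed */
data long_all long_complete;
	set test;
	phase = 1; woman = wom;
	output long_all;
	if not missing(wom_2) then output long_complete;
	if not missing(wom_2) then do;
		phase = 2; woman = wom_2;
		output long_all;
		output long_complete;
	end;
	keep phase woman wtps;
run;

/* Run 1: all available records in each phase */
proc ttest data=long_all;
	class phase;
	var woman;
	weight wtps;
run;

/* Run 2: complete cases only */
proc ttest data=long_complete;
	class phase;
	var woman;
	weight wtps;
run;
```

Because woman is a 0/1 indicator, the t-test is only an approximation to a test of proportions. The same respondents also contribute rows to both phases, so the two groups overlap.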
Shawn08
Obsidian | Level 7
I tried it and it works. However, since the second group consists only of those who passed the first step, our two samples aren't independent, and therefore I don't think I can use a t-test. Am I right?
Reeza
Super User
I don't know how you're using independent here...and I don't think it's quite right. The assumptions of a t-test are met, IMO, but you have a biased sample if that's what you're thinking.
Reeza
Super User
PROC FREQ or SURVEYFREQ using the binomial option. I think there's an example in the documentation.
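
A minimal sketch of what the PROC FREQ version might look like follows. It makes a strong assumption: the phase 1 proportion (0.725) is treated as a fixed null value rather than an estimate with its own sampling error. The WEIGHT statement in PROC FREQ also treats wtps as frequency counts, so the test behaves as if there were 19 second-phase observations.

```sas
/* Test whether the weighted phase 2 proportion of women differs  */
/* from 0.725. Assumption: 0.725 is taken as a known constant.    */
/* LEVEL='1' requests the proportion for wom_2 = 1; by default    */
/* PROC FREQ would use the first level (wom_2 = 0).               */
proc freq data=test;
	where second_phase = 1;
	tables wom_2 / binomial(level='1' p=0.725);
	weight wtps;
run;
```

If wtps is a survey sampling weight rather than a count, a design-based procedure such as PROC SURVEYFREQ would be the more appropriate choice. Its syntax is not shown here.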
Shawn08
Obsidian | Level 7
Please see my other message for an additional explanation of my problem. Thank you.

