_MooMoo
Obsidian | Level 7

Hello! 

 

I am pretty new to this community and would like to get some advice!

I am trying to analyze data from a stratified random sample. Mutually exclusive strata were defined on the source population, and samples were drawn from each stratum. My aim is to obtain statistics that represent the source population. Since the regular SAS procedures assume simple random sampling, I should use the SURVEY procedures to reflect the stratified sampling design. The default variance estimation method is Taylor series linearization, and my understanding is that this method is valid only if the sampling fraction is relatively small (please correct me if I am wrong). The alternative is the jackknife replication method, but from what I have read, the jackknife is mostly used when the survey has a multistage sampling design.

 

So my question is..

What method should I use if the survey design is stratified random sampling and the sampling fraction is not small? By sampling fraction I mean the number of units sampled from a stratum divided by the size of that stratum.

 

Any suggestions are welcome! Thanks in advance!!

1 ACCEPTED SOLUTION

Watts
SAS Employee

I think this comment (about with-replacement selection or a small first-stage sampling fraction) pertains to estimating the variance by using the variation among PSUs. This applies to multistage sample designs. There's some info in the doc intro section, Survey Design Specification.

 

But it sounds like your design is a single-stage, stratified design. The default Taylor series method should be appropriate for your design. See some computation details here.

 

A good reference is the book Sampling: Design and Analysis by Sharon Lohr (2010). See Chapter 9 (Variance Estimation in Complex Surveys).  
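
For a single-stage stratified simple random sample, the textbook estimator of the variance of the stratified mean (see, e.g., Lohr) carries a finite population correction in every stratum, so a large sampling fraction shrinks the estimated variance rather than invalidating the method. A sketch in notation introduced here, not taken from the thread: $N_h$ is the size of stratum $h$, $n_h$ the number of units sampled from it, $N = \sum_h N_h$, and $s_h^2$ the sample variance within stratum $h$.

```latex
\hat{V}(\bar{y}_{\mathrm{str}})
  = \sum_{h=1}^{H} \left(\frac{N_h}{N}\right)^{2}
    \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^{2}}{n_h}
```

The factor $1 - n_h/N_h$ is the finite population correction. As far as I understand, the SURVEY procedures can only apply it when stratum totals or sampling rates are supplied, so it is worth checking that in the documentation for your release.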

 


5 REPLIES
PaigeMiller
Diamond | Level 26

I would start with PROC SURVEYMEANS, which allows for stratified sampling.

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=statug&docsetTarget=statu...

--
Paige Miller
ballardw
Super User

If the strata were sampled at different rates, you would use a Rate= data set to describe the sampling rate in each stratum, or alternatively a Total= data set (or N=) to give the number of primary sampling units in each stratum. If you have two or more strata variables, make sure that every combination of levels in your data is reflected in that data set.
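
To make the per-stratum bookkeeping concrete, here is a minimal Python sketch (not SAS) of the numbers such a data set would carry: one row per stratum with its population total, sampling rate, and the implied sampling weight. The stratum names and counts are made up for illustration, and `design_table` is a hypothetical helper. My understanding is that SAS expects the values in variables named `_RATE_` or `_TOTAL_`, but please confirm that against the documentation.

```python
def design_table(strata):
    """Return one row per stratum with its total, sampling rate, and weight.

    `strata` maps a stratum label to a (population_total, sample_size) pair.
    """
    rows = []
    for name in sorted(strata):
        total, sampled = strata[name]
        if not 0 < sampled <= total:
            raise ValueError(f"stratum {name!r}: need 0 < n <= N")
        rows.append({
            "stratum": name,
            "total": total,             # N_h: units in the stratum
            "rate": sampled / total,    # n_h / N_h: sampling fraction
            "weight": total / sampled,  # N_h / n_h: weight under stratified SRS
        })
    return rows


# Hypothetical strata, for illustration only.
example = {"north": (200, 50), "south": (1000, 100)}
for row in design_table(example):
    print(row)
```

Every stratum level that appears in the analysis data needs a matching row here, which is the point about covering all levels.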

 

Work through the examples in the documentation for the Stratified Cluster Sample design. It is likely to be close to what you describe.

 

If your sample was not designed for jackknife or BRR replication, stay with the Taylor series method. Taylor series linearization is not restricted to small sampling fractions. It is a general mathematical technique for approximating many kinds of values and is typically covered in second-semester calculus (or at least it was when I took calculus).

_MooMoo
Obsidian | Level 7

Thank you for your input!

 

The data I am trying to analyze came from a stratified simple random sampling design rather than a multistage design (stratified cluster sampling). Instead of selecting PSUs within each stratum, units were sampled directly from each stratum. Most survey data are collected using multistage sampling designs, so it is quite hard for me to find documentation that shows which variance estimation method I should use (or maybe my case is too simple).

 

According to SAS documentation 14.1, "This variance estimation method (Taylor linearization) assumes that the first-stage sampling fraction is small, or the first-stage sample is drawn with replacement, as it often is in practice." For example, in my case, one stratum contains 108 units and 40 of them were randomly selected. So the sampling fraction is about 37%, which is quite large. This makes me hesitant to use the Taylor linearization method to estimate the variance.
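
The arithmetic for that stratum can be checked with a small Python sketch (an illustration, not SAS output). It uses the 108 and 40 quoted above; `fpc_effect` is a hypothetical helper name. It reports the sampling fraction, the finite population correction 1 - n/N, and the factor by which that correction scales the stratum's standard error.

```python
import math


def fpc_effect(stratum_size, sample_size):
    """Return (sampling fraction, fpc, standard-error multiplier) for one stratum."""
    if not 0 < sample_size <= stratum_size:
        raise ValueError("need 0 < n <= N")
    fraction = sample_size / stratum_size
    fpc = 1.0 - fraction  # finite population correction
    return fraction, fpc, math.sqrt(fpc)


fraction, fpc, se_multiplier = fpc_effect(108, 40)
print(f"sampling fraction: {fraction:.3f}")
print(f"fpc (1 - n/N):     {fpc:.3f}")
print(f"SE multiplier:     {se_multiplier:.3f}")
```

With these inputs the sampling fraction is about 0.37 and the standard-error multiplier is about 0.79, so leaving the correction out would overstate that stratum's standard error by roughly a quarter.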

 

ballardw
Super User

Please indicate which procedure and topic you are quoting. There is a lot of documentation for any SAS release.
