Hello!
I am pretty new to this community and would like to get some advice!
I am trying to analyze data from a stratified random sample. Mutually exclusive strata were defined in the source population, and samples were drawn from each stratum. My aim is to obtain statistics that represent the source population. Since the regular SAS procedures assume simple random sampling, I should use the PROC SURVEY procedures to reflect the stratified sampling design. The default variance estimation method is Taylor linearization, and my understanding is that this method is valid only if the sampling fraction is relatively small (please correct me if I am wrong). The alternative is the jackknife replication method, but from my research, the jackknife seems to be used mostly when the survey has a multistage sampling design.
So my question is:
What method should I use if the survey design is stratified random sampling and the sampling fraction is not relatively small? By sampling fraction I mean the number of units sampled from a stratum divided by the number of units in that stratum.
Any suggestions are welcome! Thanks in advance!!
Accepted Solutions
I think this comment (about with-replacement selection or a small first-stage sampling fraction) pertains to estimating the variance by using the variation among PSUs. This applies to multistage sample designs. There's some info in the intro section of the documentation, Survey Design Specification.
But it sounds like your design is a single-stage, stratified design. The default Taylor series method should be appropriate for your design. See some computation details here.
A good reference is the book Sampling: Design and Analysis by Sharon Lohr (2010). See Chapter 9 (Variance Estimation in Complex Surveys).
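To illustrate why the default method suits a single-stage stratified design, here is a minimal Python sketch (not SAS code; the function names and the dict-based input format are hypothetical, and each stratum needs at least two sampled units). It computes the textbook stratified variance of the mean with the finite population correction (1 - n_h/N_h), then a stratified delete-one jackknife variance with the same correction. For a stratified mean the two agree exactly, which suggests that, for this statistic, the choice between linearization and the jackknife matters little as long as the stratum sizes are supplied.

```python
import math

def strat_mean(samples, sizes):
    """Stratified mean: sum over strata of (N_h / N) * ybar_h."""
    N = sum(sizes.values())
    return sum(sizes[h] / N * sum(ys) / len(ys) for h, ys in samples.items())

def linearization_var(samples, sizes):
    """Stratified variance of the mean with finite population correction (1 - n_h/N_h)."""
    N = sum(sizes.values())
    v = 0.0
    for h, ys in samples.items():
        n, N_h = len(ys), sizes[h]
        ybar = sum(ys) / n
        s2 = sum((y - ybar) ** 2 for y in ys) / (n - 1)  # stratum sample variance
        v += (N_h / N) ** 2 * (1 - n / N_h) * s2 / n
    return v

def jackknife_var(samples, sizes):
    """Stratified delete-one jackknife, scaled by (1 - n_h/N_h) * (n_h - 1) / n_h."""
    v = 0.0
    for h, ys in samples.items():
        n, N_h = len(ys), sizes[h]
        reps = []
        for j in range(n):
            reduced = dict(samples)
            reduced[h] = ys[:j] + ys[j + 1:]  # drop unit j from stratum h only
            reps.append(strat_mean(reduced, sizes))
        rbar = sum(reps) / n
        v += (1 - n / N_h) * (n - 1) / n * sum((r - rbar) ** 2 for r in reps)
    return v

# Hypothetical data: stratum "A" has 6 units (3 sampled), "B" has 4 units (2 sampled).
samples = {"A": [1.0, 2.0, 3.0], "B": [4.0, 6.0]}
sizes = {"A": 6, "B": 4}
print(strat_mean(samples, sizes))
print(math.sqrt(linearization_var(samples, sizes)))
print(math.sqrt(jackknife_var(samples, sizes)))
```

A fully sampled stratum (n_h = N_h) gets an fpc of zero and contributes no variance, which is the behavior you want when a sampling fraction is large.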
I would start with PROC SURVEYMEANS, which allows for stratified sampling.
Paige Miller
If the strata were sampled at different rates, then you would use a RATE= data set to describe the sample, and also a TOTAL= (or N=) data set to describe the number of primary sampling units per stratum. If you have two or more stratification variables, make sure that all levels in your data are reflected in these data sets.
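As a rough illustration of what those data sets describe for a single-stage stratified SRS, here is a minimal Python sketch (not SAS). For each stratum it records the stratum size N_h (the SURVEY procedures expect this in a variable named _TOTAL_), the sampling fraction n_h/N_h (variable _RATE_), and the implied sampling weight N_h/n_h. The helper name, the dict inputs, and stratum "B" are hypothetical; only the 108/40 stratum echoes the example given later in this thread. The check at the top mirrors the advice that every stratum level in the data must appear in these tables.

```python
def design_tables(stratum_sizes, sample_sizes):
    """Hypothetical helper: per-stratum population total, sampling rate, and weight."""
    missing = set(sample_sizes) - set(stratum_sizes)
    if missing:
        # Every stratum level present in the sample needs a row in the TOTAL=/RATE= table.
        raise ValueError(f"no population size for strata: {sorted(missing)}")
    rows = []
    for h in sorted(sample_sizes):
        N_h, n_h = stratum_sizes[h], sample_sizes[h]
        if not 0 < n_h <= N_h:
            raise ValueError(f"stratum {h!r}: need 0 < n_h <= N_h")
        rows.append({
            "stratum": h,
            "_TOTAL_": N_h,       # population units in the stratum
            "_RATE_": n_h / N_h,  # sampling fraction
            "weight": N_h / n_h,  # sampling weight for each sampled unit
        })
    return rows

for row in design_tables({"A": 108, "B": 50}, {"A": 40, "B": 10}):
    print(row)
```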
Work through the documentation examples for the stratified cluster sample design. It is likely to be close to what you describe.
If you didn't use a jackknife or BRR sample design, then leave it to the Taylor series method. Taylor series are not restricted to small sampling fractions. The Taylor series is a mathematical technique used to approximate many kinds of quantities, and it is typically covered in the second semester of calculus (or at least it was when I took calculus).
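For a single-stage stratified simple random sample with known stratum sizes, the stratified mean is a linear combination of the stratum sample means, so to my understanding linearization involves no approximation here and the variance estimator reduces to the standard stratified formula below. The notation (H strata, N_h units and n_h sampled units in stratum h, N total units, stratum sample mean and variance) is introduced here for illustration, not taken from the thread. The factor (1 - n_h/N_h) is the finite population correction, which is what accounts for a large sampling fraction.

```latex
\bar{y}_{st} = \sum_{h=1}^{H} W_h \bar{y}_h,
\qquad W_h = \frac{N_h}{N},
\qquad \hat{V}(\bar{y}_{st}) = \sum_{h=1}^{H} W_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^2}{n_h}
```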
Thank you for your input!
Stratified simple random sampling, rather than a multistage sampling design (stratified cluster sample design), was used to create the data I am trying to analyze. Instead of selecting PSUs within each stratum, samples were selected directly from each stratum. Most survey data are collected using multistage sampling designs, so it is quite hard for me to find documentation that shows which variance estimation method I should use (or maybe it is too simple).
According to the SAS 14.1 documentation, "This variance estimation method (Taylor linearization) assumes that the first-stage sampling fraction is small, or the first-stage sample is drawn with replacement, as it often is in practice." In my case, for example, one stratum contains 108 units, and 40 of them were randomly selected. So the sampling fraction is about 37% (40/108), which is quite large. This makes me hesitant to use the Taylor linearization method to estimate the variance.
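The arithmetic for that stratum, as a small Python sketch (the function name is hypothetical): in a single-stage stratified design the finite population correction 1 - n_h/N_h scales the stratum's variance contribution, so a large sampling fraction shrinks the estimated variance rather than ruling out the method.

```python
import math

def fpc_effect(N_h, n_h):
    """Sampling fraction, finite population correction, and the SE multiplier it implies."""
    f_h = n_h / N_h
    fpc = 1 - f_h
    return f_h, fpc, math.sqrt(fpc)

# The stratum described above: 108 units, 40 sampled.
f_h, fpc, se_factor = fpc_effect(108, 40)
print(f"sampling fraction: {f_h:.3f}")                 # 0.370
print(f"fpc (1 - n_h/N_h): {fpc:.3f}")                 # 0.630
print(f"SE multiplier from the fpc: {se_factor:.3f}")  # 0.793
```

So the correction removes about 37% of this stratum's variance contribution and brings its standard error down to roughly 79% of the uncorrected value. As I understand the SURVEY procedures, this correction is applied only when you supply TOTAL= or RATE= information, which is why those data sets matter for a design like this.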
Please indicate which procedure and topic you are quoting. There is a lot of documentation for any SAS release.