Hi there. I am very new to SAS and to statistics in general. In fact, I had never used SAS before, and I am very stuck: it took me a whole day just to figure out how to run a program and print each line of the dataset. I have a dataset about the lifestyle and health of a country's population. The survey used a stratified, clustered, multistage design. I have to take a 30% sample from it and perform some descriptive statistical analysis. Each individual in the dataset has an assigned weight. This is because some provinces were overrepresented while others were underrepresented, and because different members of a household had different probabilities of being selected. When I take my sample, should I take a simple random sample and then apply the weights afterwards, when I calculate the means, variances, and so on? Or should I take the weights into account while selecting the sample (so that I don't take too many individuals from overrepresented provinces)? What would the SAS code look like?
I would really appreciate your feedback.
If I understand the sampling plan correctly, then you should be subsampling clusters (households) within strata (provinces, etc.), rather than individuals. The analysis could look something like this:
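/* Subsample 30% of clusters (households) within each stratum (province).
PROC SURVEYSELECT expects the input dataset to be sorted by the STRATA variable.
Extracts a subsample in dataset subStudy, adds variable SelectionProb */
proc sort data=originalStudy;
by Province;
run;
proc surveySelect data=originalStudy out=subStudy rate=0.3 seed=12345; /* any fixed seed makes the subsample reproducible */
strata Province;
cluster HouseholdID; /* identifies a household within a province */
run;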
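/* The combined selection probability is the original selection probability
times the subsampling probability, so the new weight (its inverse) is the
original weight divided by the subsampling probability */
data subWeightedStudy;
set subStudy;
newWeight = originalWeight / SelectionProb;
run;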
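Why the subsampling probability divides the original weight rather than multiplying it: a weight is the inverse of a selection probability. The notation below is introduced only for illustration: $w_i$ is originalWeight for individual $i$, $\pi_i = 1/w_i$ is that individual's original selection probability, $p_h$ is SelectionProb for households in province $h$, $S$ is the original sample, and $S'$ is the 30% subsample. This assumes households within each province are subsampled by simple random sampling, as in the SURVEYSELECT step.

```latex
\begin{align*}
\pi_i^{\text{new}} &= \pi_i \, p_h
  \quad\Longrightarrow\quad
  w_i^{\text{new}} = \frac{1}{\pi_i \, p_h} = \frac{w_i}{p_h} \\[4pt]
\mathrm{E}\Big[\sum_{i \in S'} w_i^{\text{new}} \, y_i \;\Big|\; S\Big]
  &= \sum_{i \in S} p_h \cdot \frac{w_i}{p_h} \, y_i
  = \sum_{i \in S} w_i \, y_i
\end{align*}
```

With a 30% subsample, $p_h \approx 0.3$, so each retained individual's weight grows by a factor of about $1/0.3 \approx 3.3$, and weighted totals computed from the subsample still estimate the same population totals as the full sample.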
/* Estimate means and rates.
Households is an optional input dataset that gives the total number of
clusters (households) per stratum (province) in a variable named _TOTAL_;
it is used for the finite population correction */
proc surveyMeans data=subWeightedStudy total=Households;
class LifeStyleClass;
strata Province;
cluster HouseholdID;
weight newWeight;
var LifeStyleClass HealthVar; /* Estimate LifeStyleClass rates and HealthVar mean */
run;
The SAS documentation contains a simple example of a stratified cluster sample design analysis:
PG