Selecting a sample from a dataset with weights and...

06-23-2012 01:39 PM

Hi there. I am very new to SAS and to statistics in general. (I had never used SAS before, and it took me a whole day just to figure out how to run the program and print each line of the dataset.) I have a dataset about the lifestyle and health of the population of a country. The survey used a stratified, clustered, multistage design. I have to take a 30% sample from it and perform some descriptive statistical analysis.

Each individual in the dataset has a weight assigned, because some provinces were overrepresented and others underrepresented, and because different members of a household had different probabilities of being selected. When I take my sample, do I take a simple random sample and then include the weights afterwards, when I calculate the means, variances, and so on? Or do I take the weights into account while selecting the sample, so that I don't take too many individuals from overrepresented provinces? What would the SAS code look like?

06-23-2012 10:41 PM

If I understand the sampling plan correctly, you should be subsampling clusters (households) within strata (provinces, etc.). The analysis could look something like this:

**/* Subsample 30% of clusters (households).**

**Extracts a subsample in dataset subStudy, adds variable SelectionProb.**

**The input dataset must already be sorted by the strata variable (Province) */**

**proc surveySelect data=originalStudy out=subStudy rate=0.3;**

**strata Province;**

**cluster HouseholdID; /* identifies a household within a province */**

**run;**

**/* Multiply original selection probabilities by subsampling probabilities,**

**i.e. divide the original weights by SelectionProb, to create new weights */**

**data subWeightedStudy;**

**set subStudy;**

**newWeight = originalWeight / SelectionProb;**

**run;**
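One way to sanity-check the adjusted weights, as a rough sketch that reuses the dataset and variable names from the code above: survey weights are inverse selection probabilities, so the sum of newWeight over the subsample should land near the sum of originalWeight over the full study, province by province. The two sums will not match exactly, because households differ in size and weight, but a large gap would suggest the adjustment needs another look.

```sas
/* Sum of the original weights per province in the full study */
proc means data=originalStudy sum;
   class Province;
   var originalWeight;
run;

/* Sum of the adjusted weights per province in the 30% subsample */
proc means data=subWeightedStudy sum;
   class Province;
   var newWeight;
run;
```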

**/* Estimate means and rates.**

**Households is an optional input dataset that gives the total number of**

**clusters (households) (in variable named _TOTAL_) per stratum (province) */**

**proc surveyMeans data=subWeightedStudy total=Households;**

**class LifeStyleClass;**

**strata Province;**

**cluster HouseholdID;**

**weight newWeight;**

**var LifeStyleClass HealthVar; /* Estimate LifeStyleClass rates and HealthVar mean */**

**run;**
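The Households dataset itself is not shown in this thread. As a minimal sketch, assuming the households in originalStudy are treated as the population being subsampled, it could be built by counting distinct households per province. If the totals should instead describe the real population, they would have to come from an external source such as census counts.

```sas
/* Assumption: originalStudy is treated as the population of households.
   One row per province, with the household count in _TOTAL_. */
proc sql;
   create table Households as
   select Province,
          count(distinct HouseholdID) as _TOTAL_
   from originalStudy
   group by Province;
quit;
```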

The SAS documentation contains a simple example of a stratified cluster sampling design analysis:

