Selecting a sample from a dataset with weights and stratification


Hi there. I am very new to SAS and to statistics in general. (I have never used SAS before and I am very much stuck; it took me a whole day just to figure out how to run the program and print each line in the dataset.)

I have a dataset about the lifestyle and health of the population of a country. The survey was conducted using a stratified, clustered, multistage design. I have to take a 30% sample from it and perform some descriptive statistical analysis. Each individual in the dataset has a weight assigned, because some provinces were overrepresented while others were underrepresented, and because different members of a household had different probabilities of being selected.

When I take my sample, do I take a simple random sample and then include the weights afterwards, when I calculate the means, the variance, and so on? Or do I take the weights into account while I select the sample, so that I don't take too many individuals from overrepresented provinces? What would the SAS code look like?

I really appreciate your feedback

Re: Selecting a sample from a dataset with weights and stratification

Posted in reply to MissDemeanor

If I understand the sampling plan correctly then you should be subsampling clusters (households) within strata (provinces, etc.). The analysis could look something like this:

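/* Subsample 30% of clusters (households) within each stratum (province).
   Extracts a subsample in dataset subStudy and adds variable SelectionProb. */

/* The STRATA statement expects the input sorted by the strata variable */
proc sort data=originalStudy;
by Province;
run;

proc surveySelect data=originalStudy out=subStudy rate=0.3;
strata Province;
cluster HouseholdID; /* identifies a household within a province */
run;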

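/* Combine the original weights with the subsampling probabilities to create
   new weights. A weight is the inverse of a selection probability, so the
   original weight is divided by SelectionProb, not multiplied by it. */
data subWeightedStudy;
set subStudy;
newWeight = originalWeight / SelectionProb;
run;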

/* Estimate means and rates.
   Households is an optional input dataset that gives the total number of
   clusters (households), in a variable named _TOTAL_, per stratum (province). */
proc surveyMeans data=subWeightedStudy total=Households;
class LifeStyleClass;
strata Province;
cluster HouseholdID;
weight newWeight;
var LifeStyleClass HealthVar; /* Estimate LifeStyleClass rates and HealthVar mean */
run;
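
One way to sanity-check the adjusted weights before trusting the estimates is to compare weight totals per province. This is only a sketch: it reuses the dataset and variable names from the example above (originalStudy, subWeightedStudy, originalWeight, newWeight, Province), which are assumptions about your data, and the datasets fullTotals, subTotals and weightCheck are names made up for illustration. If the new weight is the original weight divided by the subsampling probability, the subsample's weight total in each province should be roughly equal to the full study's total, though not exactly, because whole households are drawn at random.

```sas
/* Sketch only: names follow the example above and are assumptions. */

/* Sum of the original weights per province in the full study */
proc means data=originalStudy noprint nway;
  class Province;
  var originalWeight;
  output out=fullTotals (drop=_TYPE_ _FREQ_) sum=fullWeightSum;
run;

/* Sum of the adjusted weights per province in the 30% subsample */
proc means data=subWeightedStudy noprint nway;
  class Province;
  var newWeight;
  output out=subTotals (drop=_TYPE_ _FREQ_) sum=subWeightSum;
run;

/* Put the two totals side by side; both datasets are already
   ordered by Province because of the CLASS statement with NWAY */
data weightCheck;
  merge fullTotals subTotals;
  by Province;
  ratio = subWeightSum / fullWeightSum;
run;

proc print data=weightCheck noobs;
run;
```

A ratio close to 1 in every province suggests the weights were combined sensibly. A ratio far from 1 (for example close to 0.09, which is 0.3 squared) would suggest the original weight was multiplied by the selection probability instead of divided by it.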

The SAS documentation contains a simple example of a stratified cluster sampling design analysis.

