Programming the statistical procedures from SAS

Weighted random stratified sampling

New Contributor
Posts: 3

Hi,

Apologies if this is a silly question; I am a relatively new SAS user, currently running 9.3.

I have a dataset (CITIES) of around 20,000 cities, each with a population size classification, climate type, and national GDP classification. Some cities in the dataset also have data on transport, health, etc., which I have summarised into a single column called DataCoverage that counts the columns with known data for each city.

I'd like to do further analysis on a sub-sample of cities, and I would like to select them randomly in a manner that reflects the existing proportions of the data. I have done:

proc surveyselect data = CITIES out = samp1 method = srs sampsize=200 seed = 9876;
strata CLIMATE POPULATION_CLASS GDP_CLASS / alloc=proportional;
run;

 

What I would really like to do is select a subsample that represents the proportions of the original dataset but gives more weight to cities with a larger DataCoverage (i.e. more known data, so I don't have to go and find the data myself). Is such a thing possible?

Thanks,

Jon


All Replies
Grand Advisor
Posts: 10,062

Re: Weighted random stratified sampling

If you can provide a numeric variable that represents data coverage, with larger values meaning more coverage, you might be able to get this with a PPS (probability proportional to size) selection using that variable as the SIZE variable.

Depending on how you define "reflects the existing proportions", you may need to look at setting sample sizes per stratum.
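
A minimal sketch of setting the sample sizes per stratum explicitly, in case it helps. CITIES and the three strata variables are from the question; StrataSizes and samp_alloc are hypothetical names, and the target of 200 and the "at least one city per stratum" rule are assumptions. The sizes are passed to PROC SURVEYSELECT through a SAMPSIZE= data set holding a variable named _NSIZE_.

```sas
/* Sketch only: StrataSizes and samp_alloc are hypothetical names. */

/* SURVEYSELECT expects the input sorted by the STRATA variables. */
proc sort data=CITIES;
  by CLIMATE POPULATION_CLASS GDP_CLASS;
run;

/* One row per stratum, with a sample size proportional to its share. */
data StrataSizes;
  set CITIES nobs=total;             /* total = number of cities */
  by CLIMATE POPULATION_CLASS GDP_CLASS;
  if first.GDP_CLASS then count = 0;
  count + 1;                         /* running count within the stratum */
  if last.GDP_CLASS then do;
    /* Assumption: target of 200, at least one city per stratum. */
    _NSIZE_ = max(1, round(200 * count / total));
    output;
  end;
  keep CLIMATE POPULATION_CLASS GDP_CLASS _NSIZE_;
run;

/* Draw the sample using the per-stratum sizes. */
proc surveyselect data=CITIES out=samp_alloc method=srs
                  sampsize=StrataSizes seed=9876;
  strata CLIMATE POPULATION_CLASS GDP_CLASS;
run;
```

Because of the rounding and the minimum of one per stratum, the total may not come out at exactly 200, so the sizes in StrataSizes are worth checking before sampling.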

New Contributor
Posts: 3

Re: Weighted random stratified sampling

Thanks, I think this gets me close.

I'm defining "reflects the existing proportions" as follows. If cities with a Population_Class of <50K, a GDP_Class of "LowerMiddle GDP", and a CLIMATE of "Temperate Humid" comprise 5% of all cities in the world, then I want them to be 5% of my sampled dataset. The percentage of each stratum in the sample should reflect that in the original dataset.

I have changed to:

proc surveyselect data = CITIES out = samp1 method = pps sampsize=200 seed = 9876;
strata CLIMATE POPULATION_CLASS GDP_CLASS / alloc=proportional;
size DataCoverage;
run;

 

I've got a few problems, namely:

1) It's not giving me any cities with DataCoverage=0. It's OK to have some in order to maintain proportions; I just want to minimise them if possible.

2) Since DataCoverage isn't great, I am not getting a sample size of 200 (I get 97, actually).

Thanks again for your help.

Jon

Solution
‎09-25-2015 06:23 AM
Grand Advisor
Posts: 10,062

Re: Weighted random stratified sampling

The SIZE variable has to be greater than zero in the basic use of SURVEYSELECT. Since we are using something that isn't really a population size count, I would suggest adding 1 to DataCoverage for every record so that all values are 1 or greater, then subtracting the 1 at the end to get back to the original counts.

Also, every record should have a DataCoverage value, or it will be excluded.
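
The suggestion above might look something like this. It is a sketch only: CITIES_PPS, SizeVar and samp_pps are hypothetical names, and treating a missing DataCoverage as 0 is an assumption. It is also a small variation on the advice, since the shifted value goes into a separate variable and DataCoverage is left untouched, so nothing needs subtracting afterwards.

```sas
/* Sketch only: CITIES_PPS, SizeVar and samp_pps are hypothetical names. */

/* Shift the size measure so that every city has a positive value.
   Assumption: a missing DataCoverage is treated as 0. */
data CITIES_PPS;
  set CITIES;
  SizeVar = coalesce(DataCoverage, 0) + 1;
run;

/* SURVEYSELECT expects the input sorted by the STRATA variables. */
proc sort data=CITIES_PPS;
  by CLIMATE POPULATION_CLASS GDP_CLASS;
run;

/* PPS selection within strata, using the shifted size measure. */
proc surveyselect data=CITIES_PPS out=samp_pps method=pps
                  sampsize=200 seed=9876;
  strata CLIMATE POPULATION_CLASS GDP_CLASS / alloc=proportional;
  size SizeVar;
run;
```

Cities with DataCoverage=0 now have a size of 1, so they can be selected, but less often than cities with more known data.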

New Contributor
Posts: 3

Re: Weighted random stratified sampling

Very clever! That seems to have done what I wanted it to. Thanks for your help!

 

Jon
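
One way to check that a sample like this reflects the original proportions is to compare the stratum percentages side by side. This is a sketch only: PopShare, SampShare and Compare are hypothetical names, samp1 is the output data set from the earlier posts, and it assumes the strata variables have no missing values (PROC FREQ leaves those out by default).

```sas
/* Sketch only: PopShare, SampShare and Compare are hypothetical names. */

/* Stratum shares in the full data set. */
proc freq data=CITIES noprint;
  tables CLIMATE*POPULATION_CLASS*GDP_CLASS
         / out=PopShare(rename=(count=PopN percent=PopPct));
run;

/* Stratum shares in the sample. */
proc freq data=samp1 noprint;
  tables CLIMATE*POPULATION_CLASS*GDP_CLASS
         / out=SampShare(rename=(count=SampN percent=SampPct));
run;

proc sort data=PopShare;  by CLIMATE POPULATION_CLASS GDP_CLASS; run;
proc sort data=SampShare; by CLIMATE POPULATION_CLASS GDP_CLASS; run;

/* Strata with no sampled cities will have missing SampN, SampPct and Diff. */
data Compare;
  merge PopShare SampShare;
  by CLIMATE POPULATION_CLASS GDP_CLASS;
  Diff = SampPct - PopPct;
run;

proc print data=Compare;
run;
```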
