jgtaylor
Calcite | Level 5

Hi,

 

Apologies if this is a silly question, I am a relatively new SAS user currently running 9.3.

 

I have a dataset (CITIES) of around 20,000 cities, each with a population size classification, climate type, and national GDP classification. Some cities in the dataset also have data on transport, health, etc., which I have summarised into a single column called DataCoverage that counts the columns with known data for each city.

 

I'd like to do further analysis on a sub-sample of cities, and I would like to randomly select them in a manner which reflects the existing proportions of the data. I have done:

 

proc surveyselect data = CITIES out = samp1 method = srs sampsize=200 seed = 9876;
strata CLIMATE POPULATION_CLASS GDP_CLASS / alloc=proportional;

run;

 

What I would really like to do is select a subsample, which represents the proportions of the original dataset, but gives more weight to those with a larger DataCoverage (i.e. more known data, so I don't have to go find the data somewhere myself). Is such a thing possible?

 

Thanks,

 

Jon

ballardw
Super User

If you can provide a numeric variable that represents data coverage, with larger meaning more coverage, you might be able to get this with a PPS selection using that variable for the SIZE.

Depending on how you are defining "reflects the existing proportions", you may need to look at setting sample sizes per stratum.
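As a rough sketch of what setting sample sizes per stratum could look like: SURVEYSELECT accepts a data set in SAMPSIZE= that holds one row per stratum with the requested size in a variable named _NSIZE_. The data set names stratum_counts, stratum_sizes, cities_sorted and samp_fixed below are made up for illustration, the target of 200 comes from the original post, and the rounding rule is only an assumption (rounded sizes may not add up to exactly 200).

```sas
/* One row per stratum with its city count in _FREQ_.
   MISSING keeps strata with missing class values, which SURVEYSELECT
   also treats as valid strata. */
proc summary data=CITIES nway missing;
  class CLIMATE POPULATION_CLASS GDP_CLASS;
  output out=stratum_counts (drop=_TYPE_);
run;

/* Total number of cities across all strata */
proc sql noprint;
  select sum(_FREQ_) into :total_n from stratum_counts;
quit;

/* Proportional share of 200, with at least 1 city per stratum (assumption) */
data stratum_sizes;
  set stratum_counts;
  _NSIZE_ = max(1, round(200 * _FREQ_ / &total_n));
  drop _FREQ_;
run;

/* The input data must be sorted by the STRATA variables */
proc sort data=CITIES out=cities_sorted;
  by CLIMATE POPULATION_CLASS GDP_CLASS;
run;

proc surveyselect data=cities_sorted out=samp_fixed method=srs
                  sampsize=stratum_sizes seed=9876;
  strata CLIMATE POPULATION_CLASS GDP_CLASS;
run;
```

Editing _NSIZE_ by hand in stratum_sizes before the final step is one way to control exactly how many cities each stratum contributes.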

jgtaylor
Calcite | Level 5

Thanks, I think this gets me close.

 

I'm defining "reflects the existing proportions" as, for example:

If cities with a Population_Class of <50K, a GDP_Class of "LowerMiddle GDP" and a CLIMATE of "Temperate Humid" comprise 5% of all cities in the world, then I want them to be 5% of my sampled dataset. The percentage of each stratum in the sample should reflect that in the original dataset.

 

I have changed to:

 

proc surveyselect data = CITIES out = samp1 method = pps sampsize=200 seed = 9876;
strata CLIMATE POPULATION_CLASS GDP_CLASS / alloc=proportional;
size DataCoverage;
run;

 

I've got a few problems, namely:

 

1) it's not giving me any cities with DataCoverage=0. It's ok to have some in order to maintain proportions, I just want to minimise them if possible

2) Since DataCoverage isn't great, I am not getting a sample size of 200 (97, actually).

 

Thanks again for your help.

 

Jon

ballardw
Super User

The SIZE variable has to be non-zero in the basic use of SURVEYSELECT. Since we are using something that really isn't a population size count, I would suggest adding 1 to your DataCoverage value for every record so that each one is 1 or greater, and then subtracting the 1 at the end to get back to the original value.

Also, every record should have a DataCoverage value, or it will be excluded.
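A minimal sketch of that idea, reusing the variable names from the posts above. CoverageSize, cities_pps and samp2 are hypothetical names, and treating a missing DataCoverage as 0 is an assumption about what the data should mean.

```sas
/* Build a strictly positive size measure from DataCoverage */
data cities_pps;
  set CITIES;
  /* Shift by 1 so cities with DataCoverage = 0 can still be selected;
     a missing DataCoverage is treated as 0 here (assumption) */
  CoverageSize = coalesce(DataCoverage, 0) + 1;
run;

/* The input data must be sorted by the STRATA variables */
proc sort data=cities_pps;
  by CLIMATE POPULATION_CLASS GDP_CLASS;
run;

/* PPS selection within strata, with proportional allocation of the 200 */
proc surveyselect data=cities_pps out=samp2 method=pps sampsize=200 seed=9876;
  strata CLIMATE POPULATION_CLASS GDP_CLASS / alloc=proportional;
  size CoverageSize;
run;
```

Because the shifted value lives in its own variable, DataCoverage is carried into the output unchanged and nothing needs to be subtracted afterwards.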

jgtaylor
Calcite | Level 5

Very clever! That seems to have done what I wanted it to. Thanks for your help!

 

Jon
