rob9999
Fluorite | Level 6

Hi, 

 

Hopefully someone can shed some light on my question. 

 

I have a dataset (the 'sampling' dataset) from which I want to select samples, using the distribution of another dataset as the sampling guide: it supplies the strata and the _nsize_ for each stratum.

 

The 'sampling' dataset is itself survey data, and each record/row already carries a sampling weight ('s_weight'). I suspect that if I run a URS or SRS SURVEYSELECT directly on the sampling dataset, it might produce unrepresentative samples, because SURVEYSELECT will treat every row in the 'sampling' dataset as having a weight of 1.

 

My question: is there any method or option within PROC SURVEYSELECT that can take 's_weight' into account during the sampling process? My current instinct is to build a 'new sampling' dataset by expanding each record into s_weight copies, so that every record in the 'new sampling' dataset has a weight of 1. But I could be wrong here.

 

example:

 

data sampling;
  input ID income s_weight;
cards;
1 1 300
2 2 50
3 3 30
4 1 45
5 5 82
6 1 55
7 1 321
;

data reference;
  input income _nsize_;
cards;
1 20
2 50
3 10
4 5
6 70
;

proc surveyselect data=sampling
  out=work.output
  method=srs
  n=reference
  reps=1000
  seed=2223;
  strata income;
run;

 

Thanks!

ACCEPTED SOLUTION
Watts
SAS Employee

PROC SURVEYSELECT provides PPS (probability proportional to size) selection.

There's some info here: selection methods, SIZE statement, and example.
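A minimal sketch of what a stratified PPS selection with a SIZE statement might look like. The data set names `frame`, `alloc`, and `work.pps_sample` are hypothetical, as are the per-stratum sample sizes; stratum 1 reuses the income=1 rows from the question, while the stratum 2 rows are invented purely for illustration. METHOD=PPS_SYS is one of several PPS methods and is chosen here only as an example.

```sas
/* Hypothetical frame: stratum 1 reuses the question's income=1 rows; */
/* the stratum 2 rows are made up for illustration.                   */
data frame;
  input ID income s_weight;
  datalines;
1 1 300
4 1 45
6 1 55
7 1 321
8 2 50
9 2 60
10 2 40
11 2 70
;

/* The STRATA statement expects the input sorted by the strata variable. */
proc sort data=frame;
  by income;
run;

/* Hypothetical allocation: one row per stratum, sample size in _NSIZE_. */
data alloc;
  input income _nsize_;
  datalines;
1 2
2 2
;

/* SIZE names the size measure, so records with larger s_weight */
/* get a larger selection probability within their stratum.     */
proc surveyselect data=frame out=work.pps_sample
     method=pps_sys sampsize=alloc seed=2223;
  strata income;
  size s_weight;
run;
```

The sizes above were picked so that no record's share of its stratum total reaches 1/2, which keeps every record's expected number of selections below 1 for a sample of 2 per stratum.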


3 REPLIES 3
ballardw
Super User

The procedure accepts a data set for the SAMPSIZE= or SAMPRATE= option, with specifically named variables that supply the sampling information. The data set must contain the strata variable(s) in sorted order, plus either _NSIZE_ (the number of records to select from each stratum, for SAMPSIZE=) or _RATE_ (the percentage of records to select from each stratum, for SAMPRATE=).

 

So use whichever is easier to create. _NSIZE_, if used, really should be an integer.
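As a rough illustration of the SAMPSIZE= data set approach, here is a sketch using the 'sampling' data from the question. The allocation data set `alloc` and its _NSIZE_ values are hypothetical: they were chosen so that every stratum in `alloc` also exists in the data and no stratum asks for more records than it contains, which simple random sampling without replacement needs.

```sas
/* The question's data, one record per line. */
data sampling;
  input ID income s_weight;
  datalines;
1 1 300
2 2 50
3 3 30
4 1 45
5 5 82
6 1 55
7 1 321
;

/* Sort by the strata variable before using the STRATA statement. */
proc sort data=sampling;
  by income;
run;

/* Hypothetical allocation: same strata, same order, integer _NSIZE_. */
data alloc;
  input income _nsize_;
  datalines;
1 2
2 1
3 1
5 1
;

/* Stratified SRS: _NSIZE_ records are drawn from each stratum. */
proc surveyselect data=sampling out=work.srs_sample
     method=srs sampsize=alloc seed=2223;
  strata income;
run;
```

Note that this only controls how many records come from each stratum; with METHOD=SRS every record within a stratum is equally likely to be chosen, so s_weight plays no part in the selection.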

 

 

DWilson
Pyrite | Level 9

Method=PPS will let you specify the weight as a size measure (records with larger weights will have a higher chance of being selected).

 

You still need to tell surveyselect how many records to select within each sampling stratum.

 

Also, if you are using weights that have been adjusted to account for nonresponse and/or frame coverage, then you may run into a situation where some records have weights that are too large relative to the other weights in the same sampling stratum. In such a situation, you should pull those records out and select them with certainty. The technical check is to calculate, for each record in a given sampling stratum:

 

(sample size desired from the stratum)*(size measure for a record (the weight in your case)) / (sum of the size measures across all records in the stratum)
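Written out in symbols, the quantity might look like the following. The notation is my own, not from the thread: $n_h$ is the sample size desired from stratum $h$, and $w_{hi}$ is the size measure (here the weight) of record $i$ in that stratum. The numbers use the four income=1 records from the question (weights 300, 45, 55, 321, summing to 721) with two hypothetical sample sizes.

```latex
\[
  q_{hi} \;=\; \frac{n_h \, w_{hi}}{\sum_{j} w_{hj}}
\]
% Largest record in stratum income = 1 (w = 321, stratum total = 721):
\[
  n_h = 2:\quad q = \frac{2 \times 321}{721} \approx 0.89,
  \qquad
  n_h = 3:\quad q = \frac{3 \times 321}{721} \approx 1.34
\]
```

So the same record is an ordinary PPS candidate at a sample size of 2, but its quantity passes 1 at a sample size of 3.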

 

If that quantity is greater than or equal to 1, you need to select that record with certainty and take it out of your data set before using SURVEYSELECT. Re-do the check each time you take out a record, because both inputs change: the denominator shrinks as records are removed, and the sample size still to be drawn by PPS drops by one for every certainty selection. Once you've done this for all strata, reduce each stratum sample size by its number of certainty selections, then use SURVEYSELECT to select a PPS sample with the reduced sample sizes.
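One possible way to script the repeated check is sketched below; it is an illustration under stated assumptions, not a tested production routine. It relies on the observation that the largest remaining record in a stratum always has the largest quantity, so walking each stratum in descending weight order and stopping at the first record that fails gives the same result as re-checking after every removal. All data set and variable names (`frame`, `alloc`, `work.check`, `work.flagged`, `work.alloc_reduced`, `n_left`, `total_left`, `done`, `certainty`) are hypothetical, as is the sample size of 3; the frame is the income=1 stratum from the question.

```sas
/* Frame: the question's income=1 records. */
data frame;
  input ID income s_weight;
  datalines;
1 1 300
4 1 45
6 1 55
7 1 321
;

/* Hypothetical target: 3 selections from stratum income=1. */
data alloc;
  input income _nsize_;
  datalines;
1 3
;

/* Attach the stratum sample size and the stratum total of the size   */
/* measure to every record (PROC SQL remerges the sum), and order     */
/* each stratum from largest to smallest weight.                      */
proc sql;
  create table work.check as
  select f.ID, f.income, f.s_weight, a._nsize_,
         sum(f.s_weight) as stratum_total
  from frame as f inner join alloc as a
    on f.income = a.income
  group by f.income
  order by f.income, f.s_weight desc;
quit;

/* Walk each stratum: flag a record as a certainty selection while    */
/* n_left * weight >= total_left, updating both after each removal.   */
data work.flagged;
  set work.check;
  by income;
  retain n_left total_left done;
  if first.income then do;
    n_left = _nsize_;
    total_left = stratum_total;
    done = 0;
  end;
  certainty = 0;
  if not done and n_left > 0 and n_left * s_weight >= total_left then do;
    certainty = 1;
    n_left = n_left - 1;
    total_left = total_left - s_weight;
  end;
  else done = 1;
run;

/* Reduced stratum sample sizes for the later PPS draw. */
data work.alloc_reduced (keep=income _nsize_);
  set work.flagged;
  by income;
  if last.income;
  _nsize_ = n_left;
run;
```

With these numbers the records with weights 321 and 300 are flagged as certainties (3 x 321 >= 721, then 2 x 300 >= 400), and one selection remains to be drawn from the records with weights 55 and 45. The rows with certainty=0, together with `work.alloc_reduced`, would then be the inputs to the PPS step; a stratum whose reduced size reaches 0 would need to be dropped from both first.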

 

I suggest using METHOD=PPS_SEQ or METHOD=PPS_SYS rather than METHOD=PPS_WR or METHOD=PPS (PPS without replacement) if you want to achieve your desired sample size exactly.

 

