AndreaBov
Calcite | Level 5

Hi,

I have a simple problem that I am trying to solve with PROC SURVEYSELECT, but I am not able to obtain the desired results.

 

I have a dataset with 3 variables: Country (takes only two values: EU, NON EU), Segment (takes 3 values: Large, Medium, Small) and Dollar (amount of exposure).

 

The dataset has 10000 lines.

 

I would like to extract a random sample of 40 lines that has the same (or close to the same) distribution as the original dataset in terms of Dollar.

 

I am using this code:

 

proc sort data=dataset;
    by Country Segment;
run;

proc surveyselect data=dataset out=samp1 method=pps sampsize=40 seed=9876;
    strata Country Segment;
    size Dollar;
run;

 

I get a sample of 40 records, but the proportions of Country and Segment weighted by Dollar are not at all the same as in the original dataset.

 

Where am I going wrong?

 

1 REPLY
ballardw
Super User

That is what PPS does with a Size variable: the larger a record's value of Dollar, the more likely that record is to be selected.

If you want the proportion of dollar values in the sample to approximate that of the data as a whole, then look at the SRS or SYS methods instead. If your dollar amounts cover a wide range of values, I might suggest the SYS method.
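
A minimal sketch of the SYS suggestion, reusing the dataset and variable names from the question. The output names sorted and samp_sys are made up for illustration, and the sketch assumes a SAS/STAT release that supports ALLOC=PROP on the STRATA statement. One thing to keep in mind: without an allocation option, SAMPSIZE=40 combined with a STRATA statement requests 40 records from each stratum. With ALLOC=PROP it is treated as the total, split across strata in proportion to their record counts.

```sas
/* Sketch only. SORTED and SAMP_SYS are illustrative names. */

/* Order records by Dollar within each Country x Segment stratum, */
/* so the systematic selection steps through small to large amounts. */
proc sort data=dataset out=sorted;
    by Country Segment Dollar;
run;

/* Equal-probability systematic sample, 40 records in total.        */
/* ALLOC=PROP (assumed available) allocates the 40 across strata    */
/* in proportion to the number of records in each stratum.          */
proc surveyselect data=sorted out=samp_sys method=sys
                  sampsize=40 seed=9876;
    strata Country Segment / alloc=prop;
run;
```

Replacing method=sys with method=srs would give a simple random sample within each stratum instead; in that case the sort by Dollar no longer affects which records are picked.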

 

With 6 groups (2*3) and only 40 records selected, you may have to be flexible about how closely you want those proportions to match.
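
One way to judge how close the match is would be to compare the Dollar-weighted share of each Country by Segment cell in the full data and in the sample. The sketch below assumes the sample is in samp1, as in the question, and that Dollar is positive for every record (PROC FREQ drops observations with zero or negative weights by default).

```sas
/* Dollar share of each Country x Segment cell in the full data. */
proc freq data=dataset;
    tables Country*Segment / list nocum;
    weight Dollar;   /* percentages become shares of total Dollar */
run;

/* Same table for the selected sample, for side-by-side comparison. */
proc freq data=samp1;
    tables Country*Segment / list nocum;
    weight Dollar;
run;
```

The Percent column in each listing is the cell's share of total Dollar, so the two outputs can be compared row by row.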
