re: PROC SURVEYSELECT

Regular Contributor
Posts: 222

Hi, I am trying to select 25 clients, with 1 record for each client. Each client has more than 1 record. When I run the code below, I end up with only 23 clients, but when I change sampsize from 25 to 27, I end up with 25 clients, which is what I wanted. Any suggestions on how to correct this? Thanks.

 

proc surveyselect data = SUMMARY94 method = URS rep = 1 sampsize = 25 seed = 12345 out = hsbs3;
     id _ALL_;
     SAMPLINGUNIT CLIENT_ID;
run;

DATA hsbs2;
     SET hsbs3;
     do sampleUnitID=1 to numberhits;
          output;
     end;
run;

PROC SORT DATA=hsbs2;
     BY REPLICATE CLIENT_ID SAMPLEUNITID;
RUN;

DATA hsbs1;
     SET hsbs2;
     BY REPLICATE CLIENT_ID SAMPLEUNITID;
     IF FIRST.REPLICATE THEN SAMPLEID=0;
     IF FIRST.SAMPLEUNITID THEN SAMPLEID+1;
RUN;


Accepted Solutions
Solution
‎05-06-2016 12:04 PM
Trusted Advisor
Posts: 1,115

Re: re: PROC SURVEYSELECT

[ Edited ]

Hi @twildone,

 

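The number of clients is smaller than the specified sample size because you used unrestricted random sampling (METHOD=URS), which samples with replacement: the same client can be drawn more than once (counted in NumberHits), so 25 draws can yield fewer than 25 distinct clients.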
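To see this in the data, a quick check along the following lines may help. It is only a sketch: it assumes the data set and variable names from the question (SUMMARY94, HSBS3, CLIENT_ID, NumberHits), and the last step uses the standard formula N*(1 - (1 - 1/N)**n) for the expected number of distinct units in n equal-probability draws with replacement from N units. The names n_clients, pop_clients, draws and expected_distinct are made up for the illustration.

```sas
/* Sketch only: names SUMMARY94, HSBS3, CLIENT_ID and NumberHits
   are taken from the question. */

/* How many distinct clients did the URS sample actually contain? */
proc sql;
  select count(distinct client_id) as distinct_clients
    from hsbs3;
quit;

/* Which clients were drawn more than once? */
proc sql;
  select distinct client_id, numberhits
    from hsbs3
    where numberhits > 1;
quit;

/* Expected number of distinct clients in 25 draws with replacement
   from all clients in SUMMARY94 (n_clients is a hypothetical
   macro variable name). */
proc sql noprint;
  select count(distinct client_id) into :n_clients
    from summary94;
quit;

data _null_;
  pop_clients = &n_clients;
  draws = 25;
  expected_distinct = pop_clients * (1 - (1 - 1/pop_clients)**draws);
  put expected_distinct= 8.2;
run;
```

If the second query returns any rows, those repeated clients account for the gap between the requested 25 and the number of distinct clients obtained.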

 

I think you should follow a two-stage approach:

/* Stage 1: Select all records of 25 randomly selected clients (simple random sampling of clients) */

proc surveyselect data=summary94 n=25 seed=31415 out=stage1;
samplingunit client_id;
run;

/* Stage 2: Select one record per client (simple random sampling of records, stratified by client) */

proc surveyselect data=stage1 n=1 seed=27182 out=hsbs(drop=SelectionProb SamplingWeight);
strata client_id;
run;

(Edit: Dropped variables SelectionProb and SamplingWeight from output dataset assuming these are not needed.)
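As a sanity check on the result, something like the query below could be run afterwards. It is a sketch that assumes the output data set HSBS and the variable CLIENT_ID from the code above; the column aliases n_records and n_clients are made up for the illustration.

```sas
/* Sketch only: HSBS and CLIENT_ID are the names used in the
   two-stage code above. */
proc sql;
  select count(*)                  as n_records,
         count(distinct client_id) as n_clients
    from hsbs;
quit;
```

Provided SUMMARY94 contains at least 25 clients, both counts would be expected to equal 25: one record for each of 25 distinct clients.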


All Replies
Trusted Advisor
Posts: 1,204

Re: re: PROC SURVEYSELECT

Try this.

 

proc sort data=SUMMARY94;
by CLIENT_ID;
run;

 

proc surveyselect data = SUMMARY94 method = URS sampsize = 1 seed = 12345 out = hsbs3;
strata CLIENT_ID;
run;

Regular Contributor
Posts: 222

Re: re: PROC SURVEYSELECT

Thanks....it worked perfectly!!!!!
