Hi, I am trying to select 25 clients with one record for each client. Each client has more than one record. When I run the code below, I end up with only 23 clients, but when I change sampsize from 25 to 27, I end up with 25 clients, which is what I wanted. Any suggestions on how to correct this? Thanks.
proc surveyselect data=SUMMARY94 method=URS rep=1 sampsize=25 seed=12345 out=hsbs3;
   id _ALL_;
   samplingunit CLIENT_ID;
run;

data hsbs2;
   set hsbs3;
   do sampleUnitID = 1 to numberhits;
      output;
   end;
run;

proc sort data=hsbs2;
   by REPLICATE CLIENT_ID SAMPLEUNITID;
run;

data hsbs1;
   set hsbs2;
   by REPLICATE CLIENT_ID SAMPLEUNITID;
   if first.REPLICATE then SAMPLEID = 0;
   if first.SAMPLEUNITID then SAMPLEID + 1;
run;
Hi @twildone,
You get fewer clients than the specified sample size because you used unrestricted random sampling (METHOD=URS), which means sampling with replacement. The same client can be drawn more than once (NumberHits > 1), so 25 draws can yield fewer than 25 distinct clients.
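To see this effect in isolation, here is a minimal sketch on made-up data. The dataset toy, its 100 clients with 3 records each, and the names urs_sample, urs_check, hits, distinct_clients and total_draws are all hypothetical; only METHOD=URS, SAMPLINGUNIT and the NumberHits variable come from the code in the question.

```sas
/* Hypothetical toy data: 100 clients with 3 records each */
data toy;
   do client_id = 1 to 100;
      do rec = 1 to 3;
         output;
      end;
   end;
run;

/* URS draws 25 clients WITH replacement, so a client can be hit twice */
proc surveyselect data=toy method=urs sampsize=25 seed=12345 out=urs_sample;
   samplingunit client_id;
run;

/* One row per selected client, then compare distinct clients with total draws */
proc sql;
   create table urs_check as
   select count(*)  as distinct_clients,
          sum(hits) as total_draws
   from (select client_id, max(NumberHits) as hits
         from urs_sample
         group by client_id);
quit;

proc print data=urs_check noobs;
run;
```

total_draws should equal the requested 25, while distinct_clients can be smaller whenever some client has NumberHits greater than 1.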
I think you should follow a two-stage approach:
/* Stage 1: Select all records of 25 randomly selected clients (simple random sampling of clients) */
proc surveyselect data=summary94 n=25 seed=31415 out=stage1;
   samplingunit client_id;
run;

/* Stage 2: Select one record per client (simple random sampling of records, stratified by client) */
proc surveyselect data=stage1 n=1 seed=27182 out=hsbs(drop=SelectionProb SamplingWeight);
   strata client_id;
run;
(Edit: Dropped variables SelectionProb and SamplingWeight from output dataset assuming these are not needed.)
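A quick way to confirm the result is to count records and distinct clients in the output. This is only a sketch: it assumes the dataset hsbs from Stage 2 already exists and that summary94 contains at least 25 clients; the column names n_records and n_clients are made up for illustration.

```sas
/* Sanity check on the Stage 2 output: expect one record for each of 25 clients */
proc sql;
   select count(*)                  as n_records,
          count(distinct client_id) as n_clients
   from hsbs;
quit;
```

If the two-stage selection behaved as intended, both counts should be 25.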
Try this.
proc sort data=SUMMARY94;
   by CLIENT_ID;
run;

proc surveyselect data=SUMMARY94 method=URS sampsize=1 seed=12345 out=hsbs3;
   strata CLIENT_ID;
run;
Thanks, it worked perfectly!