Hi, I am trying to select 25 clients with one record for each client. Each client has more than one record. When I run the code below, I end up with only 23 clients, but when I change the sample size from 25 to 27, I end up with the 25 clients I wanted. Any suggestions on how to correct this? Thanks.
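proc surveyselect data = SUMMARY94 method = URS rep = 1 sampsize = 25 seed = 12345 out = hsbs3;
   id _ALL_;
   SAMPLINGUNIT CLIENT_ID;   /* sample whole clients rather than individual records */
run;

/* Repeat each selected record once per time its client was drawn (NumberHits) */
DATA hsbs2;
   SET hsbs3;
   do sampleUnitID = 1 to numberhits;
      output;
   end;
run;

PROC SORT DATA=hsbs2;
   BY REPLICATE CLIENT_ID SAMPLEUNITID;
RUN;

/* Number each (client, draw) combination sequentially within the replicate */
DATA hsbs1;
   SET hsbs2;
   BY REPLICATE CLIENT_ID SAMPLEUNITID;
   IF FIRST.REPLICATE THEN SAMPLEID=0;
   IF FIRST.SAMPLEUNITID THEN SAMPLEID+1;
RUN;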
Accepted Solutions
Hi @twildone,
The number of clients is smaller than the specified sample size because you used unrestricted random sampling (METHOD=URS), which is sampling with replacement: the same client can be drawn more than once, so 25 draws can yield fewer than 25 distinct clients.
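A rough worked estimate shows why repeats are common with URS. Assume SUMMARY94 holds N distinct clients (N is not given in this thread) and each of the n = 25 draws picks a client with equal probability. Each client is missed by all n draws with probability (1 - 1/N)^n, so the expected number of distinct clients D, and the expected number of repeated draws, are:

```latex
E[D] = N\left(1 - \left(1 - \tfrac{1}{N}\right)^{n}\right),
\qquad
n - E[D] \approx \frac{n(n-1)}{2N} \quad (N \gg n).
```

For a hypothetical N = 150, this gives about 25·24/300 = 2 repeated draws, i.e. roughly 23 distinct clients. Raising sampsize only makes 25 distinct clients more likely; it cannot guarantee them, because URS can still draw a client twice.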
I think you should follow a two-stage approach:
/* Stage 1: Select all records of 25 randomly selected clients (simple random sampling of clients) */
proc surveyselect data=summary94 n=25 seed=31415 out=stage1;
samplingunit client_id;
run;
/* Stage 2: Select one record per client (simple random sampling of records, stratified by client) */
proc surveyselect data=stage1 n=1 seed=27182 out=hsbs(drop=SelectionProb SamplingWeight);
strata client_id;
run;
(Edit: Dropped variables SelectionProb and SamplingWeight from output dataset assuming these are not needed.)
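If the dropped variables are needed after all: assuming N clients in SUMMARY94 and m_i records for client i, stage 1 selects each client with probability 25/N and stage 2 selects each of that client's records with probability 1/m_i. The overall inclusion probability and sampling weight of record r of client i would then be:

```latex
\pi_{ir} = \frac{25}{N}\cdot\frac{1}{m_i},
\qquad
w_{ir} = \frac{1}{\pi_{ir}} = \frac{N\, m_i}{25}.
```

The SelectionProb written by the stage-2 step alone reflects only the 1/m_i factor, so the product of the two stages is what a design-based analysis would use.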
Try this.
proc sort data=SUMMARY94;
by CLIENT_ID;
run;
proc surveyselect data = SUMMARY94 method = URS sampsize = 1 seed = 12345 out = hsbs3;
strata CLIENT_ID;
run;
Thanks... it worked perfectly!