Data_Detective_23219
Calcite | Level 5


I've been reading through the documentation on this procedure and I cannot find (or perhaps identify) the appropriate syntax to accommodate a specific sampling objective. I am looking to generate a random sample without replacement from a dataset that contains claims adjudicated by different users. Because some users process more claims than others, I want the sampling methodology to give each user an equal probability of being chosen in the random sample (I do not need a specific number [or proportion] of observations from each user, as many of the documentation examples show). This will remove the bias that arises when claims examiners who process more claims have a higher chance of being "randomly" chosen because more of their claim lines exist in the dataset.

I will provide a couple of arbitrary parameters so someone can illustrate the appropriate syntax.

Claimlines: 1,000,000

Random Sample Size: 100

Variable identifying the claims examiner: User

Thanks!

3 REPLIES
SteveDenham
Jade | Level 19

If you want equal probability of being selected based on User, why not do something like:

/* Make up some data */
data have;
    do user=1 to 1000;
        do claim=1 to 2000;
            flag=mod(user,2);
            output;
        end;
    end;
run;

/* Make it so that some of the users have more claims than others, so that this is a little more general */
data have;
    set have;
    if flag=0 and claim>1000 then delete;
run;

/* Give every record a unique random number */
/* This is the point where you can use your own "have" dataset */
data firstpass;
    call streaminit(111);
    set have;
    ranno=rand('UNIFORM');
run;

/* Sort now so that records are randomized within each user */
proc sort data=firstpass out=secondpass;
    by user ranno;
run;

/* Select the record for each user with the lowest random value */
/* Then draw a fresh random number per user: a user's lowest record-level value tends to be smaller the more claims that user has, so sorting on it would favor high-volume users */
data thirdpass;
    call streaminit(222);
    set secondpass;
    by user ranno;
    if first.user;
    userno=rand('UNIFORM');
run;

/* Now re-sort across all users in preparation for selecting with equal opportunity for selection */
proc sort data=thirdpass out=fourthpass;
    by userno;
run;

/* Select 100 random users, with one of several records still attached */
data want;
    set fourthpass;
    if _n_<=100;
run;

If you want all or multiple records for the selected users, you can then merge this back against the original data set.
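As a rough sketch of that merge-back, assuming the datasets are still called have and want as above (selected_users, have_sorted and want_all are names made up for illustration):

```sas
/* Reduce the sample to the list of selected users. */
/* selected_users, have_sorted and want_all are hypothetical names. */
proc sort data=want(keep=user) out=selected_users;
    by user;
run;

/* MERGE needs both inputs in BY order; skip this sort if have is already sorted by user. */
proc sort data=have out=have_sorted;
    by user;
run;

/* Keep every original record that belongs to a selected user. */
data want_all;
    merge have_sorted(in=in_have) selected_users(in=in_sel);
    by user;
    if in_have and in_sel;
run;
```

Because selected_users carries only the user variable, none of the claim-level columns from have get overwritten in the merge.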

(And I am sure there is a way to do this in PROC SURVEYSELECT, but this is pretty fast and pretty straightforward. I think someone can come up with something better than this. Despite all the sorting and subsetting, this took about 1.7 seconds of CPU time, and a little less real time due to multithreading.)

Steve Denham

Data_Detective_23219
Calcite | Level 5

Thanks Steve!  Your example was very intuitive and straightforward.

For anyone else reading this, I am looking for a SAS book that focuses entirely on sampling, since I will be doing a lot more of this type of work going forward.

Thanks!

PGStats
Opal | Level 21

Hi! If you are going to be involved in a lot of sampling, then you should definitely get familiar with SURVEYSELECT. I don't have full knowledge of the procedure, but here is how I would accomplish your sampling:


data test;
    call streaminit(3746);
    do user = 1 to 30;
        do j = 1 to 1 + rand("POISSON",3);
            claim = user * 100 + j;
            output;
        end;
    end;
    drop j;
run;

/* Select some users' claims as clusters */
proc surveyselect data=test out=temp n=10;
    samplingunit user;
run;

/* Select one claim per selected user */
proc surveyselect data=temp out=want n=1;
    strata user;
run;
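Scaled up to the numbers in the question, the same two steps might look like the sketch below. The dataset name claims, the output names, and the seed values are assumptions for illustration only. The seed= option just makes the draw repeatable, and the sort is there because the STRATA statement expects its input ordered by user.

```sas
/* claims is a hypothetical name for the 1,000,000-line input dataset. */
proc sort data=claims out=claims_sorted;
    by user;
run;

/* Stage 1: simple random sample of 100 users, each equally likely, */
/* keeping all claim lines of the chosen users.                     */
proc surveyselect data=claims_sorted out=picked_users n=100 seed=12345;
    samplingunit user;
run;

/* Stage 2: one randomly chosen claim line per selected user. */
proc surveyselect data=picked_users out=sample_100 n=1 seed=67890;
    strata user;
run;
```

This assumes the data contain at least 100 distinct users; with fewer, the first step cannot draw 100 of them without replacement.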

PG
