BookmarkSubscribeRSS Feed
advoss
Quartz | Level 8

I have a large (like 1 billion+ obs) table (actually many quarterly tables) of claims (header (unique claim number) and detail(1-n supporting lines for the unique claim number) combined, ie, 1 header + 5 detail lines yields 5 combined lines).  We have been asked to provide a random sample of claims data from this source.  This means that I need to provide all detail lines for a randomly selected claim.  Is it possible for proc surveyselect to do this in a single pass or do I need to take the sample and go back against the source and retrieve the rest of the lines?

Thank you.

2 REPLIES 2
art297
Opal | Level 21

If it is just a simple random sample of size N, I wouldn't use surveyselect.  I.e., if you know how many claims are in the tables, I would just use one of the random number functions (e.g., randnor) with seed 0 and forcing the result to be an integer value from, say, 1 to the number of claims) and sequentially assign the numbers as your read each claim.

If the claim is assigned a pseudo random number <= N then parse and retrieve the record on the fly.

That way you could do it all in one pass.

Ksharp
Super User

Since you do not post some sample data, It is hard to code.

And i think it is possible for proc surveyselect to do this in a single pass.But I need time

to check documentation.

The following is an example, If you would like.

proc sort data=sashelp.class out=temp(keep=age) nodupkey;
 by age;
run;
proc sort data=sashelp.class;by age;run;
proc surveyselect data=temp method=srs rate=.5 out=sample;run;
data want;
 merge sashelp.class sample(in=in_two);
 by age;
 if in_two;
run;

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 1316 views
  • 0 likes
  • 3 in conversation