08-12-2011 03:32 PM
I have a large (like 1 billion+ obs) table (actually many quarterly tables) of claims (header (unique claim number) and detail(1-n supporting lines for the unique claim number) combined, ie, 1 header + 5 detail lines yields 5 combined lines). We have been asked to provide a random sample of claims data from this source. This means that I need to provide all detail lines for a randomly selected claim. Is it possible for proc surveyselect to do this in a single pass or do I need to take the sample and go back against the source and retrieve the rest of the lines?
08-12-2011 05:56 PM
If it is just a simple random sample of size N, I wouldn't use surveyselect. I.e., if you know how many claims are in the tables, I would just use one of the random number functions (e.g., randnor) with seed 0 and forcing the result to be an integer value from, say, 1 to the number of claims) and sequentially assign the numbers as your read each claim.
If the claim is assigned a pseudo random number <= N then parse and retrieve the record on the fly.
That way you could do it all in one pass.
08-16-2011 01:18 AM
Since you do not post some sample data, It is hard to code.
And i think it is possible for proc surveyselect to do this in a single pass.But I need time
to check documentation.
The following is an example, If you would like.
proc sort data=sashelp.class out=temp(keep=age) nodupkey; by age; run; proc sort data=sashelp.class;by age;run; proc surveyselect data=temp method=srs rate=.5 out=sample;run; data want; merge sashelp.class sample(in=in_two); by age; if in_two; run;