@amrossini wrote:
I have looked at PROC SURVEYSELECT, but I am not sure how to specify that the random sample be based on the identifier.
I am trying to take a dataset of 200,000 records all with a unique identifier, and randomly sample 30,000 records based on the identifier. I want to sample the data on identifier, because that is the only variable that contains a random segment, so if I sample on that variable, I can ensure the sample is random.
Please let me know what other specific info you need - I am really at a loss on how to execute this task.
If your records are unique, just use PROC SURVEYSELECT; it does the random selection for you. You do not need a variable with "random" values.
Example using a data set you should have available:
proc surveyselect data=sashelp.class
sampsize=3
out=selected
stats
;
run;
DATA= names the data set to select from, OUT= names the output data set that holds the selected records, SAMPSIZE= is the number of records to select (SAMPRATE= would instead select a rate, i.e. a percentage of the records), and STATS adds the selection probability and sampling weight to the output data (you may not need those). When no sampling method (METHOD=) is specified, the result is a simple random sample.
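Applied to the original question, a minimal sketch might look like the following. It assumes the 200,000 unique records are in a data set called WORK.HAVE (a hypothetical name; substitute your own) and adds SEED= so the same sample is drawn each time the step is rerun (the seed value is arbitrary). METHOD=SRS is spelled out only for clarity, since it is the default here.

```sas
/* Simple random sample of 30,000 records from a hypothetical data set WORK.HAVE */
proc surveyselect data=work.have   /* hypothetical input: 200,000 unique records */
                  out=work.sample  /* selected records are written here */
                  method=srs       /* simple random sampling without replacement */
                  sampsize=30000   /* number of records to select */
                  seed=12345;      /* arbitrary seed so the selection is reproducible */
run;
```

Every record has the same chance of being selected, so the identifier plays no role in the selection; it is simply carried along into the output data set with the other variables.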
The fun part comes when you have strata (subpopulations) and need to select different numbers or rates from each stratum.
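As a sketch of the stratified case, the example below reuses SASHELP.CLASS and treats SEX as the stratum variable (an arbitrary choice for illustration). The STRATA statement expects the input to be sorted by the stratum variable, and when SAMPSIZE= lists several values they are applied to the strata in that sorted order, so here 2 records are drawn from F and 3 from M. The output data set names are hypothetical.

```sas
/* STRATA requires the input data to be sorted by the stratum variable */
proc sort data=sashelp.class out=class_sorted;
   by sex;
run;

/* Select 2 records from stratum F and 3 from stratum M (sizes follow sort order) */
proc surveyselect data=class_sorted
                  out=selected_strata
                  method=srs
                  sampsize=(2 3)
                  seed=12345   /* arbitrary seed for reproducibility */
                  stats;       /* add selection probability and weight per record */
   strata sex;
run;
```

SAMPRATE= accepts a list of rates in the same way if you want a different percentage from each stratum instead of a fixed count.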