I am looking to take a random sample using an identifier. I already isolated the part of the identifier that is random and created a separate variable, but how do I create a sample of the main data by randomly selecting on the identifier?
Ideally this would be replicable, so I want to be able to say: "SAS, randomly select 100,000 records where the identifier = 123".
Any help would be greatly appreciated.
I have looked at proc surveyselect, but I am not sure how to specify that the random sample be based on the identifier.
I am trying to take a dataset of 200,000 records all with a unique identifier, and randomly sample 30,000 records based on the identifier. I want to sample the data on identifier, because that is the only variable that contains a random segment, so if I sample on that variable, I can ensure the sample is random.
Please let me know what other specific info you need - I am really at a loss on how to execute this task.
In reply to @amrossini:
If your records are unique, then just use surveyselect; it does the random selection for you. A specific variable with "random" values is not needed.
Example using a data set you should have available:
proc surveyselect data=sashelp.class sampsize=3 out=selected stats;
run;
data= is the data set to select from, out= holds the results, and sampsize= is the number of desired records (samprate= would use a percentage of records instead). The stats option adds selection probability and weighting information to the output data (you may not need that). When no sample method is provided, the result is a simple random sample.
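Since the original question asks for a replicable sample of 30,000 out of 200,000 records, here is a minimal sketch of how that might look. The data set work.main, its id variable, and the seed value 12345 are placeholders invented for illustration; the dummy data step only exists so the example runs on its own. The seed= option fixes the random number stream, so rerunning the step on the same input should return the same sample.

```sas
/* Hypothetical stand-in for the real data: 200,000 records with a unique id */
data work.main;
    do id = 1 to 200000;
        output;
    end;
run;

/* Simple random sample of 30,000 records; seed= makes the draw repeatable */
proc surveyselect data=work.main
                  method=srs
                  sampsize=30000
                  seed=12345
                  out=work.sample;
run;
```

Replace work.main with the actual data set. No identifier variable has to be named in the procedure, because each record is already a distinct sampling unit.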
The fun part is when you have strata (subpopulations) and need to select different numbers or rates of records from each stratum.
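As a rough illustration of the stratified case, the sketch below uses sashelp.class with Sex as the stratum variable. The per-stratum sizes (2 and 3), the seed, and the output data set names are arbitrary choices made for the example. The input is sorted first because the strata statement expects the data to be ordered by the stratum variable, and the values in sampsize=( ) are matched to the strata in that sorted order (F, then M).

```sas
/* Sort by the stratum variable so the strata statement can be used */
proc sort data=sashelp.class out=work.class_sorted;
    by sex;
run;

/* Select 2 records from the first stratum (F) and 3 from the second (M) */
proc surveyselect data=work.class_sorted
                  method=srs
                  sampsize=(2 3)
                  seed=2024
                  out=work.strat_sample;
    strata sex;
run;
```

To sample at different rates rather than fixed counts, samprate=( ) accepts a list of per-stratum values in the same way.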