amrossini
Calcite | Level 5

I am looking to take a random sample using an identifier. I already isolated the part of the identifier that is random and created a separate variable, but how do I create a sample of the main data by randomly selecting on the identifier? 

 

Ideally this would be replicable, so I want to be able to say - "SAS, randomly select 100,000 records where the identifier = 123" 

 

Any help would be greatly appreciated. 

Reeza
Super User
It's not quite clear what you're trying to do, but you can start by looking at PROC SURVEYSELECT.
amrossini
Calcite | Level 5

I have looked at proc surveyselect - I am not sure how to specify that the random sample be based on the identifier. 

 

I am trying to take a dataset of 200,000 records all with a unique identifier, and randomly sample 30,000 records based on the identifier. I want to sample the data on identifier, because that is the only variable that contains a random segment, so if I sample on that variable, I can ensure the sample is random. 

 

Please let me know what other specific info you need - I am really at a loss on how to execute this task. 

Reeza
Super User
Then make a small reproducible example that illustrates what you're trying to do. Show what you have as a starting point, the logic, and what you need as output. I'm not clear on your logic; perhaps others are.
ballardw
Super User

@amrossini wrote:

I have looked at proc surveyselect - I am not sure how to specify that the random sample be based on the identifier. 

 

I am trying to take a dataset of 200,000 records all with a unique identifier, and randomly sample 30,000 records based on the identifier. I want to sample the data on identifier, because that is the only variable that contains a random segment, so if I sample on that variable, I can ensure the sample is random. 

 

Please let me know what other specific info you need - I am really at a loss on how to execute this task. 


If your records are unique, then just use PROC SURVEYSELECT; it does the random selection for you. A specific variable with "random" values is not needed.

Example using a data set you should have available:

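proc surveyselect data=sashelp.class   /* input data set to sample from */
     sampsize=3                        /* number of records to select */
     out=selected                      /* output data set with the sampled records */
     stats                             /* add selection probability and sampling weight */
;
run;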

data= is the data set to select from, out= holds the results, sampsize= is the number of records desired (samprate= would instead take a percentage of records), and stats adds selection probability and sampling weight information to the output data (you may not need that). When no sampling method is specified, the result is a simple random sample.
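Since the original question asked for a replicable sample of 30,000 from 200,000 records, here is a rough sketch of how that might look. The data set name work.have, the output name work.sample_30k, and the seed value are all made up for illustration; only the options themselves (method=, sampsize=, seed=, out=) come from PROC SURVEYSELECT.

```sas
/* Sketch only: work.have is a hypothetical name for the            */
/* 200,000-record data set described in the question.               */
proc surveyselect data=work.have
     method=srs          /* simple random sampling, stated explicitly */
     sampsize=30000      /* number of records to draw */
     seed=20240501       /* arbitrary fixed seed (assumption) so reruns match */
     out=work.sample_30k /* hypothetical output data set name */
;
run;
```

With a fixed seed= value, rerunning the step on the same input data in the same order should return the same records. If only a subset should be eligible, a where= data set option on the input, for example data=work.have(where=(id_part=123)) with id_part as a hypothetical variable name, would restrict the pool before sampling.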

 

 

The fun part is when you have strata (subpopulations) and need to select a different number or rate of records from each stratum.
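As a small illustration of the stratified case, the sketch below uses sashelp.class again with sex as the stratum variable. The output names, the per-stratum sample sizes, and the seed are arbitrary choices for the example, not anything from the original question.

```sas
/* The input must be sorted by the strata variable before sampling. */
proc sort data=sashelp.class out=class_sorted;
     by sex;
run;

/* Sketch: draw a different number of records from each stratum.    */
proc surveyselect data=class_sorted
     method=srs
     sampsize=(2 3)   /* 2 from the first stratum (F), 3 from the second (M) */
     seed=12345       /* arbitrary fixed seed (assumption) for repeatability */
     out=strat_sample
;
     strata sex;
run;
```

The values in sampsize=( ) are matched to the strata in sorted order, so the list has to contain one entry per stratum. Using samprate=( ) with a list of rates would work the same way if you want percentages rather than counts.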
