BookmarkSubscribeRSS Feed
Obsidian | Level 7



 I have 4 large data tables from which I have to create a random sample of certain sizes from each data table. Each data table has a field called COMMAND.  The values for the COMMAND field are as follows: DC, TX, FL and GL. Each data table has only one command value eg either DC or TX or FL or GL. Each data table has more than 5 million records.  The sample sizes are as follows for each COMMAND:


Data Table     Command           Sample size

Table_1             DC                      1000

Table_2             TX                       1500

Table_3             FL                        2000

Table_4             GL                      1400


What I want to do is a MACRO that creates a series of random samples using PROC SURVEYSELECT from those 4  data tables based on the values of COMMAND variable in those tables.


I need to create a random sample for each of these command (DC, TX,FL and GL) with the sample size of 1000, 1500,2000, and 1400 respectively as shown ABOVE. I need a macro that can do this using PROC SURVEYSELECT . What I want is, a random sample of 1000 for DC, 1500 for TX, 2000 for FL and 1400 for GL from those 4 tables . For example, if the command field DC has 4 million records in table_1, I want to create a random sample size of 1000 for DC from that table (table_1) and likewise, 1500 random sample for TX (another table, table_2) and likewise for all(table_3 and table_4). Please help me on this. I really appreciate the kind assistance.


 I want to use the following code for random sampling OR ANOTHER CODE IF YOU WANT TO SUGGEST






                              OUT= &COMMAND._SRS;

                              ID _ALL_;



I hope this helpS you understand the problem STATEMENT. I highly appreciate your kind assistance. Thanks


Gopi Upreti

Opal | Level 21

You can do this with call execute:


data _null_;
length line $200; input Table $ Command $ SampleSize; line = catt("proc surveyselect data=",Table," sampsize=",SampleSize," out=",Command,"_SRS; run;"); call execute(line); datalines; Table_1 DC 1000 Table_2 TX 1500 Table_3 FL 2000 Table_4 GL 1400 ;

I think CALL EXECUTE is the best tool for the job.  Assuming that your program is basically working the way you want, here is how you could automate:


data _null_;

set commands;

call execute('PROC SURVEYSELECT DATA=');

call execute(data_table);

call execute('ERP_MASTER




call execute(put(samplesize, 4.);

call execute('OUT=');

call execute(COMMAND);

call execute('_SRS;  ID _ALL_; run;');



Theoretically, you could put everything together in a single CALL EXECUTE, but I think this is easier to read.


There are details to attend to ... always using 4 digits as the sample size (hope that's enough but you can increase it if needed), no trailing blanks allowed at the end of COMMAND.


I assumed you wanted to add the RUN statements.


Good luck.


Super User

Just in case you had gone to the work of segregating the data just for this step, you could have used STRAT=command and a list for sampsize parameter. Just make sure that the order of the list on the sampsize matches your strata order.



Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 4 in conversation