DATA Step, Macro, Functions and more

Create series of random samples from a Master table

Occasional Contributor
Posts: 12

Create series of random samples from a Master table



 I have 4 large data tables from which I have to create a random sample of certain sizes from each data table. Each data table has a field called COMMAND.  The values for the COMMAND field are as follows: DC, TX, FL and GL. Each data table has only one command value eg either DC or TX or FL or GL. Each data table has more than 5 million records.  The sample sizes are as follows for each COMMAND:


Data Table     Command           Sample size

Table_1             DC                      1000

Table_2             TX                       1500

Table_3             FL                        2000

Table_4             GL                      1400


What I want to do is a MACRO that creates a series of random samples using PROC SURVEYSELECT from those 4  data tables based on the values of COMMAND variable in those tables.


I need to create a random sample for each of these command (DC, TX,FL and GL) with the sample size of 1000, 1500,2000, and 1400 respectively as shown ABOVE. I need a macro that can do this using PROC SURVEYSELECT . What I want is, a random sample of 1000 for DC, 1500 for TX, 2000 for FL and 1400 for GL from those 4 tables . For example, if the command field DC has 4 million records in table_1, I want to create a random sample size of 1000 for DC from that table (table_1) and likewise, 1500 random sample for TX (another table, table_2) and likewise for all(table_3 and table_4). Please help me on this. I really appreciate the kind assistance.


 I want to use the following code for random sampling OR ANOTHER CODE IF YOU WANT TO SUGGEST






                              OUT= &COMMAND._SRS;

                              ID _ALL_;



I hope this helpS you understand the problem STATEMENT. I highly appreciate your kind assistance. Thanks


Gopi Upreti

Esteemed Advisor
Posts: 5,526

Re: Create series of random samples from a Master table

[ Edited ]
Posted in reply to UPRETIGOPI

You can do this with call execute:


data _null_;
length line $200; input Table $ Command $ SampleSize; line = catt("proc surveyselect data=",Table," sampsize=",SampleSize," out=",Command,"_SRS; run;"); call execute(line); datalines; Table_1 DC 1000 Table_2 TX 1500 Table_3 FL 2000 Table_4 GL 1400 ;
Super User
Posts: 6,765

Re: Create series of random samples from a Master table

Posted in reply to UPRETIGOPI

I think CALL EXECUTE is the best tool for the job.  Assuming that your program is basically working the way you want, here is how you could automate:


data _null_;

set commands;

call execute('PROC SURVEYSELECT DATA=');

call execute(data_table);

call execute('ERP_MASTER




call execute(put(samplesize, 4.);

call execute('OUT=');

call execute(COMMAND);

call execute('_SRS;  ID _ALL_; run;');



Theoretically, you could put everything together in a single CALL EXECUTE, but I think this is easier to read.


There are details to attend to ... always using 4 digits as the sample size (hope that's enough but you can increase it if needed), no trailing blanks allowed at the end of COMMAND.


I assumed you wanted to add the RUN statements.


Good luck.


Super User
Posts: 13,523

Re: Create series of random samples from a Master table

Posted in reply to UPRETIGOPI

Just in case you had gone to the work of segregating the data just for this step, you could have used STRAT=command and a list for sampsize parameter. Just make sure that the order of the list on the sampsize matches your strata order.



Ask a Question
Discussion stats
  • 3 replies
  • 4 in conversation