DATA Step, Macro, Functions and more

Create series of random samples from a Master table

Reply
Occasional Contributor
Posts: 10

Create series of random samples from a Master table

PROBLEM STATEMENT

 

 I have 4 large data tables from which I have to create a random sample of certain sizes from each data table. Each data table has a field called COMMAND.  The values for the COMMAND field are as follows: DC, TX, FL and GL. Each data table has only one command value eg either DC or TX or FL or GL. Each data table has more than 5 million records.  The sample sizes are as follows for each COMMAND:

 

Data Table     Command           Sample size

Table_1             DC                      1000

Table_2             TX                       1500

Table_3             FL                        2000

Table_4             GL                      1400

 

What I want to do is a MACRO that creates a series of random samples using PROC SURVEYSELECT from those 4  data tables based on the values of COMMAND variable in those tables.

 

I need to create a random sample for each of these command (DC, TX,FL and GL) with the sample size of 1000, 1500,2000, and 1400 respectively as shown ABOVE. I need a macro that can do this using PROC SURVEYSELECT . What I want is, a random sample of 1000 for DC, 1500 for TX, 2000 for FL and 1400 for GL from those 4 tables . For example, if the command field DC has 4 million records in table_1, I want to create a random sample size of 1000 for DC from that table (table_1) and likewise, 1500 random sample for TX (another table, table_2) and likewise for all(table_3 and table_4). Please help me on this. I really appreciate the kind assistance.

 

 I want to use the following code for random sampling OR ANOTHER CODE IF YOU WANT TO SUGGEST

 

PROC SURVEYSELECT DATA=ERP_MASTER

                                METHOD=SRS

                              REP=1

                              SAMPSIZE=&samplesize.

                              OUT= &COMMAND._SRS;

                              ID _ALL_;

 

 

I hope this helpS you understand the problem STATEMENT. I highly appreciate your kind assistance. Thanks

 

Gopi Upreti

Respected Advisor
Posts: 4,655

Re: Create series of random samples from a Master table

[ Edited ]

You can do this with call execute:

 

data _null_;
length line $200; input Table $ Command $ SampleSize; line = catt("proc surveyselect data=",Table," sampsize=",SampleSize," out=",Command,"_SRS; run;"); call execute(line); datalines; Table_1 DC 1000 Table_2 TX 1500 Table_3 FL 2000 Table_4 GL 1400 ;
PG
Super User
Posts: 5,092

Re: Create series of random samples from a Master table

I think CALL EXECUTE is the best tool for the job.  Assuming that your program is basically working the way you want, here is how you could automate:

 

data _null_;

set commands;

call execute('PROC SURVEYSELECT DATA=');

call execute(data_table);

call execute('ERP_MASTER

                                METHOD=SRS

                              REP=1

                              SAMPSIZE=');

call execute(put(samplesize, 4.);

call execute('OUT=');

call execute(COMMAND);

call execute('_SRS;  ID _ALL_; run;');

run;

 

Theoretically, you could put everything together in a single CALL EXECUTE, but I think this is easier to read.

 

There are details to attend to ... always using 4 digits as the sample size (hope that's enough but you can increase it if needed), no trailing blanks allowed at the end of COMMAND.

 

I assumed you wanted to add the RUN statements.

 

Good luck.

 

Super User
Posts: 10,535

Re: Create series of random samples from a Master table

Just in case you had gone to the work of segregating the data just for this step, you could have used STRAT=command and a list for sampsize parameter. Just make sure that the order of the list on the sampsize matches your strata order.

 

 

Ask a Question
Discussion stats
  • 3 replies
  • 173 views
  • 0 likes
  • 4 in conversation