Obsidian | Level 7

## Create series of random samples from a Master table

PROBLEM STATEMENT

I have 4 large data tables from which I have to create a random sample of certain sizes from each data table. Each data table has a field called COMMAND.  The values for the COMMAND field are as follows: DC, TX, FL and GL. Each data table has only one command value eg either DC or TX or FL or GL. Each data table has more than 5 million records.  The sample sizes are as follows for each COMMAND:

Data Table     Command           Sample size

Table_1             DC                      1000

Table_2             TX                       1500

Table_3             FL                        2000

Table_4             GL                      1400

What I want to do is a MACRO that creates a series of random samples using PROC SURVEYSELECT from those 4  data tables based on the values of COMMAND variable in those tables.

I need to create a random sample for each of these command (DC, TX,FL and GL) with the sample size of 1000, 1500,2000, and 1400 respectively as shown ABOVE. I need a macro that can do this using PROC SURVEYSELECT . What I want is, a random sample of 1000 for DC, 1500 for TX, 2000 for FL and 1400 for GL from those 4 tables . For example, if the command field DC has 4 million records in table_1, I want to create a random sample size of 1000 for DC from that table (table_1) and likewise, 1500 random sample for TX (another table, table_2) and likewise for all(table_3 and table_4). Please help me on this. I really appreciate the kind assistance.

I want to use the following code for random sampling OR ANOTHER CODE IF YOU WANT TO SUGGEST

PROC SURVEYSELECT DATA=ERP_MASTER

METHOD=SRS

REP=1

SAMPSIZE=&samplesize.

OUT= &COMMAND._SRS;

ID _ALL_;

I hope this helpS you understand the problem STATEMENT. I highly appreciate your kind assistance. Thanks

Gopi Upreti

3 REPLIES 3
Opal | Level 21

## Re: Create series of random samples from a Master table

You can do this with call execute:

``````data _null_;length line \$200;
input Table \$ Command \$ SampleSize;
line = catt("proc surveyselect data=",Table," sampsize=",SampleSize," out=",Command,"_SRS; run;");
call execute(line);
datalines;
Table_1  DC  1000
Table_2  TX  1500
Table_3  FL  2000
Table_4  GL  1400
;``````
PG
PROC Star

## Re: Create series of random samples from a Master table

I think CALL EXECUTE is the best tool for the job.  Assuming that your program is basically working the way you want, here is how you could automate:

data _null_;

set commands;

call execute('PROC SURVEYSELECT DATA=');

call execute(data_table);

call execute('ERP_MASTER

METHOD=SRS

REP=1

SAMPSIZE=');

call execute(put(samplesize, 4.);

call execute('OUT=');

call execute(COMMAND);

call execute('_SRS;  ID _ALL_; run;');

run;

Theoretically, you could put everything together in a single CALL EXECUTE, but I think this is easier to read.

There are details to attend to ... always using 4 digits as the sample size (hope that's enough but you can increase it if needed), no trailing blanks allowed at the end of COMMAND.

I assumed you wanted to add the RUN statements.

Good luck.

Super User

## Re: Create series of random samples from a Master table

Just in case you had gone to the work of segregating the data just for this step, you could have used STRAT=command and a list for sampsize parameter. Just make sure that the order of the list on the sampsize matches your strata order.

Discussion stats
• 3 replies
• 870 views
• 0 likes
• 4 in conversation