BookmarkSubscribeRSS Feed
UPRETIGOPI
Obsidian | Level 7

PROBLEM STATEMENT

 

 I have 4 large data tables from which I have to create a random sample of certain sizes from each data table. Each data table has a field called COMMAND.  The values for the COMMAND field are as follows: DC, TX, FL and GL. Each data table has only one command value eg either DC or TX or FL or GL. Each data table has more than 5 million records.  The sample sizes are as follows for each COMMAND:

 

Data Table     Command           Sample size

Table_1             DC                      1000

Table_2             TX                       1500

Table_3             FL                        2000

Table_4             GL                      1400

 

What I want to do is a MACRO that creates a series of random samples using PROC SURVEYSELECT from those 4  data tables based on the values of COMMAND variable in those tables.

 

I need to create a random sample for each of these command (DC, TX,FL and GL) with the sample size of 1000, 1500,2000, and 1400 respectively as shown ABOVE. I need a macro that can do this using PROC SURVEYSELECT . What I want is, a random sample of 1000 for DC, 1500 for TX, 2000 for FL and 1400 for GL from those 4 tables . For example, if the command field DC has 4 million records in table_1, I want to create a random sample size of 1000 for DC from that table (table_1) and likewise, 1500 random sample for TX (another table, table_2) and likewise for all(table_3 and table_4). Please help me on this. I really appreciate the kind assistance.

 

 I want to use the following code for random sampling OR ANOTHER CODE IF YOU WANT TO SUGGEST

 

PROC SURVEYSELECT DATA=ERP_MASTER

                                METHOD=SRS

                              REP=1

                              SAMPSIZE=&samplesize.

                              OUT= &COMMAND._SRS;

                              ID _ALL_;

 

 

I hope this helpS you understand the problem STATEMENT. I highly appreciate your kind assistance. Thanks

 

Gopi Upreti

3 REPLIES 3
PGStats
Opal | Level 21

You can do this with call execute:

 

data _null_;
length line $200; input Table $ Command $ SampleSize; line = catt("proc surveyselect data=",Table," sampsize=",SampleSize," out=",Command,"_SRS; run;"); call execute(line); datalines; Table_1 DC 1000 Table_2 TX 1500 Table_3 FL 2000 Table_4 GL 1400 ;
PG
Astounding
PROC Star

I think CALL EXECUTE is the best tool for the job.  Assuming that your program is basically working the way you want, here is how you could automate:

 

data _null_;

set commands;

call execute('PROC SURVEYSELECT DATA=');

call execute(data_table);

call execute('ERP_MASTER

                                METHOD=SRS

                              REP=1

                              SAMPSIZE=');

call execute(put(samplesize, 4.);

call execute('OUT=');

call execute(COMMAND);

call execute('_SRS;  ID _ALL_; run;');

run;

 

Theoretically, you could put everything together in a single CALL EXECUTE, but I think this is easier to read.

 

There are details to attend to ... always using 4 digits as the sample size (hope that's enough but you can increase it if needed), no trailing blanks allowed at the end of COMMAND.

 

I assumed you wanted to add the RUN statements.

 

Good luck.

 

ballardw
Super User

Just in case you had gone to the work of segregating the data just for this step, you could have used STRAT=command and a list for sampsize parameter. Just make sure that the order of the list on the sampsize matches your strata order.

 

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 772 views
  • 0 likes
  • 4 in conversation