DATA Step, Macro, Functions and more

Dynamically create data sets

Accepted Solution Solved
Reply
Super Contributor
Posts: 271
Accepted Solution

Dynamically create data sets

[ Edited ]

I am creating creating data sets using poor's man loop approach- calling complied macro with each parameters. Is there a way I can create those data sets dynamically

 

%macro mac (i=,date=);

	proc sql;
	   create table t&i. as
		  select *
			from lib.dat_&date.
	quit;
	run;

%mend mac;

%mac(i=1,date=05FEB16)

Accepted Solutions
Solution
‎02-08-2016 11:36 AM
Super User
Posts: 5,513

Re: Dynamically create data sets

Posted in reply to SAS_inquisitive

Good job on the %DO loop.  Switching over to CALL EXECUTE ...

 

I can give you the basic form of the program, and some of the pitfalls.  But there are a few different formats that you might use inside the DATA step.  You'll have to experiment to see what you prefer.

 

CALL EXECUTE assumes you have a SAS data set that contains the information that you will use to build a program from a DATA step.  Assuming you have a SAS dataset called LIST_OF_DATASETS, and each observation contains MEMNAME holding the name of one of the input data sets ..

 

data _null_;

set list_of_datasets;

call execute('something in here that indicates what you want to do');

run;

 

The intent of CALL EXECUTE is to stack up SAS statements that should run after the current DATA _NULL_ step is over.  However, there are complications.  CALL EXECUTE will immediately execute macro language statements, but will wait until the current DATA _NULL_ step is over in order to execute SAS language statements.  You might program around that by applying a macro language function to suppress the interpretation of the % and & until later.  In this case, I think you can get around that issue as follows:

 

data _null_;

set list_of_datasets;

string = '%test (' || memname || ', ' || put(_n_, 2.) || ')';

call execute(string);

run;

 

The macro %TEST should be designed to process one data set.  Until you are familiar with macro language quoting functions that can delay the execution of macro language statements, %TEST should not contain any %IF/%THEN statements, %DO statements, or even %LET statements.  It should contain SAS code only (but is permitted to reference macro parameters using &).

 

Behind the scenes, %TEST will start to run, generating a PROC SQL.  Since the program is in the middle of a DATA step, that PROC SQL will have to wait until the DATA step finishes in order to run.  Every observation in LIST_OF_DATASETS follows that same path, generating a PROC SQL that has to wait for previous steps to end until it begins executing.

 

It's a new tool, and a very valuable tool.  It's just not easy to present all the details quickly and easily.

 

Good luck.

View solution in original post


All Replies
Super User
Posts: 19,822

Re: Dynamically create data sets

Posted in reply to SAS_inquisitive

What do you mean by dynamically?

 

Do you want to create a file for each date?

 

This is not considered a good practice or recommended. 

Here's a write up on the common ways to split a file into subsets:

 

http://www.sascommunity.org/wiki/Split_Data_into_Subsets

Super Contributor
Posts: 271

Re: Dynamically create data sets

@Reeza. Yes. I want to create a data set for each date without having to call macro each time.

Super User
Posts: 3,256

Re: Dynamically create data sets

[ Edited ]
Posted in reply to SAS_inquisitive

What is the reason why you want to read a dataset once and create a varying number of output datasets based on date?

 

Is the reason to improve performance or something else?

 

I'm asking this because to do this you will have to process the input data twice, once to work out the number of datasets to create, then the second time using macro-generated code for the outputs to each dataset. Also the code will be way more complicated than what you have got now. It will also have to be a DATA step because SQL can only create one dataset at a time.

 

An example of how to do it is in the link @Reeza posted above as the two-pass solution. 

Super User
Posts: 5,513

Re: Dynamically create data sets

Posted in reply to SAS_inquisitive

The way you have set up the program, each execution of the macro copies an entire data set.  Is that your intention?  Do you already have a permanent location called LIB that contains a separate data set for each date?  

 

How do you know what "each date" means?  Is it every data set that is part of LIB where the name begins with dat_?

Super Contributor
Posts: 271

Re: Dynamically create data sets

[ Edited ]
Posted in reply to Astounding

@SASKiwi and @Astounding. Yes. Every data set that is part of LIB where the name begins with dat_ (such as lib.dat_04FEB15, lib.dat_05FEB15 ...). And my intention is not to split those permanent data sets.  I intend to read each of those data sets and perform same operation to get output data set (operation is not shown in the code).  In my program, I  have two parameters one (i=) for naming each output data sets name such t1 for first output dataset, t2 for second output data set and so on. Second parameter (date) for refereeing each permanent data set. I should have different name for this parameter as it is confused with date.  So If I want to read and create another data set, I have to call macro again like below. The second macro call read lib.dat_04FEB15 and create t2 output data set.  May be there is data step alternative to do this.

 

%macro mac (i=,date=);

	proc sql;
	   create table t&i. as
		  select *
			from lib.dat_&date.
	quit;
	run;

%mend mac;

%mac(i=1,date=05FEB16)
%mac(i=2,date=04FEB16)

 

Super User
Posts: 5,513

Re: Dynamically create data sets

Posted in reply to SAS_inquisitive

OK, I understand that the copying of the data set is only symbolic of the actual processing that will take place.

 

The first step would be to get a list of all the data sets, in automated fashion.  I can't test the program right now, but it would look something like this:

 

proc sql;

create table all_datasets as select memname from dictionary.tables where libname='LIB' and upcase(memname) like 'DAT_%';

quit;

 

That would give you a data set with the MEMNAMEs in it, suitable for CALL EXECUTE.  A very similar approach (suitable for a macro loop) would give you a long macro variable with all the data set names:

 

proc sql;

select memname into : dataset_list separated by ' ' from dictionary.tables where libname='LIB' and upcase(memname) like 'DAT_%';

quit;

 

Either approach is possible  So test the first step, debug it if necessary (and it might be, I'm doing this from memory), and it would be up to you whether you would want a macro loop or CALL EXECUTE as the final step.  For any application that is even mildly complex, I would recommend a macro loop, but we don't really know what your final step actually looks like.

 

Super Contributor
Posts: 271

Re: Dynamically create data sets

Posted in reply to Astounding

@Astounding. I was able to solve  following way based on your idea. I know call execute technique would have been less code and more efficient. I wonder how call execute alternative look like?

proc sql noprint;
	/*  create table all_datasets as */
	select distinct memname into:DSN separated by ' '
		from dictionary.tables
			where upcase(libname)='lib' and 
				upcase(memname) like 'dat_%';
quit;

run;

%put &DSN;

%macro test;
	%do i=1 %to %sysfunc(countw(&DSN,%str()));

		proc sql;
			create table tmp&i. as
				select *
					from %scan(&DSN,&i,%str())  
						where id in (select distinct id from lib1.dat);
		quit;

		run;

	%end;
%mend test;

%test
Solution
‎02-08-2016 11:36 AM
Super User
Posts: 5,513

Re: Dynamically create data sets

Posted in reply to SAS_inquisitive

Good job on the %DO loop.  Switching over to CALL EXECUTE ...

 

I can give you the basic form of the program, and some of the pitfalls.  But there are a few different formats that you might use inside the DATA step.  You'll have to experiment to see what you prefer.

 

CALL EXECUTE assumes you have a SAS data set that contains the information that you will use to build a program from a DATA step.  Assuming you have a SAS dataset called LIST_OF_DATASETS, and each observation contains MEMNAME holding the name of one of the input data sets ..

 

data _null_;

set list_of_datasets;

call execute('something in here that indicates what you want to do');

run;

 

The intent of CALL EXECUTE is to stack up SAS statements that should run after the current DATA _NULL_ step is over.  However, there are complications.  CALL EXECUTE will immediately execute macro language statements, but will wait until the current DATA _NULL_ step is over in order to execute SAS language statements.  You might program around that by applying a macro language function to suppress the interpretation of the % and & until later.  In this case, I think you can get around that issue as follows:

 

data _null_;

set list_of_datasets;

string = '%test (' || memname || ', ' || put(_n_, 2.) || ')';

call execute(string);

run;

 

The macro %TEST should be designed to process one data set.  Until you are familiar with macro language quoting functions that can delay the execution of macro language statements, %TEST should not contain any %IF/%THEN statements, %DO statements, or even %LET statements.  It should contain SAS code only (but is permitted to reference macro parameters using &).

 

Behind the scenes, %TEST will start to run, generating a PROC SQL.  Since the program is in the middle of a DATA step, that PROC SQL will have to wait until the DATA step finishes in order to run.  Every observation in LIST_OF_DATASETS follows that same path, generating a PROC SQL that has to wait for previous steps to end until it begins executing.

 

It's a new tool, and a very valuable tool.  It's just not easy to present all the details quickly and easily.

 

Good luck.

Super User
Posts: 19,822

Re: Dynamically create data sets

Posted in reply to SAS_inquisitive
Look into Call execute - data driven macro execution - the example in the documentation is a good one.

Set your macro up so
that your only keeping the final data set, and appending to it on each iteration. Make sure to include a clean-up step to remove intermediate data sets so you don't get unexpected results.
Super Contributor
Posts: 271

Re: Dynamically create data sets

@Reeza. I don't want to append the output data sets. I want  create one output data set for each input data set.

Super User
Posts: 19,822

Re: Dynamically create data sets

Posted in reply to SAS_inquisitive

Why? That's not efficient, or a good way to store data. Unless you have mega data where a day is several million records.  Even then I would consider monthly or weekly snapshots.

 

 

At any rate it doesn't change my answer, execute your macro via call execute. What goes into the macro is ultimately up to you. 

Super Contributor
Posts: 271

Re: Dynamically create data sets

Okay. I will give a try.

SAS Super FREQ
Posts: 3,754

Re: Dynamically create data sets

Posted in reply to SAS_inquisitive

No one explicitly mentioned BY-group processing, so I will. The standard SAS syntax for processing many similar data sets is to keep the data in one big file and then use the BY statement to analyze all the BY groups at once. Or, if you want to limit the data, you can use the WHERE clause to subset the data on the fly.  Most people find this preferable to having zillions of files that they are constantly writing macros for or appending together.

 

In general the BY-group processing is not only simpler, it is faster.  For example, if your computation is part of a simulation or bootstrap study, the BY statement can process the simulated data hundreds of times faster than using a macro loop. See "Simulation in SAS: The slow way or the BY way"

Super Contributor
Posts: 271

Re: Dynamically create data sets

@Rick_SAS. I will have a  look at those 'by processsing' technique. Thanks !

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 480 views
  • 2 likes
  • 5 in conversation