BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SAS_inquisitive
Lapis Lazuli | Level 10

I am creating creating data sets using poor's man loop approach- calling complied macro with each parameters. Is there a way I can create those data sets dynamically

 

%macro mac (i=,date=);

	proc sql;
	   create table t&i. as
		  select *
			from lib.dat_&date.
	quit;
	run;

%mend mac;

%mac(i=1,date=05FEB16)
1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Good job on the %DO loop.  Switching over to CALL EXECUTE ...

 

I can give you the basic form of the program, and some of the pitfalls.  But there are a few different formats that you might use inside the DATA step.  You'll have to experiment to see what you prefer.

 

CALL EXECUTE assumes you have a SAS data set that contains the information that you will use to build a program from a DATA step.  Assuming you have a SAS dataset called LIST_OF_DATASETS, and each observation contains MEMNAME holding the name of one of the input data sets ..

 

data _null_;

set list_of_datasets;

call execute('something in here that indicates what you want to do');

run;

 

The intent of CALL EXECUTE is to stack up SAS statements that should run after the current DATA _NULL_ step is over.  However, there are complications.  CALL EXECUTE will immediately execute macro language statements, but will wait until the current DATA _NULL_ step is over in order to execute SAS language statements.  You might program around that by applying a macro language function to suppress the interpretation of the % and & until later.  In this case, I think you can get around that issue as follows:

 

data _null_;

set list_of_datasets;

string = '%test (' || memname || ', ' || put(_n_, 2.) || ')';

call execute(string);

run;

 

The macro %TEST should be designed to process one data set.  Until you are familiar with macro language quoting functions that can delay the execution of macro language statements, %TEST should not contain any %IF/%THEN statements, %DO statements, or even %LET statements.  It should contain SAS code only (but is permitted to reference macro parameters using &).

 

Behind the scenes, %TEST will start to run, generating a PROC SQL.  Since the program is in the middle of a DATA step, that PROC SQL will have to wait until the DATA step finishes in order to run.  Every observation in LIST_OF_DATASETS follows that same path, generating a PROC SQL that has to wait for previous steps to end until it begins executing.

 

It's a new tool, and a very valuable tool.  It's just not easy to present all the details quickly and easily.

 

Good luck.

View solution in original post

14 REPLIES 14
Reeza
Super User

What do you mean by dynamically?

 

Do you want to create a file for each date?

 

This is not considered a good practice or recommended. 

Here's a write up on the common ways to split a file into subsets:

 

http://www.sascommunity.org/wiki/Split_Data_into_Subsets

SAS_inquisitive
Lapis Lazuli | Level 10

@Reeza. Yes. I want to create a data set for each date without having to call macro each time.

SASKiwi
PROC Star

What is the reason why you want to read a dataset once and create a varying number of output datasets based on date?

 

Is the reason to improve performance or something else?

 

I'm asking this because to do this you will have to process the input data twice, once to work out the number of datasets to create, then the second time using macro-generated code for the outputs to each dataset. Also the code will be way more complicated than what you have got now. It will also have to be a DATA step because SQL can only create one dataset at a time.

 

An example of how to do it is in the link @Reeza posted above as the two-pass solution. 

Astounding
PROC Star

The way you have set up the program, each execution of the macro copies an entire data set.  Is that your intention?  Do you already have a permanent location called LIB that contains a separate data set for each date?  

 

How do you know what "each date" means?  Is it every data set that is part of LIB where the name begins with dat_?

SAS_inquisitive
Lapis Lazuli | Level 10

@SASKiwi and @Astounding. Yes. Every data set that is part of LIB where the name begins with dat_ (such as lib.dat_04FEB15, lib.dat_05FEB15 ...). And my intention is not to split those permanent data sets.  I intend to read each of those data sets and perform same operation to get output data set (operation is not shown in the code).  In my program, I  have two parameters one (i=) for naming each output data sets name such t1 for first output dataset, t2 for second output data set and so on. Second parameter (date) for refereeing each permanent data set. I should have different name for this parameter as it is confused with date.  So If I want to read and create another data set, I have to call macro again like below. The second macro call read lib.dat_04FEB15 and create t2 output data set.  May be there is data step alternative to do this.

 

%macro mac (i=,date=);

	proc sql;
	   create table t&i. as
		  select *
			from lib.dat_&date.
	quit;
	run;

%mend mac;

%mac(i=1,date=05FEB16)
%mac(i=2,date=04FEB16)

 

Astounding
PROC Star

OK, I understand that the copying of the data set is only symbolic of the actual processing that will take place.

 

The first step would be to get a list of all the data sets, in automated fashion.  I can't test the program right now, but it would look something like this:

 

proc sql;

create table all_datasets as select memname from dictionary.tables where libname='LIB' and upcase(memname) like 'DAT_%';

quit;

 

That would give you a data set with the MEMNAMEs in it, suitable for CALL EXECUTE.  A very similar approach (suitable for a macro loop) would give you a long macro variable with all the data set names:

 

proc sql;

select memname into : dataset_list separated by ' ' from dictionary.tables where libname='LIB' and upcase(memname) like 'DAT_%';

quit;

 

Either approach is possible  So test the first step, debug it if necessary (and it might be, I'm doing this from memory), and it would be up to you whether you would want a macro loop or CALL EXECUTE as the final step.  For any application that is even mildly complex, I would recommend a macro loop, but we don't really know what your final step actually looks like.

 

SAS_inquisitive
Lapis Lazuli | Level 10

@Astounding. I was able to solve  following way based on your idea. I know call execute technique would have been less code and more efficient. I wonder how call execute alternative look like?

proc sql noprint;
	/*  create table all_datasets as */
	select distinct memname into:DSN separated by ' '
		from dictionary.tables
			where upcase(libname)='lib' and 
				upcase(memname) like 'dat_%';
quit;

run;

%put &DSN;

%macro test;
	%do i=1 %to %sysfunc(countw(&DSN,%str()));

		proc sql;
			create table tmp&i. as
				select *
					from %scan(&DSN,&i,%str())  
						where id in (select distinct id from lib1.dat);
		quit;

		run;

	%end;
%mend test;

%test
Astounding
PROC Star

Good job on the %DO loop.  Switching over to CALL EXECUTE ...

 

I can give you the basic form of the program, and some of the pitfalls.  But there are a few different formats that you might use inside the DATA step.  You'll have to experiment to see what you prefer.

 

CALL EXECUTE assumes you have a SAS data set that contains the information that you will use to build a program from a DATA step.  Assuming you have a SAS dataset called LIST_OF_DATASETS, and each observation contains MEMNAME holding the name of one of the input data sets ..

 

data _null_;

set list_of_datasets;

call execute('something in here that indicates what you want to do');

run;

 

The intent of CALL EXECUTE is to stack up SAS statements that should run after the current DATA _NULL_ step is over.  However, there are complications.  CALL EXECUTE will immediately execute macro language statements, but will wait until the current DATA _NULL_ step is over in order to execute SAS language statements.  You might program around that by applying a macro language function to suppress the interpretation of the % and & until later.  In this case, I think you can get around that issue as follows:

 

data _null_;

set list_of_datasets;

string = '%test (' || memname || ', ' || put(_n_, 2.) || ')';

call execute(string);

run;

 

The macro %TEST should be designed to process one data set.  Until you are familiar with macro language quoting functions that can delay the execution of macro language statements, %TEST should not contain any %IF/%THEN statements, %DO statements, or even %LET statements.  It should contain SAS code only (but is permitted to reference macro parameters using &).

 

Behind the scenes, %TEST will start to run, generating a PROC SQL.  Since the program is in the middle of a DATA step, that PROC SQL will have to wait until the DATA step finishes in order to run.  Every observation in LIST_OF_DATASETS follows that same path, generating a PROC SQL that has to wait for previous steps to end until it begins executing.

 

It's a new tool, and a very valuable tool.  It's just not easy to present all the details quickly and easily.

 

Good luck.

Reeza
Super User
Look into Call execute - data driven macro execution - the example in the documentation is a good one.

Set your macro up so
that your only keeping the final data set, and appending to it on each iteration. Make sure to include a clean-up step to remove intermediate data sets so you don't get unexpected results.
SAS_inquisitive
Lapis Lazuli | Level 10

@Reeza. I don't want to append the output data sets. I want  create one output data set for each input data set.

Reeza
Super User

Why? That's not efficient, or a good way to store data. Unless you have mega data where a day is several million records.  Even then I would consider monthly or weekly snapshots.

 

 

At any rate it doesn't change my answer, execute your macro via call execute. What goes into the macro is ultimately up to you. 

SAS_inquisitive
Lapis Lazuli | Level 10

Okay. I will give a try.

Rick_SAS
SAS Super FREQ

No one explicitly mentioned BY-group processing, so I will. The standard SAS syntax for processing many similar data sets is to keep the data in one big file and then use the BY statement to analyze all the BY groups at once. Or, if you want to limit the data, you can use the WHERE clause to subset the data on the fly.  Most people find this preferable to having zillions of files that they are constantly writing macros for or appending together.

 

In general the BY-group processing is not only simpler, it is faster.  For example, if your computation is part of a simulation or bootstrap study, the BY statement can process the simulated data hundreds of times faster than using a macro loop. See "Simulation in SAS: The slow way or the BY way"

SAS_inquisitive
Lapis Lazuli | Level 10

@Rick_SAS. I will have a  look at those 'by processsing' technique. Thanks !

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 3460 views
  • 2 likes
  • 5 in conversation