DATA Step, Macro, Functions and more

Using a macro to loop through files and process data and output into a destination file

Accepted Solution Solved
Reply
Contributor
Posts: 36
Accepted Solution

Using a macro to loop through files and process data and output into a destination file

Dear friends,

 

Suppose I have many files, following the same structure, divided by Unit and Year (called Unit_Year_Itemized) on the disk. I need to make some calculation for all those files and put the results into one single file (Total_Summary).

So now I have made a macro %Summary to do the calculation. The result is one single line, which can also be dumpted into a temp file (Unit_Year_Summary).

Now I need to use this macro on all the Units and Years, years are numeric and maybe sequential, but units are strings, non sequential. And after the calculation is done, I need to put the results into one single file Total_Summary. For which I made a loop, trying to loop through a list of units, and Years, run the macro, and output the result into the Total_Summary.

Of course, data step failed to recognize without set, and do loops failed. Of course, I can dump the intermediate files on the disk, and use a join or data step to manually put them into a single file. But if there are many files it becomes tidious.

Thanks for any help to solve the problem.

Ken

 

Sample codes:

/* Create testing input files, 2 units, 2 years */

data IT_2014_Itemized;

infile cards delimiter=',';

input Unit $ Year Client $ Sales;

cards;

IT,2014,Adam,100

IT,2014,John,200

IT,2014,Mary,150

;

run;

data IT_2015_Itemized;

infile cards delimiter=',';

input Unit $ Year Client $ Sales;

cards;

IT,2015,Peter,130

IT,2015,John,350

IT,2015,Ted,600

;

run;

data Support_2014_Itemized;

infile cards delimiter=',';

input Unit $ Year Client $ Sales;

cards;

Support,2014,Lou,80

Support,2014,Ann,700

Support,2014,Brian,320

;

run;

data Support_2015_Itemized;

infile cards delimiter=',';

input Unit $ Year Client $ Sales;

cards;

Support,2015,Steve,440

Support,2015,Sam,530

Support,2015,Jack,200

;

run;

 

/* Create macro to calculate for each unit and year, some summary results */

 

%macro Summary(Unit=, Year=);

/* %let Unit=IT; */

/* %do Year=2014 %to 2015;*/  /* I can put a loop here to run all the years, but failed to run for all the units in a list */

proc sql;

/* create table &Unit._&Year._Summary as  */ /* This file is the intermediate summary file which can be dumped out in the disk */

select Unit,

Year,

count(Client) as Total_Client,

sum(Sales) as Total_Sales,

sum(Sales)/count(Client) as Average_Sales

from &Unit._&Year._Itemized

group by Unit, Year;

quit;

/* %end; */

%mend Summary;

 

/* Now callthis macro for each unit and year */

 

%Summary(Unit=IT, Year=2014);

 

/* Now I want to loop through all the units in the list, and the years, run the same macro with Unit and Year, and output the one-line per unit per year result into the Total_Summary file */

 

data Total_Summary;

do Unit=IT, Support; /* This is a list of units, which can be many */

do Year=2014 to 2015; /* Year can be sequential */

%Summary(Unit=&Unit, Year=&Year);

output;

end;

end;

run;


Accepted Solutions
Solution
‎12-14-2016 08:04 AM
Contributor
Posts: 36

Re: Using a macro to loop through files and process data and output into a destination file

Posted in reply to Astounding

Thanks for your suggestion. I moved the target table into the macro! And rewrote the loop code, now it's totally solved!

 

%macro Summary(Unit=, Year=);

/* %do Year=2014 %to 2015;*/

proc sql;

create table &Unit._&Year._Summary as

select Unit,

Year,

count(Client) as Total_Client,

sum(Sales) as Total_Sales,

sum(Sales)/count(Client) as Average_Sales

from &Unit._&Year._Itemized

group by Unit, Year;

quit;

proc append data = &Unit._&Year._Summary

base = Total_Summary /* This is the real target result table, created and appended on the run */

force;

run;

proc datasets lib = WORK memtype = data nolist;

delete &Unit._&Year._Summary;

run;

/* %end; */

%mend Summary;

 

data STEERING_TABLE; /* This table is only the result of the running, can be deleted after run */

length UNIT $12 YEAR 8;

do Unit='IT', 'Support';

do Year=2014 to 2015;

rc=dosubl('%Summary(Unit=' || UNIT || ', Year=' || YEAR || ');');

output;

end;

end;

run;

View solution in original post


All Replies
Super User
Posts: 19,855

Re: Using a macro to loop through files and process data and output into a destination file

Look into CALL EXECUTE() for running the macro multiple times. 

 

In the last step in your macro, append the results to a final table, instead of creating multiple output tables. 

 

Contributor
Posts: 36

Re: Using a macro to loop through files and process data and output into a destination file

Thank you I will check call execute out...

Super User
Super User
Posts: 7,981

Re: Using a macro to loop through files and process data and output into a destination file

Well, this is a common question, and I really don't understand where the mentality comes from to keep smae data in separate datasets.  All this does is that you can't use built in SAS by group processing, and thus have to bash out tons of extra code to emulate this.  Its madness to me.  Put you data together, example:

data total;
  set work.support_: indsname=tmp;
  yr=input(scan(tmp,2,"_"),best.);
run;

Now I have my dataset, with a variable yr which contains the year from the dataset name.  Now I can do that given SQL step using yr in the group by, or I could do proc means by yr; it is so much easy, simple coding.  Basic functionality of SAS!

 

Contributor
Posts: 36

Re: Using a macro to loop through files and process data and output into a destination file

Sure, those files per unit and year they can come in separately from different sources and time, into a directory containing other files. Of course, since they follow the same structure, I can put them into one same file. But then there is still this looping problem...

Super User
Posts: 19,855

Re: Using a macro to loop through files and process data and output into a destination file


ken2 wrote:

Sure, those files per unit and year they can come in separately from different sources and time, into a directory containing other files. Of course, since they follow the same structure, I can put them into one same file. But then there is still this looping problem...


No there isn't. You add another variable to your GROUP BY in SQL to deal with multiple files. Done and done. 

If the files all have the same structure, you can rest all into one file in one data step. 

Super User
Super User
Posts: 7,981

Re: Using a macro to loop through files and process data and output into a destination file

If you get all the data into one dataset, then there is no "looping problem", why would you need to "loop"?  As I posted, you just add the year into the group by and you get a dataset out by year?  It is basic SAS by group processing - a fundamental processing taks in any Base SAS project which should be your number 1 tool to do any kind of data processing?

Contributor
Posts: 36

Re: Using a macro to loop through files and process data and output into a destination file

To get all those data from separate Unit Year files in the disk, into one single data source file, that is a loop.

data source_file;

set file1 file2 .......;

run;

 

if there are many files, this step itself is quite tidious.

Super User
Posts: 19,855

Re: Using a macro to loop through files and process data and output into a destination file

If you assume you need loops....or lets assume you don't. 

Your question should be, how do you read in all files at once. Which is also an FAQ. 

 

http://www.ats.ucla.edu/stat/sas/faq/multi_file_read.htm

 

https://communities.sas.com/t5/SAS-Communities-Library/How-do-I-write-a-macro-to-import-multiple-tex...

 

As well, for a Set statement you can use lists. The following would append all tables from file1 to file200, amd create a variable file_input that would have the name of the source file for each record.  

 

Data combined;

length source file_input $200;

set file1-file200 indsname=source;

 

file_input=source;

 

run; 

 

 

Contributor
Posts: 36

Re: Using a macro to loop through files and process data and output into a destination file

Thanks. I will try it out.

I made up this example, just to have a generic method to access multiple files, execute a common purpose macro, and then output/append the result into a destination file.

Super User
Posts: 19,855

Re: Using a macro to loop through files and process data and output into a destination file

[ Edited ]

If your intent is learning versus solving a business problem, I would recommend reading David Cassells paper, Don't be Loopy. It shows what not to do and why not to loop. 

 

http://www2.sas.com/proceedings/forum2007/183-2007.pdf

Contributor
Posts: 36

Re: Using a macro to loop through files and process data and output into a destination file

Thanks for the paper, I will read through it!

The reason for being so is, in a real production environment, those files can be huge. Also, due to legal requirement, they can not be deleted. Suppose that we lump them all into a single file to avoid the loop, this will double the disk space, and also we will have to append to that single file each month, quarter or year. When the files are so small as in my sample, it is not an issue. Once it is a huge one, then it becomes quite difficult to deal with.

Super User
Super User
Posts: 7,981

Re: Using a macro to loop through files and process data and output into a destination file

Then you would be better off looking into proper data warehousing/database software to deal with big data.  It really shouldn't matter to a program whether the data is one big file or lots of small files, it still takes the same processing, there are some gains and losses on both sides of that equation.

If you do have to go down this route, then simplfy the code to:

data _null_;
  set sashelp.vtable (where=(libname="<your libname>" and substr(memname,1,7)="Support"));
  call execute('proc sql;
                  create table WANT as
                  select * 
                  from    <your libname>.'||strip(memname)||';
                quit;');
run;

The where brings in all Support... files in <your libname>, and then for each of these generates out a proc sql step.  

Super User
Posts: 5,516

Re: Using a macro to loop through files and process data and output into a destination file

Perhaps best of all, you can accomplish this with small changes to the existing macro.

 

First, make sure YEAR and UNIT are part of each summary data set.

 

Second, add to the end of the macro:

 

proc append data=&Unit._&Year._Summary base=Total_Summary;

run;

 

Done!  In fact, this approach lets you simplify.  You can re-use the same name for all summary data sets.

Contributor
Posts: 36

Re: Using a macro to loop through files and process data and output into a destination file

Posted in reply to Astounding

Very nice! Thanks!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 16 replies
  • 586 views
  • 0 likes
  • 5 in conversation