ken2
Obsidian | Level 7

Dear friends,

 

Suppose I have many files on disk, all following the same structure, divided by Unit and Year (named Unit_Year_Itemized). I need to perform some calculations on all of those files and put the results into one single file (Total_Summary).

So I have written a macro, %Summary, to do the calculation. The result is one single line, which can also be dumped into a temporary file (Unit_Year_Summary).

Now I need to run this macro for all the Units and Years; the years are numeric and possibly sequential, but the units are strings and not sequential. After the calculation is done, I need to put the results into one single file, Total_Summary. For this I made a loop, trying to loop through a list of units and years, run the macro, and output the result into Total_Summary.

Of course, the data step fails because there is no SET statement, and the DO loops fail. I could dump the intermediate files to disk and use a join or a data step to manually combine them into a single file, but if there are many files that becomes tedious.

Thanks for any help to solve the problem.

Ken

 

Sample code:

/* Create testing input files, 2 units, 2 years */

data IT_2014_Itemized;
infile cards delimiter=',';
input Unit $ Year Client $ Sales;
cards;
IT,2014,Adam,100
IT,2014,John,200
IT,2014,Mary,150
;
run;

data IT_2015_Itemized;
infile cards delimiter=',';
input Unit $ Year Client $ Sales;
cards;
IT,2015,Peter,130
IT,2015,John,350
IT,2015,Ted,600
;
run;

data Support_2014_Itemized;
infile cards delimiter=',';
input Unit $ Year Client $ Sales;
cards;
Support,2014,Lou,80
Support,2014,Ann,700
Support,2014,Brian,320
;
run;

data Support_2015_Itemized;
infile cards delimiter=',';
input Unit $ Year Client $ Sales;
cards;
Support,2015,Steve,440
Support,2015,Sam,530
Support,2015,Jack,200
;
run;

 

/* Create macro to calculate for each unit and year, some summary results */

 

%macro Summary(Unit=, Year=);
/* %let Unit=IT; */
/* %do Year=2014 %to 2015; */ /* I can put a loop here to run all the years, but it fails to run for all the units in a list */
proc sql;
/* create table &Unit._&Year._Summary as */ /* This is the intermediate summary file, which can be dumped out to disk */
select Unit,
Year,
count(Client) as Total_Client,
sum(Sales) as Total_Sales,
sum(Sales)/count(Client) as Average_Sales
from &Unit._&Year._Itemized
group by Unit, Year;
quit;
/* %end; */
%mend Summary;

 

/* Now call this macro for each unit and year */

 

%Summary(Unit=IT, Year=2014);

 

/* Now I want to loop through all the units in the list, and the years, run the same macro with Unit and Year, and output the one-line per unit per year result into the Total_Summary file */

 

data Total_Summary;
do Unit=IT, Support; /* This is a list of units, which can be many */
do Year=2014 to 2015; /* Year can be sequential */
%Summary(Unit=&Unit, Year=&Year);
output;
end;
end;
run;

1 ACCEPTED SOLUTION
ken2
Obsidian | Level 7

Thanks for your suggestion. I moved the target table into the macro and rewrote the loop code; now it's completely solved!

 

%macro Summary(Unit=, Year=);
/* %do Year=2014 %to 2015; */
proc sql;
create table &Unit._&Year._Summary as
select Unit,
Year,
count(Client) as Total_Client,
sum(Sales) as Total_Sales,
sum(Sales)/count(Client) as Average_Sales
from &Unit._&Year._Itemized
group by Unit, Year;
quit;

proc append data=&Unit._&Year._Summary
            base=Total_Summary /* This is the real target result table, created and appended on the run */
            force;
run;

proc datasets lib=WORK memtype=data nolist;
delete &Unit._&Year._Summary;
run;
/* %end; */
%mend Summary;

 

data STEERING_TABLE; /* This table is only a by-product of the run and can be deleted afterwards */
length UNIT $12 YEAR 8;
do Unit='IT', 'Support';
do Year=2014 to 2015;
rc=dosubl('%Summary(Unit=' || UNIT || ', Year=' || YEAR || ');');
output;
end;
end;
run;


16 REPLIES
Reeza
Super User

Look into CALL EXECUTE() for running the macro multiple times. 

 

In the last step in your macro, append the results to a final table, instead of creating multiple output tables. 
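For illustration, a minimal sketch of that suggestion, assuming the unit/year combinations sit in a small driver data set (Driver is a made-up name; %Summary is the poster's macro, with the PROC APPEND step added at its end):

data Driver; /* hypothetical list of the unit/year combinations to process */
  length Unit $12;
  input Unit $ Year;
  cards;
IT 2014
IT 2015
Support 2014
Support 2015
;
run;

data _null_; /* push one macro call per observation; the generated code runs after this step ends */
  set Driver;
  call execute(cats('%Summary(Unit=', Unit, ', Year=', Year, ');'));
run;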

 

ken2
Obsidian | Level 7

Thank you, I will check CALL EXECUTE out...

RW9
Diamond | Level 26

Well, this is a common question, and I really don't understand where the mentality comes from to keep the same data in separate datasets. All it does is prevent you from using the built-in SAS BY-group processing, so you have to bash out tons of extra code to emulate it. It's madness to me. Put your data together, for example:

data total;
  set work.support_: indsname=tmp;
  yr=input(scan(tmp,2,"_"),best.);
run;

Now I have my dataset, with a variable yr that contains the year taken from the dataset name. Now I can run that same SQL step using yr in the GROUP BY, or I could run PROC MEANS by yr; it is so much easier, simpler coding. Basic functionality of SAS!
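As a sketch of that idea against the sample data above (All_Itemized stands for the combined table and is a made-up name; build it with a SET statement like the one just shown, covering both unit prefixes):

/* One grouped query over the combined table replaces the per-file macro calls */
proc sql;
  create table Total_Summary as
  select Unit,
         Year,
         count(Client) as Total_Client,
         sum(Sales) as Total_Sales,
         sum(Sales)/count(Client) as Average_Sales
  from All_Itemized
  group by Unit, Year;
quit;

/* Much the same with CLASS processing in PROC MEANS; the output name Total_Summary2 is arbitrary,
   and the counts match the SQL version only when Client and Sales have no missing values */
proc means data=All_Itemized noprint nway;
  class Unit Year;
  var Sales;
  output out=Total_Summary2 n=Total_Client sum=Total_Sales mean=Average_Sales;
run;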

 

ken2
Obsidian | Level 7

Sure, those files per unit and year can come in separately, from different sources and at different times, into a directory containing other files. Of course, since they follow the same structure, I could put them into one single file. But then there is still this looping problem...

Reeza
Super User

@ken2 wrote:

Sure, those files per unit and year can come in separately, from different sources and at different times, into a directory containing other files. Of course, since they follow the same structure, I could put them into one single file. But then there is still this looping problem...


No there isn't. You add another variable to your GROUP BY in SQL to deal with multiple files. Done and done. 

If the files all have the same structure, you can read them all into one file in one data step.

RW9
Diamond | Level 26

If you get all the data into one dataset, then there is no "looping problem"; why would you need to "loop"? As I posted, you just add the year into the GROUP BY and you get a dataset out by year. It is basic SAS BY-group processing, a fundamental processing task in any Base SAS project, and it should be your number 1 tool for any kind of data processing.

ken2
Obsidian | Level 7

Getting all the data from the separate Unit/Year files on disk into one single source file is itself a loop:

data source_file;
set file1 file2 .......;
run;

If there are many files, this step itself is quite tedious.

Reeza
Super User

If you assume you need loops... or let's assume you don't.

Your question should be: how do you read in all the files at once? That is also an FAQ.

 

http://www.ats.ucla.edu/stat/sas/faq/multi_file_read.htm

 

https://communities.sas.com/t5/SAS-Communities-Library/How-do-I-write-a-macro-to-import-multiple-tex...

 

As well, for a SET statement you can use lists. The following would append all tables from file1 to file200, and create a variable file_input that holds the name of the source file for each record.

 

data combined;
length source file_input $200;
set file1-file200 indsname=source;
file_input=source;
run;

 

 

ken2
Obsidian | Level 7

Thanks. I will try it out.

I made up this example just to find a generic method to access multiple files, execute a common-purpose macro, and then output/append the results into a destination file.

Reeza
Super User

If your intent is learning rather than solving a business problem, I would recommend reading David Cassell's paper, "Don't Be Loopy". It shows what not to do and why not to loop.

 

http://www2.sas.com/proceedings/forum2007/183-2007.pdf

ken2
Obsidian | Level 7

Thanks for the paper, I will read through it!

The reason for this setup is that, in a real production environment, those files can be huge. Also, due to legal requirements, they cannot be deleted. Suppose we lump them all into a single file to avoid the loop: this doubles the disk space, and we would also have to append to that single file every month, quarter, or year. When the files are as small as in my sample it is not an issue, but once they are huge it becomes quite difficult to deal with.

RW9
Diamond | Level 26

Then you would be better off looking into proper data warehousing/database software to deal with big data. It really shouldn't matter to a program whether the data is one big file or lots of small files; it still takes the same processing, and there are some gains and losses on both sides of that equation.

If you do have to go down this route, then simplify the code to:

data _null_;
  set sashelp.vtable (where=(libname="<your libname>" and substr(memname,1,7)="Support"));
  call execute('proc sql;
                  create table WANT as
                  select * 
                  from    <your libname>.'||strip(memname)||';
                quit;');
run;

The WHERE brings in all the Support... files in <your libname>, and then generates a PROC SQL step for each of them.

Astounding
PROC Star

Perhaps best of all, you can accomplish this with small changes to the existing macro.

 

First, make sure YEAR and UNIT are part of each summary data set.

 

Second, add to the end of the macro:

 

proc append data=&Unit._&Year._Summary base=Total_Summary;
run;

 

Done!  In fact, this approach lets you simplify.  You can re-use the same name for all summary data sets.
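A sketch of that simplification, assuming the intermediate table is given one fixed, reusable name (Work_Summary here, a made-up name) and that Unit and Year are columns of the query result, as in the original macro:

%macro Summary(Unit=, Year=);
/* same intermediate name on every call, overwritten each time */
proc sql;
  create table Work_Summary as
  select Unit,
         Year,
         count(Client) as Total_Client,
         sum(Sales) as Total_Sales,
         sum(Sales)/count(Client) as Average_Sales
  from &Unit._&Year._Itemized
  group by Unit, Year;
quit;

/* accumulate each call's one-line result into the real target table */
proc append base=Total_Summary data=Work_Summary force;
run;
%mend Summary;

Each call then leaves only Total_Summary behind, so there is no per-unit/year clean-up step.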


Discussion stats
  • 16 replies
  • 9796 views
  • 0 likes
  • 5 in conversation