BookmarkSubscribeRSS Feed
melhaf
Fluorite | Level 6

I'm looking at this code 

 

%*Creates a list of all files in the DIR directory with the specified extension (EXT);
%macro list_files(dir,ext);
	%local filrf rc did memcnt name i;
	%let rc=%sysfunc(filename(filrf,&dir));
	%let did=%sysfunc(dopen(&filrf));

	%if &did eq 0 %then
		%do;
			%put Directory &dir cannot be open or does not exist;

			%return;
		%end;

	%do i = 1 %to %sysfunc(dnum(&did));
		%let name=%qsysfunc(dread(&did,&i));

		%if %qupcase(%qscan(&name,-1,.)) = %upcase(&ext) %then
			%do;
				%put &dir\&name;
				%let file_name =  %qscan(&name,1,.);
				%put &file_name;

				data _tmp;
					length dir $512 name $100;
					dir=symget("dir");
					name=symget("name");
					path = catx('\',dir,name);
					the_name = substr(name,1,find(name,'.')-1);
				run;

				proc append base=list data=_tmp force;
				run;

				quit;

				proc sql;
					drop table _tmp;
				quit;

			%end;
		%else %if %qscan(&name,2,.) = %then
			%do;
				%list_files(&dir\&name,&ext)
			%end;
	%end;

	%let rc=%sysfunc(dclose(&did));
	%let rc=%sysfunc(filename(filrf));
%mend list_files;

%*Macro to import a single file, using the path, filename and an output dataset name must be specified;
%macro import_file(path, file_name, dataset_name );

	proc import 
		datafile="&path.\&file_name."
		dbms=xlsx
		out=&dataset_name replace;
	run;

%mend;

*Create the list of files, in this case all XLSX files;
%list_files(c:\_localData\temp, xlsx);

%*Call macro once for each entry in the list table created from the %list_files() macro;
data _null_;
	set list;
	string = catt('%import_file(', dir, ', ',  name,', ', catt('test', put(_n_, z2.)), ');');
	call execute (string);
run;

But I have 8 xlsx files in a common folder: C:\Deskop\Budget\Budget, q1q2\'and the 8 excelfiles' (that is for the first 8 months):

 

Budget_jan.xlsx,

Budget_feb.xlsx,

Budget_march.xlsx,

Budget_april.xlsx,

Budget_may.xlsx,

Budget_june.xlsx,

.......

Budget_dec.xlsx.


after the first macro 

%macro list_files(dir,ext);

%put &=path.;
%put &=filename.;
%put &=datasetname.;

and path is correct, but the other two, filname and datasetname, is not resolved.  Where do I print it out in that code? thought that SAS creating it by itself?

 

And then, when I get to the second macro:

%macro import_file(path, file_name, dataset_name );
	proc import 
		datafile="BUDGETPATH\Budget2023\*.xlsx /*I don't know how to tell SAS to get all of the 12 excelfiles with the *-wildcard */
		dbms=xlsx
		out=&dataset_name replace;
	run;
%mend;

Beacuse in this line: 

%list_files(MyPath\the Budget-folder, xlsx); 

it creates that list of all the xlsx.files. (the 12 of them)

and in the last code: 

data _null_;
	set alla_styrfiler;
	string = catt('%import_file(', dir, ', ',  name,', ', catt('ALL_BUDGETS_XSLX_FILES', put(_n_, z2.)), ');');
	call execute (string);
run;

it just copy paste the file in the proc import. I tried to specifiy for example januray in that proc import statement, since the wildcard dosen't work, but it just creates 12 duplicates.

 

Merry x-max!

3 REPLIES 3
Tom
Super User Tom
Super User

Does the first macro work?

What dataset does it generate?
What is in the dataset it generates?

 

Why did you use macro code to make the list of files it you wanted the list of files in a dataset?  Why not just use a data step to call the SAS functions DREAD() etc directly instead of going to all of that extra work to use %SYSFUNC() to call SAS functions in macro code?

 

If you want a macro to find all files in a sub-tree then use %dirtree() It does not attempt to use macro code to call SAS functions.

ballardw
Super User

Are you aware that if you import multiple Excel files that are supposedly of the same structure that it is very likely that you will have some variables with different attributes based on the content of each file? Lengths of character variables are very likely to differ and it is not uncommon based on contents of files that some variables will change type between files. 

Quentin
Super User

Using the macro language this way is a complex topic, but it looks to me like the macros you posted all work.  You want the first version of %import_file, not the one you changed to have an asterisk in it.  

 

The job of %list_files() is to make an output dataset, work.list, which has the name and path of each Excel file found.  If I run:

 

%list_files(Q:\junk\Excel, xlsx)

proc print data=list ;
run ;

It works, and the PROC PRINT of work.list shows:

Obs    dir                        name              path                                      the_name

 1     Q:\junk\Excel\MoreExcel    myotherxl.xlsx    Q:\junk\Excel\MoreExcel\myotherxl.xlsx    myotherxl
 2     Q:\junk\Excel              myxl.xlsx         Q:\junk\Excel\myxl.xlsx                   myxl

So it succeeded in finding two Excel files, and you can see the directory and name of each.

 

The job of %import_file is to import one excel file and write one SAS dataset.  So if I want to convert each of these excel files into SAS datasets I could code it as:

%import_file(Q:\junk\Excel,myxl.xlsx,want1)
%import_file(Q:\junk\Excel\MoreExcel,myotherxl.xlsx,want2)

And that would work.  It would create work.want1 and work.want2.  But the challenge with that approach is I need to write one  macro call per Excel file, and I need to know the name and path for each file.

 

The job of the data _null_ step with the CALL EXECUTE is to read the data in WORK.LIST (which has one row per Excel file), and then generate one call to %import_file for each Excel file.  So instead of hard-coding two calls to %import_file(), I can code:

 

data _null_;
	set list;
	string = catt('%import_file(', dir, ', ',  name,', ', catt('test', put(_n_, z2.)), ');');
	call execute (string);
run;

And CALL EXECUTE will generate two calls to %import_file, using the values of the variables in work.list to specify the directory, and the file name.  The log from running that step is:

 

98   data _null_;
99     set list;
100    string = catt('%import_file(', dir, ', ',  name,', ', catt('test', put(_n_, z2.)), ');');
101    call execute (string);
102  run;

MPRINT(IMPORT_FILE):   proc import datafile="Q:\junk\Excel\MoreExcel\myotherxl.xlsx" dbms=xlsx out=test01 replace;
MPRINT(IMPORT_FILE):   run;
MPRINT(IMPORT_FILE):   proc import datafile="Q:\junk\Excel\myxl.xlsx" dbms=xlsx out=test02 replace;
MPRINT(IMPORT_FILE):   run;
NOTE: There were 2 observations read from the data set WORK.LIST.

NOTE: CALL EXECUTE generated line.
1   + proc import     datafile="Q:\junk\Excel\MoreExcel\myotherxl.xlsx"     dbms=xlsx     out=test01 replace;
1   +                                                                                                           run;

NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with
      options MSGLEVEL=I.
NOTE: The import data set has 19 observations and 6 variables.
NOTE: WORK.TEST01 data set was successfully created.

1   +                                                                                                               ;
2   + proc import     datafile="Q:\junk\Excel\myxl.xlsx"     dbms=xlsx     out=test02 replace;
2   +                                                                                            run;

NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with
      options MSGLEVEL=I.
NOTE: The import data set has 19 observations and 6 variables.
NOTE: WORK.TEST02 data set was successfully created.

2   +

The benefit of this dynamic programming approach is that you don't need to know the names of the Excel files, or how many there are.  In your case, if you have twelve xlsx files, you should be able to run:

options mprint ;
proc delete data=list ;
run ;
%list_files(C:\Deskop\Budget, xlsx)

data _null_;
	set list;
	string = catt('%import_file(', dir, ', ',  name,', ', catt('test', put(_n_, z2.)), ');');
	call execute (string);
run;

And it should output twelve datasets, work.test01-work.test12.  

 

If that is not working, please run that code and then post the full log.

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 526 views
  • 0 likes
  • 4 in conversation