DATA Step, Macro, Functions and more

Macro to import all Excel files

Accepted Solution Solved
Reply
Contributor
Posts: 50
Accepted Solution

Macro to import all Excel files

Hi,

 

I have a number of Excel files, named Company1_2016Q1.xlsx, Company1_2016Q2.xlsx, Company2_2016Q1.xlsx, Company2_2016Q2.xlsx etc. They are stored in one folder with multiple subfolders.

 

 

I have a following macro to import all Excel files from a catalog to SAS (I've found it somwhere on the Web, I didn't write it on my own). Could you help me modify the macro in two ways:

1. I would like to loop not only through the folder but also through subsoflders and sub-subfolders (if they exist)
2. I have some problem with manipulating the imported datasest - please take a look at the comments in the code below.

 

%macro read_files (katalog); 
%let rc=%sysfunc(filename(rawdata,&katalog)); 
%let did=%sysfunc(dopen(&rawdata)); 

%let dnum=%sysfunc(dnum(&did)); 

%do i=1 %to &dnum; 
	%let name_file=%sysfunc(dread(&did,&i)); 
	
	%let rozszerz=%sysfunc(scan(&name_file,2,.)); 

	%if &rozszerz=xlsx %then %do;
		%let name_file_short=%sysfunc(scan(&name_file,1,.)); 
		
			PROC IMPORT OUT=WORK.&name_file_short 
			DATAFILE="&katalog\&name_file" DBMS=xlsx REPLACE;
			RUN;
			/* HERE IS THE PROBLEM:
			- I want to create two variables in the dataset on the basis of the dataset name but the substr doesn't work*/
			data WORK.&name_file_short;
				set WORK.&name_file_short;
				comp=%substr(&name_file_short,1,8);
				date=%substr(&name_file_short,10,6);
			run;
	%end;

%end;
%mend;
%read_files(C:\Data);

Accepted Solutions
Solution
‎07-11-2017 05:58 AM
Super User
Super User
Posts: 7,392

Re: Macro to import all Excel files

Doh!  Yes, of course you need to remove file extensions.  Still, I would highly advise that using dates and such like in the dataset names, imagine how much simpler the code would be if you appended all the data into one dataset and used that, or just had all the dataset names similar rather than trying to work out what each one is called.  Again, its the Excel thinking, dates in tab or file names.  So this would be my preferred way (assuming I couldn't append all the data into one file either in SAS or Excel - note with VBA you could load all the files and dump to CSV, then read the one file in).

filename tmp pipe 'dir "s:\temp\rob" /s /b';

data test;
  infile tmp dlm="¬";  
  length buff $2000;
  input buff $;
  fname=scan(buff,countw(buff,"\"),"\");
  if index(upcase(fname),".XLSX")>0 then 
    call execute(cats('proc import datafile="',buff,'" out=DS',put(_n_,best.),'; run;'));

  /* Extra bit to add date into dataset */
  fname=strip(tranwrd(substr(fname,1,11),"_",""));
  call execute(cats('data ds',put(_n_,best.),'; date="',fname,'"; run;'));
run;

But yes, you could just scan(fname,1,"."). 

View solution in original post


All Replies
Super User
Super User
Posts: 7,392

Re: Macro to import all Excel files

Assuming you have x window access this is pretty simple (and assuming there is no duplicate names in the subfolders!):

filename tmp pipe 'dir "s:\temp\rob" /s /b';

data test;
  infile tmp dlm="¬";  
  length buff $2000;
  input buff $;
  fname=scan(buff,countw(buff,"\"),"\");
  if index(upcase(fname),".XLSX")>0 then 
    call execute(cats('proc import datafile="',buff,'" out=',fname,'; run;'));
run;

Note however that Excel is a really poor data medium, and proc import is the worst method of importing the data, so what you will likely end up with is rubbish that you need to post process endlessly.  Far better to get the data in a better format (XML, CSV etc.).

Contributor
Posts: 50

Re: Macro to import all Excel files

Thanks. I know that Excel files are  poor data medium, but I cannot change it unfornately.

 

Another problem is that names of the Excel files start with number and the code doesn't work (as it tries to create a sas datasest with a name starting from a digit). Could you help me modifying this code? I've tried to manipulate the "out=',fname,'" part but with little success.

 

Once I have all the data in SAS files, I need to modify each by adding some variable, depending on the dataset name (see my comments in the code in the previous post). Could you suggest something there?

 

Best

Super User
Super User
Posts: 7,392

Re: Macro to import all Excel files

Minor update then:

filename tmp pipe 'dir "s:\temp\rob" /s /b';

data test;
  infile tmp dlm="¬";  
  length buff $2000;
  input buff $;
  fname=scan(buff,countw(buff,"\"),"\");
/* Update fname to be valid SAS dataset name here per what you know the 
   files to be called I will just add a character per your information */
  fname=cats("A",fname);
  if index(upcase(fname),".XLSX")>0 then 
    call execute(cats('proc import datafile="',buff,'" out=',fname,'; run;'));
run;
Contributor
Posts: 50

Re: Macro to import all Excel files

Thanks. It should work now, but there is another problem with my data. I get the following error "ERROR: Libref ''_2017_03_123_ALL' exceeds 8 characters." Unfortunately I cannot simply take first 8 characters of the name - I need everything, as the name of the file contains information that I want to use later on. So maybe the original "proc import" macro could be somehow modified to loop through subfolders as well? It works OK for all files in one folder (at least the part that imports data).

Super User
Super User
Posts: 7,392

Re: Macro to import all Excel files

Please post your code and what the filenames look like - exactly.  Nowhere in my code do I set libnames.  It aalso wouldn't matter how you arrive at your end code, the dataset names have to conform to SAS dataset name conventions - a very good example of why not to put "data" in headings or filenames.  If you need the date in the dataset, then do that in another step, maybe something like:

filename tmp pipe 'dir "s:\temp\rob" /s /b';

data test;
  infile tmp dlm="¬";  
  length buff $2000;
  input buff $;
  fname=scan(buff,countw(buff,"\"),"\");
  if index(upcase(fname),".XLSX")>0 then 
    call execute(cats('proc import datafile="',buff,'" out=DS',put(_n_,best.),'; run;'));

  /* Extra bit to add date into dataset */
  fname=strip(tranwrd(substr(fname,1,11),"_",""));
  call execute(cats('data ds',put(_n_,best.),'; date="',fname,'"; run;'));
run;

Note in the above I am just using _n_ to create DSx where x refers to the _n_ number.  Am just guessing at what you have/need though.  This will add a column in with the date part of the filename.

Contributor
Posts: 50

Re: Macro to import all Excel files

The code is as follows:

 

filename tmp pipe 'dir "C:\........" /s /b';

data test;
  infile tmp dlm="¬";  
  length buff $2000;
  input buff $;
  fname=scan(buff,countw(buff,"\"),"\");
/* Update fname to be valid SAS dataset name here per what you know the 
   files to be called I will just add a character per your information */
  fname=cats("_",fname);
  if index(upcase(fname),".XLSX")>0 then 
    call execute(cats('proc import datafile="',buff,'" out=',fname,'; run;'));
run;

And the error message is as follows  (it gives the same error for each file, I only paste one):

 


NOTE: The infile TMP is:
      Unnamed Pipe Access Device,
      PROCESS=dir "C:\..........." /s /b,
      RECFM=V,LRECL=32767

NOTE: 16 records were read from the infile TMP.
      The minimum record length was 61.
      The maximum record length was 93.
NOTE: The data set WORK.TEST has 16 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           2.21 seconds
      cpu time            0.15 seconds


NOTE: CALL EXECUTE generated line.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

ERROR: Libref '_2017_03_123_GR1' exceeds 8 characters.
NOTE: The SAS System stopped processing this step because of errors.

The Excel names are as follows: 2017_03_123_GR1.xlsx; 2017_03_124_GR1.xlsx; 2017_03_123_GR2.xlsx; 2017_03_124_GR3.xlsx etc. 

Super User
Posts: 10,466

Re: Macro to import all Excel files

I believe what you are doing when you attempt to read 2017_03_123_GR1.xlsx is that is the value of FNAME.

So when this is encountered:

call execute(cats('proc import datafile="',buff,'" out=',fname,'; run;'));

the output SAS DATASET name is 2017_03_123_GR1.xlsx which SAS treats the 2017_03_123_GR1 portion as the library name and the XLSX as the dataset name in LIB.SETNAME notation.

So truncate the Fname at the . before using it in the Call Execute.

Perhaps Fname=  Scan(fname,1,'.'); just before.

Solution
‎07-11-2017 05:58 AM
Super User
Super User
Posts: 7,392

Re: Macro to import all Excel files

Doh!  Yes, of course you need to remove file extensions.  Still, I would highly advise that using dates and such like in the dataset names, imagine how much simpler the code would be if you appended all the data into one dataset and used that, or just had all the dataset names similar rather than trying to work out what each one is called.  Again, its the Excel thinking, dates in tab or file names.  So this would be my preferred way (assuming I couldn't append all the data into one file either in SAS or Excel - note with VBA you could load all the files and dump to CSV, then read the one file in).

filename tmp pipe 'dir "s:\temp\rob" /s /b';

data test;
  infile tmp dlm="¬";  
  length buff $2000;
  input buff $;
  fname=scan(buff,countw(buff,"\"),"\");
  if index(upcase(fname),".XLSX")>0 then 
    call execute(cats('proc import datafile="',buff,'" out=DS',put(_n_,best.),'; run;'));

  /* Extra bit to add date into dataset */
  fname=strip(tranwrd(substr(fname,1,11),"_",""));
  call execute(cats('data ds',put(_n_,best.),'; date="',fname,'"; run;'));
run;

But yes, you could just scan(fname,1,"."). 

Contributor
Posts: 50

Re: Macro to import all Excel files

Thanks. I'm more than aware that having some information included in the name of the file only is far from optimal. But I have no infulence on this at all. I can only look for a way to combine them in SAS..

Btw - in your first response, you mentioned that "proc import is the worst method of importing the data". Why?
Super User
Super User
Posts: 7,392

Re: Macro to import all Excel files

Ah, thats an easy one.  Proc import is what is known as a guessing procedure.  Its scans the datafile with some parameters and guessse what the best format for the data is, so you don't necessarily always get the same result, or know what you will get.  I always recommend that the user specifies what they want from the data, to do this by creating a datastep import and fixing how SAS reads the data, assign correct lengths, informats etc.  It takes a bit longer, and you can use the output from a proc import as the basis, but the key is to make sure you - the person who knows the data best or can interpret it best - is the one who makes the call on the import.  Excel for instance has lots of lovely functionality (mostly hidden) which can really mess up data, each cell can be different for instance.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 171 views
  • 0 likes
  • 3 in conversation