BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chris2377
Quartz | Level 8

Hi,

 

I have a number of Excel files, named Company1_2016Q1.xlsx, Company1_2016Q2.xlsx, Company2_2016Q1.xlsx, Company2_2016Q2.xlsx etc. They are stored in one folder with multiple subfolders.

 

 

I have a following macro to import all Excel files from a catalog to SAS (I've found it somwhere on the Web, I didn't write it on my own). Could you help me modify the macro in two ways:

1. I would like to loop not only through the folder but also through subsoflders and sub-subfolders (if they exist)
2. I have some problem with manipulating the imported datasest - please take a look at the comments in the code below.

 

%macro read_files (katalog); 
%let rc=%sysfunc(filename(rawdata,&katalog)); 
%let did=%sysfunc(dopen(&rawdata)); 

%let dnum=%sysfunc(dnum(&did)); 

%do i=1 %to &dnum; 
	%let name_file=%sysfunc(dread(&did,&i)); 
	
	%let rozszerz=%sysfunc(scan(&name_file,2,.)); 

	%if &rozszerz=xlsx %then %do;
		%let name_file_short=%sysfunc(scan(&name_file,1,.)); 
		
			PROC IMPORT OUT=WORK.&name_file_short 
			DATAFILE="&katalog\&name_file" DBMS=xlsx REPLACE;
			RUN;
			/* HERE IS THE PROBLEM:
			- I want to create two variables in the dataset on the basis of the dataset name but the substr doesn't work*/
			data WORK.&name_file_short;
				set WORK.&name_file_short;
				comp=%substr(&name_file_short,1,8);
				date=%substr(&name_file_short,10,6);
			run;
	%end;

%end;
%mend;
%read_files(C:\Data);
1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Doh!  Yes, of course you need to remove file extensions.  Still, I would highly advise that using dates and such like in the dataset names, imagine how much simpler the code would be if you appended all the data into one dataset and used that, or just had all the dataset names similar rather than trying to work out what each one is called.  Again, its the Excel thinking, dates in tab or file names.  So this would be my preferred way (assuming I couldn't append all the data into one file either in SAS or Excel - note with VBA you could load all the files and dump to CSV, then read the one file in).

filename tmp pipe 'dir "s:\temp\rob" /s /b';

data test;
  infile tmp dlm="¬";  
  length buff $2000;
  input buff $;
  fname=scan(buff,countw(buff,"\"),"\");
  if index(upcase(fname),".XLSX")>0 then 
    call execute(cats('proc import datafile="',buff,'" out=DS',put(_n_,best.),'; run;'));

  /* Extra bit to add date into dataset */
  fname=strip(tranwrd(substr(fname,1,11),"_",""));
  call execute(cats('data ds',put(_n_,best.),'; date="',fname,'"; run;'));
run;

But yes, you could just scan(fname,1,"."). 

View solution in original post

10 REPLIES 10
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Assuming you have x window access this is pretty simple (and assuming there is no duplicate names in the subfolders!):

filename tmp pipe 'dir "s:\temp\rob" /s /b';

data test;
  infile tmp dlm="¬";  
  length buff $2000;
  input buff $;
  fname=scan(buff,countw(buff,"\"),"\");
  if index(upcase(fname),".XLSX")>0 then 
    call execute(cats('proc import datafile="',buff,'" out=',fname,'; run;'));
run;

Note however that Excel is a really poor data medium, and proc import is the worst method of importing the data, so what you will likely end up with is rubbish that you need to post process endlessly.  Far better to get the data in a better format (XML, CSV etc.).

chris2377
Quartz | Level 8

Thanks. I know that Excel files are  poor data medium, but I cannot change it unfornately.

 

Another problem is that names of the Excel files start with number and the code doesn't work (as it tries to create a sas datasest with a name starting from a digit). Could you help me modifying this code? I've tried to manipulate the "out=',fname,'" part but with little success.

 

Once I have all the data in SAS files, I need to modify each by adding some variable, depending on the dataset name (see my comments in the code in the previous post). Could you suggest something there?

 

Best

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Minor update then:

filename tmp pipe 'dir "s:\temp\rob" /s /b';

data test;
  infile tmp dlm="¬";  
  length buff $2000;
  input buff $;
  fname=scan(buff,countw(buff,"\"),"\");
/* Update fname to be valid SAS dataset name here per what you know the 
   files to be called I will just add a character per your information */
  fname=cats("A",fname);
  if index(upcase(fname),".XLSX")>0 then 
    call execute(cats('proc import datafile="',buff,'" out=',fname,'; run;'));
run;
chris2377
Quartz | Level 8

Thanks. It should work now, but there is another problem with my data. I get the following error "ERROR: Libref ''_2017_03_123_ALL' exceeds 8 characters." Unfortunately I cannot simply take first 8 characters of the name - I need everything, as the name of the file contains information that I want to use later on. So maybe the original "proc import" macro could be somehow modified to loop through subfolders as well? It works OK for all files in one folder (at least the part that imports data).

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please post your code and what the filenames look like - exactly.  Nowhere in my code do I set libnames.  It aalso wouldn't matter how you arrive at your end code, the dataset names have to conform to SAS dataset name conventions - a very good example of why not to put "data" in headings or filenames.  If you need the date in the dataset, then do that in another step, maybe something like:

filename tmp pipe 'dir "s:\temp\rob" /s /b';

data test;
  infile tmp dlm="¬";  
  length buff $2000;
  input buff $;
  fname=scan(buff,countw(buff,"\"),"\");
  if index(upcase(fname),".XLSX")>0 then 
    call execute(cats('proc import datafile="',buff,'" out=DS',put(_n_,best.),'; run;'));

  /* Extra bit to add date into dataset */
  fname=strip(tranwrd(substr(fname,1,11),"_",""));
  call execute(cats('data ds',put(_n_,best.),'; date="',fname,'"; run;'));
run;

Note in the above I am just using _n_ to create DSx where x refers to the _n_ number.  Am just guessing at what you have/need though.  This will add a column in with the date part of the filename.

chris2377
Quartz | Level 8

The code is as follows:

 

filename tmp pipe 'dir "C:\........" /s /b';

data test;
  infile tmp dlm="¬";  
  length buff $2000;
  input buff $;
  fname=scan(buff,countw(buff,"\"),"\");
/* Update fname to be valid SAS dataset name here per what you know the 
   files to be called I will just add a character per your information */
  fname=cats("_",fname);
  if index(upcase(fname),".XLSX")>0 then 
    call execute(cats('proc import datafile="',buff,'" out=',fname,'; run;'));
run;

And the error message is as follows  (it gives the same error for each file, I only paste one):

 


NOTE: The infile TMP is:
      Unnamed Pipe Access Device,
      PROCESS=dir "C:\..........." /s /b,
      RECFM=V,LRECL=32767

NOTE: 16 records were read from the infile TMP.
      The minimum record length was 61.
      The maximum record length was 93.
NOTE: The data set WORK.TEST has 16 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           2.21 seconds
      cpu time            0.15 seconds


NOTE: CALL EXECUTE generated line.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

ERROR: Libref '_2017_03_123_GR1' exceeds 8 characters.
NOTE: The SAS System stopped processing this step because of errors.

The Excel names are as follows: 2017_03_123_GR1.xlsx; 2017_03_124_GR1.xlsx; 2017_03_123_GR2.xlsx; 2017_03_124_GR3.xlsx etc. 

ballardw
Super User

I believe what you are doing when you attempt to read 2017_03_123_GR1.xlsx is that is the value of FNAME.

So when this is encountered:

call execute(cats('proc import datafile="',buff,'" out=',fname,'; run;'));

the output SAS DATASET name is 2017_03_123_GR1.xlsx which SAS treats the 2017_03_123_GR1 portion as the library name and the XLSX as the dataset name in LIB.SETNAME notation.

So truncate the Fname at the . before using it in the Call Execute.

Perhaps Fname=  Scan(fname,1,'.'); just before.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Doh!  Yes, of course you need to remove file extensions.  Still, I would highly advise that using dates and such like in the dataset names, imagine how much simpler the code would be if you appended all the data into one dataset and used that, or just had all the dataset names similar rather than trying to work out what each one is called.  Again, its the Excel thinking, dates in tab or file names.  So this would be my preferred way (assuming I couldn't append all the data into one file either in SAS or Excel - note with VBA you could load all the files and dump to CSV, then read the one file in).

filename tmp pipe 'dir "s:\temp\rob" /s /b';

data test;
  infile tmp dlm="¬";  
  length buff $2000;
  input buff $;
  fname=scan(buff,countw(buff,"\"),"\");
  if index(upcase(fname),".XLSX")>0 then 
    call execute(cats('proc import datafile="',buff,'" out=DS',put(_n_,best.),'; run;'));

  /* Extra bit to add date into dataset */
  fname=strip(tranwrd(substr(fname,1,11),"_",""));
  call execute(cats('data ds',put(_n_,best.),'; date="',fname,'"; run;'));
run;

But yes, you could just scan(fname,1,"."). 

chris2377
Quartz | Level 8
Thanks. I'm more than aware that having some information included in the name of the file only is far from optimal. But I have no infulence on this at all. I can only look for a way to combine them in SAS..

Btw - in your first response, you mentioned that "proc import is the worst method of importing the data". Why?
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah, thats an easy one.  Proc import is what is known as a guessing procedure.  Its scans the datafile with some parameters and guessse what the best format for the data is, so you don't necessarily always get the same result, or know what you will get.  I always recommend that the user specifies what they want from the data, to do this by creating a datastep import and fixing how SAS reads the data, assign correct lengths, informats etc.  It takes a bit longer, and you can use the output from a proc import as the basis, but the key is to make sure you - the person who knows the data best or can interpret it best - is the one who makes the call on the import.  Excel for instance has lots of lovely functionality (mostly hidden) which can really mess up data, each cell can be different for instance.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 8223 views
  • 0 likes
  • 3 in conversation