DATA Step, Macro, Functions and more

Import multiple Excel files(xlsx) directly from zip file

Reply
Contributor
Posts: 41

Import multiple Excel files(xlsx) directly from zip file

I have lots of xlsx files in a zip file.  I would like to read it directly.  SAS cannot find xlsx file names.  Should the excel file names be same as the zip file name? Is it possible to read xlsx files in zip file directly?

filename zipfile zip "e:\Users\mhollifi\Documents\Dillon Study GP\18_3_9\submission_cleaned.zip";
data test;
  infile zipfile(ajellowilliam_29616_1296089_MBTN_Peer_evalsGroup6.xlsx) recfm=f lrecl=1;
  file 'e:\Users\mhollifi\Documents\Dillon Study GP\18_3_9\ajellowilliam_29616_1296089_MBTN_Peer_evalsGroup6' recfm=f lrecl=1;
  input ch $char1.;
  put ch $char1.;
run;

ERROR: Entry ajellowilliam_29616_1296089_MBTN_Peer_evalsGroup6.xlsx in zip file
       e:\Users\mhollifi\Documents\Dillon Study GP\18_3_9\submission_cleaned.zip does not exist.
ERROR: Physical file does not exist, ajellowilliam_29616_1296089_MBTN_Peer_evalsGroup6.xlsx.
WARNING: The data set WORK.TEST may be incomplete.  When this step was stopped there were 0
         observations and 1 variables.
WARNING: Data set WORK.TEST was not replaced because this step was stopped.
Super User
Posts: 22,850

Re: Import multiple Excel files(xlsx) directly from zip file

I don't think it's possible to read a zipped xlsx file. 

 

However, you usually don't get massive gains by zipping xlsx files because they're already compressed. If you change the extension of an xlsx file to zip and uncompress it you can see the xml components.

Super User
Super User
Posts: 7,855

Re: Import multiple Excel files(xlsx) directly from zip file

You cannot read from an XLSX file as if it was a simple text file.

Are you trying to get the filenames?

Try something like this.

filename zipfile zip "e:\Users\mhollifi\Documents\Dillon Study GP\18_3_9\submission_cleaned.zip";
data files ;
 length fileno 8 filename $256 ;
 did=dopen('zipfile');
 do fileno=1 to dnum(did);
   filename=dread(did,fileno);
   output;
 end;
 did=dclose(did);
 drop did;
run;
Contributor
Posts: 41

Re: Import multiple Excel files(xlsx) directly from zip file

Thank you for your reply.  I tried to use your code, and I got this error message. Why?

filename zipfile zip "e:\Users\mhollifi\Documents\Dillon Study GP\18_3_9\submission_cleaned.zip";
data _null_ ;
 length fileno 8 filename $256 ;
 did=dopen('zipfile');
 %do fileno=1 %to dnum(did);

ERROR: The %DO statement is not valid in open code.


   filename=dread(did,fileno);
   output;
    %readdata(filename)

MPRINT(READDATA):   filename zipfile clear;
MPRINT(READDATA):   proc import datafile="filename" out=want DBMS=EXCEL REPLACE;
MPRINT(READDATA):   AEXC;
MPRINT(READDATA):   RANGE="Sheet1$";
MPRINT(READDATA):   GETNAMES=YES;
MPRINT(READDATA):   MIXED=no;
MPRINT(READDATA):   SCANTEXT=YES;
MPRINT(READDATA):   USEDATE=YES;
MPRINT(READDATA):   SCANTIME=YES;
MPRINT(READDATA):   RUN;
ERROR: Unable to open file C:\Users\00004263\filename.XLS. It does not  exist or it is already
       opened exclusively by another user, or you need permission to view its data.
ERROR: The %END statement is not valid in open code.


 %end;
 did=dclose(did);
 drop did;

run;
%macro readdata(xlsfile);
filename zipfile clear;

proc import datafile="&xlsfile" out=want
     DBMS=EXCEL REPLACE;
     RANGE="Sheet1$"; 
     GETNAMES=YES;
     MIXED=no;
     SCANTEXT=YES;
     USEDATE=YES;
     SCANTIME=YES;
RUN;

%mend;
Super User
Super User
Posts: 7,855

Re: Import multiple Excel files(xlsx) directly from zip file

Why would you use a %DO loop in the middle of a data step?
Super User
Super User
Posts: 7,855

Re: Import multiple Excel files(xlsx) directly from zip file

You will need to copy the files out of the ZIP file before you can use them with XLSX libname engine (or PROC IMPORT).

Let's make a macro to copy one file and import all of the sheets into a SAS libref.

%macro import_xlsx(zipfile=,member=,outlib= );
%local filen dlm ;
%if "&sysscp" = "WIN" %then %let dlm=\ ;
%else %let dlm=/ ;
%let filen=%sysfunc(quote(%qsysfunc(pathname(work))&dlm.copy.xlsx));
filename file1 zip &zipfile member=&member recfm=n;
filename file2 &filen recfm=n;
data _null_;
  infile file1 recfm=n ;
  file file2 recfm=n ;
  input;
  put _infile_;
run;
libname xlsx xlsx &filen ;

proc copy inlib=xlsx outlib=&outlib;
run;

libname xlsx clear ;
data _null_;
  fileref='file2';
  rc=fdelete(fileref);
run;
filename file2 ;
filename file1 ;

%mend import_xlsx;

So now if you take the list of files you generated using the earlier data step you can use it to generate a series of calls to this macro.

filename code ;
data _null_;
  set files ;
  file code ;
  put '%import_xlsx(zipfile="zipfilename_goes_here"'
      ',member=' filename :$quote.
      ',outlib=work)'
  ;
run;
%include code / source2 ;
Contributor
Posts: 41

Re: Import multiple Excel files(xlsx) directly from zip file

Thank you for your reply.

But hum...

Not quite sure what's going on to me.

I got this error message and I cannot figure out where "filename" variable comes from.

%local filen dlm ;
%if "&sysscp" = "WIN" %then %let dlm=\ ;
%else %let dlm=/ ;
%let filen=%sysfunc(quote(%qsysfunc(pathname(work))&dlm.copy.xlsx));
filename file1 zip &zipfile member=&member recfm=n;
filename file2 &filen recfm=n;
data _null_;
  infile file1 recfm=n ;
  file file2 recfm=n ;
  input;
  put _infile_;
run;
libname xlsx xlsx &filen ;

proc copy inlib=xlsx outlib=&outlib;
run;

libname xlsx clear ;
data _null_;
  fileref='file2';
  rc=fdelete(fileref);
run;
filename file2 ;
filename file1 ;

%mend import_xlsx;
*So now if you take the list of files you generated using the earlier data step you can use it to generate a series of calls to this macro.
filename code ;
data _null_;
  set files ;
  file code ;
  put '%import_xlsx(zipfile="e:\Users\mhollifi\Documents\Dillon Study GP\18_3_9\submission_cleaned.zip"'
      ',member=' filename :$quote.
      ',outlib=work)'
  ;
run;
%include code / source2 ;

and error message is

WARNING: Physical file does not exist, C:\Users\00004263\code.sas.
ERROR: Cannot open %INCLUDE file CODE.

in the end of line.

Super User
Super User
Posts: 7,855

Re: Import multiple Excel files(xlsx) directly from zip file

[ Edited ]

If you didn't create a dataset with the list of the files in the ZIP file then your data step to generate one macro call per filename will not generate ANY lines into the CODE file.  So of course there are no lines to %INCLUDE from the file.


Make sure to FIRST run the code I posted before for reading the names of the files that are in your ZIP file. And make sure not to mistakenly convert the data step DO loops into macro code %DO loops.

 

Also make sure that the name of the variable you created to hold the filename matches the name of the variable you are using to write the macro call into the text file CODE.

Super Contributor
Posts: 469

Re: Import multiple Excel files(xlsx) directly from zip file

Hi,

 

Check if this is helpful

 

Using FILENAME ZIP to unzip and read data files in SAS

Thanks,
Suryakiran
Ask a Question
Discussion stats
  • 8 replies
  • 120 views
  • 0 likes
  • 4 in conversation