BookmarkSubscribeRSS Feed
mhollifi
Obsidian | Level 7

I have lots of xlsx files in a zip file.  I would like to read it directly.  SAS cannot find xlsx file names.  Should the excel file names be same as the zip file name? Is it possible to read xlsx files in zip file directly?

filename zipfile zip "e:\Users\mhollifi\Documents\Dillon Study GP\18_3_9\submission_cleaned.zip";
data test;
  infile zipfile(ajellowilliam_29616_1296089_MBTN_Peer_evalsGroup6.xlsx) recfm=f lrecl=1;
  file 'e:\Users\mhollifi\Documents\Dillon Study GP\18_3_9\ajellowilliam_29616_1296089_MBTN_Peer_evalsGroup6' recfm=f lrecl=1;
  input ch $char1.;
  put ch $char1.;
run;

ERROR: Entry ajellowilliam_29616_1296089_MBTN_Peer_evalsGroup6.xlsx in zip file
       e:\Users\mhollifi\Documents\Dillon Study GP\18_3_9\submission_cleaned.zip does not exist.
ERROR: Physical file does not exist, ajellowilliam_29616_1296089_MBTN_Peer_evalsGroup6.xlsx.
WARNING: The data set WORK.TEST may be incomplete.  When this step was stopped there were 0
         observations and 1 variables.
WARNING: Data set WORK.TEST was not replaced because this step was stopped.
11 REPLIES 11
Reeza
Super User

I don't think it's possible to read a zipped xlsx file. 

 

However, you usually don't get massive gains by zipping xlsx files because they're already compressed. If you change the extension of an xlsx file to zip and uncompress it you can see the xml components.

Tom
Super User Tom
Super User

You cannot read from an XLSX file as if it was a simple text file.

Are you trying to get the filenames?

Try something like this.

filename zipfile zip "e:\Users\mhollifi\Documents\Dillon Study GP\18_3_9\submission_cleaned.zip";
data files ;
 length fileno 8 filename $256 ;
 did=dopen('zipfile');
 do fileno=1 to dnum(did);
   filename=dread(did,fileno);
   output;
 end;
 did=dclose(did);
 drop did;
run;
mhollifi
Obsidian | Level 7

Thank you for your reply.  I tried to use your code, and I got this error message. Why?

filename zipfile zip "e:\Users\mhollifi\Documents\Dillon Study GP\18_3_9\submission_cleaned.zip";
data _null_ ;
 length fileno 8 filename $256 ;
 did=dopen('zipfile');
 %do fileno=1 %to dnum(did);

ERROR: The %DO statement is not valid in open code.


   filename=dread(did,fileno);
   output;
    %readdata(filename)

MPRINT(READDATA):   filename zipfile clear;
MPRINT(READDATA):   proc import datafile="filename" out=want DBMS=EXCEL REPLACE;
MPRINT(READDATA):   AEXC;
MPRINT(READDATA):   RANGE="Sheet1$";
MPRINT(READDATA):   GETNAMES=YES;
MPRINT(READDATA):   MIXED=no;
MPRINT(READDATA):   SCANTEXT=YES;
MPRINT(READDATA):   USEDATE=YES;
MPRINT(READDATA):   SCANTIME=YES;
MPRINT(READDATA):   RUN;
ERROR: Unable to open file C:\Users\00004263\filename.XLS. It does not  exist or it is already
       opened exclusively by another user, or you need permission to view its data.
ERROR: The %END statement is not valid in open code.


 %end;
 did=dclose(did);
 drop did;

run;
%macro readdata(xlsfile);
filename zipfile clear;

proc import datafile="&xlsfile" out=want
     DBMS=EXCEL REPLACE;
     RANGE="Sheet1$"; 
     GETNAMES=YES;
     MIXED=no;
     SCANTEXT=YES;
     USEDATE=YES;
     SCANTIME=YES;
RUN;

%mend;
Tom
Super User Tom
Super User
Why would you use a %DO loop in the middle of a data step?
Tom
Super User Tom
Super User

You will need to copy the files out of the ZIP file before you can use them with XLSX libname engine (or PROC IMPORT).

Let's make a macro to copy one file and import all of the sheets into a SAS libref.

%macro import_xlsx(zipfile=,member=,outlib= );
%local filen dlm ;
%if "&sysscp" = "WIN" %then %let dlm=\ ;
%else %let dlm=/ ;
%let filen=%sysfunc(quote(%qsysfunc(pathname(work))&dlm.copy.xlsx));
filename file1 zip &zipfile member=&member recfm=n;
filename file2 &filen recfm=n;
data _null_;
  infile file1 recfm=n ;
  file file2 recfm=n ;
  input;
  put _infile_;
run;
libname xlsx xlsx &filen ;

proc copy inlib=xlsx outlib=&outlib;
run;

libname xlsx clear ;
data _null_;
  fileref='file2';
  rc=fdelete(fileref);
run;
filename file2 ;
filename file1 ;

%mend import_xlsx;

So now if you take the list of files you generated using the earlier data step you can use it to generate a series of calls to this macro.

filename code ;
data _null_;
  set files ;
  file code ;
  put '%import_xlsx(zipfile="zipfilename_goes_here"'
      ',member=' filename :$quote.
      ',outlib=work)'
  ;
run;
%include code / source2 ;
mhollifi
Obsidian | Level 7

Thank you for your reply.

But hum...

Not quite sure what's going on to me.

I got this error message and I cannot figure out where "filename" variable comes from.

%local filen dlm ;
%if "&sysscp" = "WIN" %then %let dlm=\ ;
%else %let dlm=/ ;
%let filen=%sysfunc(quote(%qsysfunc(pathname(work))&dlm.copy.xlsx));
filename file1 zip &zipfile member=&member recfm=n;
filename file2 &filen recfm=n;
data _null_;
  infile file1 recfm=n ;
  file file2 recfm=n ;
  input;
  put _infile_;
run;
libname xlsx xlsx &filen ;

proc copy inlib=xlsx outlib=&outlib;
run;

libname xlsx clear ;
data _null_;
  fileref='file2';
  rc=fdelete(fileref);
run;
filename file2 ;
filename file1 ;

%mend import_xlsx;
*So now if you take the list of files you generated using the earlier data step you can use it to generate a series of calls to this macro.
filename code ;
data _null_;
  set files ;
  file code ;
  put '%import_xlsx(zipfile="e:\Users\mhollifi\Documents\Dillon Study GP\18_3_9\submission_cleaned.zip"'
      ',member=' filename :$quote.
      ',outlib=work)'
  ;
run;
%include code / source2 ;

and error message is

WARNING: Physical file does not exist, C:\Users\00004263\code.sas.
ERROR: Cannot open %INCLUDE file CODE.

in the end of line.

Tom
Super User Tom
Super User

If you didn't create a dataset with the list of the files in the ZIP file then your data step to generate one macro call per filename will not generate ANY lines into the CODE file.  So of course there are no lines to %INCLUDE from the file.


Make sure to FIRST run the code I posted before for reading the names of the files that are in your ZIP file. And make sure not to mistakenly convert the data step DO loops into macro code %DO loops.

 

Also make sure that the name of the variable you created to hold the filename matches the name of the variable you are using to write the macro call into the text file CODE.

ChrisNZ
Tourmaline | Level 20

Also you should replace this data step

filename file1 zip &zipfile member=&member recfm=n;
filename file2 &filen recfm=n;
data _null_;
  infile file1 recfm=n ;
  file file2 recfm=n ;
  input;
  put _infile_;
run;

and use function fcopy instead.

Tom
Super User Tom
Super User

@mhollifi wrote:

...

I got this error message and I cannot figure out where "filename" variable comes from.

*So now if you take the list of files you generated using the earlier data step you can use it to generate a series of calls to this macro.
filename code ;
data _null_;
  set files ;
  file code ;
  put '%import_xlsx(zipfile="e:\Users\mhollifi\Documents\Dillon Study GP\18_3_9\submission_cleaned.zip"'
      ',member=' filename :$quote.
      ',outlib=work)'
  ;
run;
%include code / source2 ;

and error message is

WARNING: Physical file does not exist, C:\Users\00004263\code.sas.
ERROR: Cannot open %INCLUDE file CODE.

in the end of line.


Since you commented out the FILENAME statement the fileref of CODE was never defined. When you are writing to a file with the FILE statement in a datastep SAS will use a default extension of .dat.  But when you are using %INCLUDE to include a file of SAS code then SAS will use a default extension of .sas, so the %INCLUDE is not finding the file that the data step wrote.

 

So either add a semi-colon to the end of your comment so that the FILENAME statement will execute.

Or use a quoted physical filename in the FILE and %INCLUDE statements instead of a fileref.

 

PS Yet another reason why code lines should be limited to 75 characters.  You are more likely to notice lines that are missing the ending semi-colon.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1815 views
  • 0 likes
  • 5 in conversation