I want to import an xlsx file from a ZIP file created on Windows into a Viya/Linux environment. I am following Chris Hemedinger's excellent blog post, "Using FILENAME ZIP to unzip and read data files in SAS" (The SAS Dummy), and it works perfectly when I use my Windows installation on a workstation. The same code does not work in SAS Studio on Viya.
The obvious problem is that the file names within the ZIP file use the characters åäö. I tried the option encoding="wlatin1" in the FILENAME statement, but Viya ignores it, or I have done it the wrong way.
There was another problem: FILENAME does not accept a path with spaces, such as "Region Skåne konkurser".
filename inzip ZIP "/opt/sas/spre/config/samhallsdata/indata/RegionSkaneKonkurser/dia_konkurser_v1_2020_v6_2021.zip" encoding='wlatin1';
run;

data contents(keep=memname isFolder);
  length memname $200 isFolder 8;
  fid=dopen("inzip");
  if fid=0 then stop;
  memcount=dnum(fid);
  do i=1 to memcount;
    memname = dread(fid,i);
    isFolder = (first(reverse(trim(memname)))='/');
    output;
  end;
  rc=dclose(fid);
run;

title "Files in the ZIP file";
proc print data=contents noobs N;
run;
title;
run;

filename xlClan "%sysfunc(getoption(work))/dataunderlag_konkurser_Uppsala lan.xlsx";

data _null_;
  infile inzip(dia_konkurser/dia_konk_Uppsala län/dataunderlag_konkurser_Uppsala län.xlsx)
    lrecl=256 recfm=F length=length eof=eof unbuf;
  file xlClan lrecl=256 recfm=N;
  input;
  put _infile_ $varying256. length;
  return;
eof:
  stop;
run;

proc import datafile=xlClan out=confirmed replace dbms=xlsx;
run;
Just found the NAMEENCODING option that might be useful:
"Specifies the encoding to use for ZIP file entry names and comments. The value for NAMEENCODING= indicates that the entry name and comment have a different encoding from the current session encoding."
Default: Code Page 437
See the SAS documentation for a reference table of available encodings.
I don't think the ENCODING option applies to the FILENAME ZIP method, as the ZIP archive is a binary file.
Is your SAS session running using ENCODING='utf-8'? Actually, I think that's the default in SAS Viya anyway.
I always recommend that. I believe that the innards of Excel files (xlsx) are UTF-8 encoded, and UTF-8 allows the most flexibility for managing character content.
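If it helps, here is a minimal sketch of how the session encoding could be checked before going further. It only reports the current value of the ENCODING system option; nothing in it is specific to the ZIP file in this thread.

```sas
/* Print the session encoding to the log via PROC OPTIONS */
proc options option=encoding;
run;

/* The same value, resolved through a macro function call */
%put Session encoding: %sysfunc(getoption(encoding));
```

If the log reports UTF-8, the xlsx content itself should not need any transcoding; only the member names inside the ZIP remain in question.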
I have made a little test ZIP file that includes a file named åäö.txt.
Here is the code.
filename yxzip ZIP "/opt/sas/spre/config/samhallsdata/indata/RegionSkaneKonkurser/testdoc.zip";
run;

data work.contents(encoding='wlatin1');
  keep memname isFolder;
  length memname $200 isFolder 8;
  fid=dopen("yxzip");
  if fid=0 then stop;
  memcount=dnum(fid);
  put memcount=;
  do i=1 to memcount;
    put i=;
    memname = dread(fid,i);
    put memname=;
    isFolder = (first(reverse(trim(memname)))='/');
    output;
  end;
  rc=dclose(fid);
run;

title "Files in the ZIP file";
proc print data=contents noobs N;
run;
title;
And I got this log:
82
83 data work.contents(encoding='wlatin1');
84 keep memname isFolder;
85 length memname $200 isFolder 8;
86 fid=dopen("yxzip");
87 if fid=0 then stop;
88 memcount=dnum(fid);
89 put memcount=;
90 do i=1 to memcount;
91 put i=;
92 memname = dread(fid,i);
93 put memname=;
94 isFolder = (first(reverse(trim(memname)))='/');
95 output;
96 end;
97 rc=dclose(fid);
98 run;
NOTE: Data file WORK.CONTENTS.DATA is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance.
memcount=2
i=1
memname=testdoc.sas
i=2
memname=���.txt
ERROR: Some character data was lost during transcoding in the dataset WORK.CONTENTS. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.CONTENTS may be incomplete. When this step was stopped there were 1 observations and 2 variables.
WARNING: Data set WORK.CONTENTS was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Thanks! NAMEENCODING="Cp437" did it.
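For anyone landing here later, this is a rough sketch of what the working version might look like, based on the test code earlier in the thread. The only intended change is the NAMEENCODING= option on the FILENAME statement; its placement after the path is my assumption about the syntax, so check it against the FILENAME ZIP documentation. The encoding='wlatin1' data set option is left out on purpose, since the session encoding should be able to hold åäö once the names are transcoded.

```sas
/* Sketch: same test ZIP as above, with NAMEENCODING= added so that  */
/* member names stored in Code Page 437 are transcoded to the        */
/* session encoding when they are read.                              */
filename yxzip ZIP "/opt/sas/spre/config/samhallsdata/indata/RegionSkaneKonkurser/testdoc.zip"
  nameencoding="cp437";

data work.contents(keep=memname isFolder);
  length memname $200 isFolder 8;
  fid = dopen("yxzip");                 /* open the ZIP as a directory */
  if fid = 0 then stop;
  memcount = dnum(fid);                 /* number of members */
  do i = 1 to memcount;
    memname = dread(fid, i);            /* member name, e.g. the .txt file */
    isFolder = (first(reverse(trim(memname))) = '/');
    output;
  end;
  rc = dclose(fid);
run;

title "Files in the ZIP file";
proc print data=work.contents noobs;
run;
title;
```

The same option should presumably carry over to the fileref used for copying the xlsx member out of the original ZIP, since the member name in the INFILE statement also contains non-ASCII characters.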
Hi,
What other NAMEENCODING values are available? Are they listed somewhere?
@HannuSihvonen There is a reference table in the SAS documentation.