This topic is solved and locked.

I want to import an xlsx file from a ZIP file created in Windows into a Viya/Linux environment. I am following Chris Hemedinger's excellent blog post, "Using FILENAME ZIP to unzip and read data files in SAS" (The SAS Dummy), and it works perfectly with my Windows installation on my workstation. The same code does not work in SAS Studio on Viya.

The obvious problem is that the file names inside the ZIP file contain the characters å, ä and ö. I tried the option encoding="wlatin1" in the FILENAME statement, but Viya ignores it, or I have used it the wrong way.

There was another problem: FILENAME does not accept a path with spaces, such as "Region Skåne konkurser".


filename inzip ZIP "/opt/sas/spre/config/samhallsdata/indata/RegionSkaneKonkurser/dia_konkurser_v1_2020_v6_2021.zip" encoding='wlatin1';
run;
data contents(keep=memname isFolder);
	length memname $200 isFolder 8;
	fid=dopen("inzip");
	if fid=0 then stop;
	memcount=dnum(fid);
	do i=1 to memcount;
		memname = dread(fid,i);
		isFolder = (first(reverse(trim(memname)))='/');
		output;
	end;
	rc=dclose(fid);
run;
title "Files in the ZIP file";
proc print data=contents noobs N;
run;
title;
run;
filename xlClan "%sysfunc(getoption(work))/dataunderlag_konkurser_Uppsala lan.xlsx";

data _null_;
	infile inzip(dia_konkurser/dia_konk_Uppsala län/dataunderlag_konkurser_Uppsala län.xlsx)
		lrecl=256 recfm=F length=length eof=eof unbuf;
	file xlClan lrecl=256 recfm=N;
	input;
	put _infile_ $varying256. length;
	return;
eof:
	stop;
run;

proc import datafile=xlClan out=confirmed replace dbms=xlsx;
run;

 

 


8 replies
ChrisHemedinger
Community Manager

I don't think the ENCODING= option applies to the FILENAME ZIP method, since a ZIP file is binary.

Is your SAS session running with ENCODING='utf-8'? Actually, I think that's the default in SAS Viya anyway.

I always recommend that. I believe the internals of Excel (xlsx) files are UTF-8 encoded, and UTF-8 allows the most flexibility for managing character content.
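To answer the session-encoding question, a minimal sketch that writes the current encoding to the log. It uses only the SYSENCODING automatic macro variable and the ENCODING system option; the value you see depends on how your Viya environment is configured.

```sas
/* Write the session encoding to the log.                    */
/* SYSENCODING is an automatic macro variable; the &= syntax */
/* prints the variable name together with its value.         */
%put &=sysencoding;

/* PROC OPTIONS reports the ENCODING= system option setting. */
proc options option=encoding;
run;
```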

SAS Hackathon registration is open! Build your skills. Make connections. Enjoy creative freedom. Maybe change the world.
AndersBergquist
Quartz | Level 8
My problem is that SAS reads the directory of the ZIP file as UTF-8, even though the names are WLATIN1. But maybe you have pointed me in the right direction.
Do you think it will work if I change the first two lines of the first DATA step to
data work.contents(encoding='wlatin1');
keep memname isFolder;
AndersBergquist
Quartz | Level 8
No, it does not work; I get an error.
There is a problem when SAS reads the contents of ZIP files with non-US characters.
AndersBergquist
Quartz | Level 8

I have made a small test ZIP file that contains a file named åäö.txt.

Here is the code.

filename yxzip ZIP "/opt/sas/spre/config/samhallsdata/indata/RegionSkaneKonkurser/testdoc.zip";
run;
data work.contents(encoding='wlatin1');
	keep memname isFolder;
	length memname $200 isFolder 8;
	fid=dopen("yxzip");
	if fid=0 then stop;
	memcount=dnum(fid);
	put memcount=;
	do i=1 to memcount;
		put i=;
		memname = dread(fid,i);
		put memname=;
		isFolder = (first(reverse(trim(memname)))='/');
		output;
	end;
	rc=dclose(fid);
run;
title "Files in the ZIP file";
proc print data=contents noobs N;
run;
title;

And I got this log:

82   
83   data work.contents(encoding='wlatin1');
84   keep memname isFolder;
85   length memname $200 isFolder 8;
86   fid=dopen("yxzip");
87   if fid=0 then stop;
88   memcount=dnum(fid);
89   put memcount=;
90   do i=1 to memcount;
91   put i=;
92   memname = dread(fid,i);
93   put memname=;
94   isFolder = (first(reverse(trim(memname)))='/');
95   output;
96   end;
97   rc=dclose(fid);
98   run;
NOTE: Data file WORK.CONTENTS.DATA is in a format that is native to another host, or the file encoding does not match the session 
      encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce 
      performance.
memcount=2
i=1
memname=testdoc.sas
i=2
memname=���.txt
ERROR: Some character data was lost during transcoding in the dataset WORK.CONTENTS. Either the data contains characters that are 
       not representable in the new encoding or truncation occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.CONTENTS may be incomplete.  When this step was stopped there were 1 observations and 2 variables.
WARNING: Data set WORK.CONTENTS was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

 

ChrisHemedinger
Community Manager (accepted solution)

Just found the NAMEENCODING option that might be useful:

 

It is documented under the FILENAME statement, ZIP access method.

 

NAMEENCODING=encoding-value

specifies the encoding to use for ZIP file entry names and comments. The value for NAMEENCODING= indicates that the entry name and comment have a different encoding from the current session encoding.

Default: Code page 437
Example:
filename zs zip "yxz.zip" nameencoding=sjis member="s" termstr=lf;

 

 

See the SAS documentation for a reference table of available encodings.
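Applied to the test program above, a minimal sketch: NAMEENCODING= goes on the FILENAME ZIP statement, and the ENCODING= data set option is dropped, so the listing stays in the session encoding. The value "Cp437" is an assumption taken from what worked for the original poster later in this thread; the path is the one from the test program.

```sas
/* Decode the ZIP entry names as code page 437 (value assumed  */
/* from this thread) instead of the session encoding.          */
filename yxzip ZIP "/opt/sas/spre/config/samhallsdata/indata/RegionSkaneKonkurser/testdoc.zip"
  nameencoding="Cp437";

data work.contents;
  keep memname isFolder;
  length memname $200 isFolder 8;
  fid = dopen("yxzip");              /* open the ZIP file as a directory  */
  if fid = 0 then stop;
  memcount = dnum(fid);              /* number of entries in the archive  */
  do i = 1 to memcount;
    memname = dread(fid, i);         /* entry name, decoded per NAMEENCODING= */
    isFolder = (first(reverse(trim(memname))) = '/');  /* folder names end in '/' */
    output;
  end;
  rc = dclose(fid);
run;

title "Files in the ZIP file";
proc print data=work.contents noobs N;
run;
title;
```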

AndersBergquist
Quartz | Level 8

Thanks!

NAMEENCODING="Cp437" did the trick.
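For the original Excel import, a sketch of how the pieces might fit together, assuming the same archive and member path as in the question. Instead of naming the member in parentheses on INFILE, it uses the MEMBER= option of FILENAME ZIP, which takes a quoted name, so folder and file names containing spaces and å/ä/ö can be given as-is. The byte-for-byte copy step is unchanged from the question; only the fileref differs.

```sas
/* Fileref pointing at one member inside the ZIP file.           */
/* NAMEENCODING="Cp437" is assumed from this thread; MEMBER= is   */
/* the quoted entry path copied from the original question.       */
filename xlzip ZIP "/opt/sas/spre/config/samhallsdata/indata/RegionSkaneKonkurser/dia_konkurser_v1_2020_v6_2021.zip"
  nameencoding="Cp437"
  member="dia_konkurser/dia_konk_Uppsala län/dataunderlag_konkurser_Uppsala län.xlsx";

/* Target file for the extracted workbook in the WORK library */
filename xlClan "%sysfunc(getoption(work))/dataunderlag_konkurser_Uppsala lan.xlsx";

/* Copy the member out in fixed 256-byte records (binary copy) */
data _null_;
  infile xlzip lrecl=256 recfm=F length=length eof=eof unbuf;
  file xlClan lrecl=256 recfm=N;
  input;
  put _infile_ $varying256. length;
  return;
eof:
  stop;
run;

/* Import the extracted workbook */
proc import datafile=xlClan out=work.confirmed replace dbms=xlsx;
run;
```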

HannuSihvonen
Calcite | Level 5

Hi

What other NAMEENCODING= values are available? Is there a list somewhere?

 

ChrisHemedinger
Community Manager

@HannuSihvonen There is a reference table in the SAS documentation.


Discussion stats
  • 8 replies
  • 2342 views
  • 1 like
  • 3 in conversation