SASKiwi
PROC Star

Is your session encoding UTF-8?

proc options option = encoding;
run;
AndrewZ
Quartz | Level 8

Quentin, I did extensive testing on this issue in SAS and other tools. In SAS, I used SAS Unicode (utf-8) and SAS English (wlatin1).

 

My workaround in SAS Unicode is to run PROC DATASETS like below every time I pull in data from Snowflake, but it only gives me iso-8859-1, which seems to be a limitation of the Snowflake ODBC driver.

 

proc datasets library=&lib noprint;
modify &ds / correctencoding='iso-8859-1';
quit;

Tom
Super User

@AndrewZ wrote:

Quentin, I did extensive testing on this issue in SAS and other tools. In SAS, I used SAS Unicode (utf-8) and SAS English (wlatin1).

 

My workaround in SAS Unicode is to run PROC DATASETS like below every time I pull in data from Snowflake, but it only gives me iso-8859-1, which seems to be a limitation of the Snowflake ODBC driver.

 

proc datasets library=&lib noprint;
modify &ds / correctencoding='iso-8859-1';
quit;


I am not sure what your second paragraph means.

 

Did you look at the hex codes in the dataset? Were they the valid UTF-8 bytes you expected?

 

That PROC DATASETS code will just change the metadata attribute that indicates the encoding used to create the file.  Changing the metadata about the encoding of the text in the dataset will not change what is in the dataset.  It just tells future users of the data what to expect to find when they look at the data.
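
One way to look at the hex codes without leaving SAS is to print a few values with the $HEX format and compare them with the encoding recorded in the dataset header. This is only a minimal sketch: it assumes the &lib and &ds macro variables from the post above, and NAME is a hypothetical character variable standing in for any column that holds accented text.

```sas
/* Show the encoding attribute stored in the dataset header. */
proc contents data=&lib..&ds;
run;

/* Print the raw bytes of the first few values.
   NAME is a hypothetical character variable; substitute a real one. */
data _null_;
  set &lib..&ds (obs=5);
  put name= / name= $hex40.;
run;
```

Run before and after the PROC DATASETS step, this should show the Encoding attribute changing in the PROC CONTENTS output while the hex output stays the same.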

AndrewZ
Quartz | Level 8

@Tom wrote:

Did you look at the hex codes in the dataset? Were they the valid UTF-8 bytes you expected?

 

You mean use a hex editor on the .sas7bdat file? No. Based on my other tests (like the one described below), Snowflake was not sending UTF-8.

 

@Tom wrote:

That PROC DATASETS code will just change the metadata attribute that indicates the encoding used to create the file.  Changing the metadata about the encoding of the text in the dataset will not change what is in the dataset.  It just tells future users of the data what to expect to find when they look at the data.

 
Yes, I understand. The PROC DATASETS step fixed the display of Spanish characters (such as á, é, í, ó, ú, ñ), German characters (such as ä, ö, ü), and the "smart" quotation marks usually produced by Microsoft Office, but not other scripts such as Korean. That implies Snowflake ODBC was sending the text as iso-8859-1 instead of utf-8.
 
If Snowflake ODBC were sending UTF-8, PROC DATASETS would not have this effect in SAS.
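
The hex codes would show the difference directly. As an illustration only (these are not values from the Snowflake data), this sketch writes the bytes for é in both encodings using hex literals, so it behaves the same in any session encoding. The variable names are made up for the example.

```sas
data _null_;
  latin1_e = 'E9'x;    /* é in iso-8859-1 / wlatin1: one byte  */
  utf8_e   = 'C3A9'x;  /* é in utf-8: two bytes                */
  put latin1_e= $hex2. utf8_e= $hex4.;
run;
```

A value pulled from Snowflake that shows E9 for é would be single-byte Latin-1 data, while C3A9 would indicate UTF-8.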
AhmedAl_Attar
Rhodochrosite | Level 12

@AndrewZ 

Have you tried this Libname option?

DBCLIENT_MAX_BYTES= LIBNAME Statement Option 

 

One thing to keep in mind: variable lengths in Snowflake are based on character count, while in SAS they are based on byte count!

Therefore, a value that fits in a Snowflake char/varchar(1) column may require a SAS variable of length 2 or more in order to display correctly.
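
The character-versus-byte distinction can be seen with LENGTH, which counts bytes, and KLENGTH, which counts characters. This is a small sketch, assuming a SAS Unicode (utf-8) session; the variable names are illustrative.

```sas
data _null_;
  s     = 'C3A9'x;      /* é encoded as utf-8 */
  bytes = length(s);    /* storage length in bytes */
  chars = klength(s);   /* character count in the session encoding */
  put bytes= chars=;
run;
```

In a utf-8 session this should report two bytes for a single character, which is why a one-character Snowflake column can need a longer SAS variable.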

 

Hope this helps
