BookmarkSubscribeRSS Feed
AndrewZ
Quartz | Level 8

In a SAS Unicode session, data from Snowflake is coming across as iso-8859-1, even though the SAS data set marks the data set as Unicode.

 

I connect using

libname mysf odbc
complete="DRIVER={SnowflakeDSIIDriver};SERVER=...";

I can use PROC DATASETS with CORRECTENCODING to change the data set to iso-8559-1, and then compatible characters (like Spanish and German) look okay. However, non-compatible characters (like Japanese) are always mojibaked or missing.

 

 

I am on SAS Unicode 32-bit and Windows 10 Enterprise .

 

NOTE: SAS (r) Proprietary Software 9.4 (TS1M7 MBCS3170)
NOTE: This session is executing on the W32_10PRO platform.

 

Snowflake 32-bit ODBC driver: 2.25.04.00

 

I would rather not pay for a Snowflake connector license just for this one feature. 

32 REPLIES 32
LinusH
Tourmaline | Level 20

I'm bit confused since Snowflake claims that they always have the data transferred in UTF-8 using their ODBC driver.

https://community.snowflake.com/s/article/Does-the-Snowflake-ODBC-Driver-support-non-Latin-character...

 

What is the actual ENCODING setting in your SAS session?

Make sure that you download the data to a clean location to avoid encoding miss-match.

Data never sleeps
AndrewZ
Quartz | Level 8

@LinusH 

"proc options option=encoding; run;" tells me my encoding is UTF-8, and when I check the properties of the SAS data set under WORK (created by querying Snowflake), the encoding is UTF-8 too.

 

What do you mean a clean location? I am creating a v9 SAS data in WORK. 

 

LinusH
Tourmaline | Level 20
Yes, that's is as clean as it gets.
Following @Tom here, SAS Support could be your next step.
Another thing to try is to open this data from another application on your computer, like Excel and compare.
Data never sleeps
Tom
Super User Tom
Super User

Sounds like the data in the Snowflake database is not UTF-8?

Try a small query with some characters that you know are non-7bit ASCII characters and print the actual hex codes.

data test;
  set mysf.have(encoding=any);
  put (_character_) (=$hex. /);
run;

 

AndrewZ
Quartz | Level 8

I created a Unicode table in Snowflake via DBeaver with this code

create or replace TABLE zzz_unicode ( language_code int, text varchar(100) );
insert into zzz_unicode (language_code, text) values
(1, 'Ich kann Glasssplitter essen, es tut mir nicht weh'),
(2, 'Je peux manger du verre, ça ne me fait pas mal'),
(3, 'Posso mangiare vetro, non mi fa male'),
(4, 'Eu posso comer vidro, não me faz mal'),
(5, 'Puedo comer vidrio, no me hace daño'),
(6, 'Я могу есть битое стекло, оно мне не вредит'),
(7, 'ฉันสามารถกินแก้วแตกได้ มันไม่ทำให้ฉันเจ็บปวด'),
(8, '私は割れたガラスを食べることができます、それは私を傷つけません'),
(9, 'እኔ የተሰነጠቀ ብረት መብላት እችላለሁ፣ አይጎዳኝም'),
(10, 'ငါ ብስጭት መብላት እችላለሁ, ጎጂ አይደለም');
insert into zzz_unicode (language_code, text) values
(11, 'Я могу есть битое стекло, оно мне не вредит'),
(12, '私は割れたガラスを食べることができます、それは私を傷つけません'),
(13, '我可以吃碎玻璃,它不会伤害我');

select *
from zzz_unicode
;

This shows how the table looks in DBeaver (looks good) vs SAS PROC PRINT (looks bad).

I can eat broken glass.png

 

Here is the SAS log for the HEX test you suggested. The character 1A is SUB, and 20 is space, so we see the mojibake in HEX too. Also, it doesn't like encoding=any.

 

348
349 data test;
350 set jet.zzz_unicode(encoding=any);
--------
76
WARNING 76-63: The option ENCODING is not implemented in the ODBC engine.

351 if language_code in (6,13);
352 put (text) (=$hex. /);
353 run;

TEXT=1A201A1A1A1A201A1A1A1A201A1A1A1A1A201A1A1A1A1A1A2C201A1A1A201A1A1A201A1A201A1A1A1A1A1A20202020202020202020202020202020202020202020202020202020202
0202020202020202020202020202020202020202020202020202020
TEXT=1A1A1A1A1A1A1A1A1A1A1A1A1A1A202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202
0202020202020202020202020202020202020202020202020202020
AndrewZ
Quartz | Level 8

@Tom 

How would I change the ODBC definition for Snowflake? The URL you posted was about PostgreSQL, and in the last comment, the author (not the original poster) wrote that he solved by adding "SET CLIENT_ENCODING TO 'UTF8'" to his connection string, but I see no such option for Snowflake.

 

@LinusH posted a link to Snowflake documentation that states: "the Snowflake ODBC Driver will always use UTF-8 encoding for any STRING/VARCHAR/VARIANT type column data." This is what I need, and I can't change it. Also, I do not see any encoding related options in Snowflake session parameters or ODBC parameters. This is consistent with the documentation the the "ODBC driver will always use UTF-8" because there is no way to change it.

 

Because Snowflake always uses UTF-8 leads me to conclude that the problem, whether SAS configuration or SAS bug, is with SAS's ODBC integration.

Tom
Super User Tom
Super User

Did you raise a ticket with SAS support?

I am also surprised to see that you are using 32-bit version of SAS.

Do you have access to 64-bit version of SAS and 64-bit ODBC drivers for snowflake that you can test?

Tom
Super User Tom
Super User

The other work around is to dump the data to a text file and then read the text file using a data step.

 

Looks like Snowflake has a COPY TO command for generating text files from data.

https://docs.snowflake.com/en/sql-reference/sql/copy-into-location

 

Steps would be:
1) Generate delimited text file.

2) Make the text file available to SAS. Might involve copying the file from S3 to work directory.

2) Create dataset with same structure as original table (could be as simple as using OBS=0 dataset option) and then read in the text file.

* Step 3 ;
data want;
   if 0 then set snow.mytable (obs=0);
   infile csv dsd truncover ;
   input (_all_) (+0);
run;

For large files you might find that it works much faster than trying to suck data though that tiny ODBC straw.

AndrewZ
Quartz | Level 8

OK, I opened a ticket with SAS support. Let's see how that goes.

 

The COPY TO is a creative suggestion. I already do the opposite for bulk loading data into Snowflake, so if I get desperate, I will check into that.

 

I could also write a Python program to read the data from Snowflake and pass it to SAS. Python can write SAS format via sas7bdat, but I am guessing it doesn't support Unicode. Maybe XLSX or Access would be the easiest way. This doesn't seem fun, either.

Quentin
Super User

Interesting thread, thanks.  Looking forward to hearing what SAS support says.  Also just want to say, XLSX is never the answer, and it's usually the problem. 😀

 

To the big picture, have you tried using snowflake's ODBC driver with some other client (e.g. python, or, ok, even Excel), to see what comes down? If other clients have the same problem, then I think you've got to point back to snowflake support. Just because their docs say they always serve UTF-8 doesn't mean they are.

The Boston Area SAS Users Group is hosting free webinars!
Next up: Bart Jablonski and I present 53 (+3) ways to do a table lookup on Wednesday Sep 18.
Register now at https://www.basug.org/events.
Quentin
Super User

Any response from tech Support on this? I'm having the same issue (I think).   Running SAS 9.4M7 (64-bit) on Windows.

The Boston Area SAS Users Group is hosting free webinars!
Next up: Bart Jablonski and I present 53 (+3) ways to do a table lookup on Wednesday Sep 18.
Register now at https://www.basug.org/events.
AndrewZ
Quartz | Level 8

I've pursued this with official SAS support, Snowflake support, unofficial channels like this online forum, and our IT department. SAS and Snowflake pointed fingers at each other, and I haven't gotten anywhere.

 

Based on some tests like the Microsoft ODBC Test utility in the MDAC package, the problem seems to be that the Snowflake driver never returns UTF-8, despite what Snowflake documentation states. This low-level utility is not user friendly, but easier ways to demonstrate that the problem is with the Snowflake ODBC driver are to use other ODBC clients such as Microsoft Excel or Microsoft Access. 

 

The test procedure is simple: just create a table with a little bit of data, and then use a client like Excel to query it.

 

create or replace TABLE zzz_unicode ( language_code int, text varchar(100) );

insert into zzz_unicode (language_code, text) values

(1, 'Ich kann Glasssplitter essen, es tut mir nicht weh'),

(2, 'Je peux manger du verre, ça ne me fait pas mal'),

(3, 'Posso mangiare vetro, non mi fa male'),

(4, 'Eu posso comer vidro, não me faz mal'),

(5, 'Puedo comer vidrio, no me hace daño'),

(6, 'Я могу есть битое стекло, оно мне не вредит'),

(7, 'ฉันสามารถกินแก้วแตกได้ มันไม่ทำให้ฉันเจ็บปวด'),

(8, '私は割れたガラスを食べることができます、それは私を傷つけません'),

(9, 'እኔ የተሰነጠቀ ብረት መብላት እችላለሁ፣ አይጎዳኝም'),

(10, 'ငါ ብስጭት መብላት እችላለሁ, ጎጂ አይደለም');

insert into zzz_unicode (language_code, text) values

(11, 'Я могу есть битое стекло, оно мне не вредит'),

(12, '私は割れたガラスを食べることができます、それは私を傷つけません'),

(13, '我可以吃碎玻璃,它不会伤害我');

 

select *

from zzz_unicode

;

 

If you make progress, let me know please.

Quentin
Super User

Thanks much for your response.  Really appreciate it.  Your diagnostic steps sounds good to me.  Unfortunately I can't test myself, as I currently have only read access to a table that was recently migrated from SQL Server to Snowflake.  In my quick PROC COMPARE I noticed the problems caused by the encoding, even after I ran the query against snowflake in a SAS UTF-8 session.  Then google led me to your post.  Unfortunately, I have characters in many many multi-byte languages, so sounds like the iso-8859-1 encoding won't fix it all.  I'll probably open up a ticket with the internal IT group that owns the snowflake instance, but they'll probably just tell me it's a SAS problem and close the ticket, because they don't know SAS.  : )   In the end it's okay, I don't really need the character strings that have issues, was just hoping for an easy way to make the PROC COMPARE return 0 differences.  Thanks again.

The Boston Area SAS Users Group is hosting free webinars!
Next up: Bart Jablonski and I present 53 (+3) ways to do a table lookup on Wednesday Sep 18.
Register now at https://www.basug.org/events.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 32 replies
  • 1972 views
  • 15 likes
  • 6 in conversation