- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In a SAS Unicode session, data from Snowflake is coming across as iso-8859-1, even though the SAS data set marks the data set as Unicode.
I connect using
libname mysf odbc complete="DRIVER={SnowflakeDSIIDriver};SERVER=...";
I can use PROC DATASETS with CORRECTENCODING to change the data set to iso-8559-1, and then compatible characters (like Spanish and German) look okay. However, non-compatible characters (like Japanese) are always mojibaked or missing.
I am on SAS Unicode 32-bit and Windows 10 Enterprise .
NOTE: SAS (r) Proprietary Software 9.4 (TS1M7 MBCS3170)
NOTE: This session is executing on the W32_10PRO platform.
Snowflake 32-bit ODBC driver: 2.25.04.00
I would rather not pay for a Snowflake connector license just for this one feature.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm bit confused since Snowflake claims that they always have the data transferred in UTF-8 using their ODBC driver.
What is the actual ENCODING setting in your SAS session?
Make sure that you download the data to a clean location to avoid encoding miss-match.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
"proc options option=encoding; run;" tells me my encoding is UTF-8, and when I check the properties of the SAS data set under WORK (created by querying Snowflake), the encoding is UTF-8 too.
What do you mean a clean location? I am creating a v9 SAS data in WORK.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Following @Tom here, SAS Support could be your next step.
Another thing to try is to open this data from another application on your computer, like Excel and compare.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sounds like the data in the Snowflake database is not UTF-8?
Try a small query with some characters that you know are non-7bit ASCII characters and print the actual hex codes.
data test;
set mysf.have(encoding=any);
put (_character_) (=$hex. /);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I created a Unicode table in Snowflake via DBeaver with this code
create or replace TABLE zzz_unicode ( language_code int, text varchar(100) ); insert into zzz_unicode (language_code, text) values (1, 'Ich kann Glasssplitter essen, es tut mir nicht weh'), (2, 'Je peux manger du verre, ça ne me fait pas mal'), (3, 'Posso mangiare vetro, non mi fa male'), (4, 'Eu posso comer vidro, não me faz mal'), (5, 'Puedo comer vidrio, no me hace daño'), (6, 'Я могу есть битое стекло, оно мне не вредит'), (7, 'ฉันสามารถกินแก้วแตกได้ มันไม่ทำให้ฉันเจ็บปวด'), (8, '私は割れたガラスを食べることができます、それは私を傷つけません'), (9, 'እኔ የተሰነጠቀ ብረት መብላት እችላለሁ፣ አይጎዳኝም'), (10, 'ငါ ብስጭት መብላት እችላለሁ, ጎጂ አይደለም'); insert into zzz_unicode (language_code, text) values (11, 'Я могу есть битое стекло, оно мне не вредит'), (12, '私は割れたガラスを食べることができます、それは私を傷つけません'), (13, '我可以吃碎玻璃,它不会伤害我'); select * from zzz_unicode ;
This shows how the table looks in DBeaver (looks good) vs SAS PROC PRINT (looks bad).
Here is the SAS log for the HEX test you suggested. The character 1A is SUB, and 20 is space, so we see the mojibake in HEX too. Also, it doesn't like encoding=any.
348 349 data test; 350 set jet.zzz_unicode(encoding=any); -------- 76 WARNING 76-63: The option ENCODING is not implemented in the ODBC engine. 351 if language_code in (6,13); 352 put (text) (=$hex. /); 353 run; TEXT=1A201A1A1A1A201A1A1A1A201A1A1A1A1A201A1A1A1A1A1A2C201A1A1A201A1A1A201A1A201A1A1A1A1A1A20202020202020202020202020202020202020202020202020202020202 0202020202020202020202020202020202020202020202020202020 TEXT=1A1A1A1A1A1A1A1A1A1A1A1A1A1A202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202 0202020202020202020202020202020202020202020202020202020
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try tweaking the ODBC definition.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
How would I change the ODBC definition for Snowflake? The URL you posted was about PostgreSQL, and in the last comment, the author (not the original poster) wrote that he solved by adding "SET CLIENT_ENCODING TO 'UTF8'" to his connection string, but I see no such option for Snowflake.
@LinusH posted a link to Snowflake documentation that states: "the Snowflake ODBC Driver will always use UTF-8 encoding for any STRING/VARCHAR/VARIANT type column data." This is what I need, and I can't change it. Also, I do not see any encoding related options in Snowflake session parameters or ODBC parameters. This is consistent with the documentation the the "ODBC driver will always use UTF-8" because there is no way to change it.
Because Snowflake always uses UTF-8 leads me to conclude that the problem, whether SAS configuration or SAS bug, is with SAS's ODBC integration.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Did you raise a ticket with SAS support?
I am also surprised to see that you are using 32-bit version of SAS.
Do you have access to 64-bit version of SAS and 64-bit ODBC drivers for snowflake that you can test?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The other work around is to dump the data to a text file and then read the text file using a data step.
Looks like Snowflake has a COPY TO command for generating text files from data.
https://docs.snowflake.com/en/sql-reference/sql/copy-into-location
Steps would be:
1) Generate delimited text file.
2) Make the text file available to SAS. Might involve copying the file from S3 to work directory.
2) Create dataset with same structure as original table (could be as simple as using OBS=0 dataset option) and then read in the text file.
* Step 3 ;
data want;
if 0 then set snow.mytable (obs=0);
infile csv dsd truncover ;
input (_all_) (+0);
run;
For large files you might find that it works much faster than trying to suck data though that tiny ODBC straw.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
OK, I opened a ticket with SAS support. Let's see how that goes.
The COPY TO is a creative suggestion. I already do the opposite for bulk loading data into Snowflake, so if I get desperate, I will check into that.
I could also write a Python program to read the data from Snowflake and pass it to SAS. Python can write SAS format via sas7bdat, but I am guessing it doesn't support Unicode. Maybe XLSX or Access would be the easiest way. This doesn't seem fun, either.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Interesting thread, thanks. Looking forward to hearing what SAS support says. Also just want to say, XLSX is never the answer, and it's usually the problem. 😀
To the big picture, have you tried using snowflake's ODBC driver with some other client (e.g. python, or, ok, even Excel), to see what comes down? If other clients have the same problem, then I think you've got to point back to snowflake support. Just because their docs say they always serve UTF-8 doesn't mean they are.
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Any response from tech Support on this? I'm having the same issue (I think). Running SAS 9.4M7 (64-bit) on Windows.
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I've pursued this with official SAS support, Snowflake support, unofficial channels like this online forum, and our IT department. SAS and Snowflake pointed fingers at each other, and I haven't gotten anywhere.
Based on some tests like the Microsoft ODBC Test utility in the MDAC package, the problem seems to be that the Snowflake driver never returns UTF-8, despite what Snowflake documentation states. This low-level utility is not user friendly, but easier ways to demonstrate that the problem is with the Snowflake ODBC driver are to use other ODBC clients such as Microsoft Excel or Microsoft Access.
The test procedure is simple: just create a table with a little bit of data, and then use a client like Excel to query it.
create or replace TABLE zzz_unicode ( language_code int, text varchar(100) ); insert into zzz_unicode (language_code, text) values (1, 'Ich kann Glasssplitter essen, es tut mir nicht weh'), (2, 'Je peux manger du verre, ça ne me fait pas mal'), (3, 'Posso mangiare vetro, non mi fa male'), (4, 'Eu posso comer vidro, não me faz mal'), (5, 'Puedo comer vidrio, no me hace daño'), (6, 'Я могу есть битое стекло, оно мне не вредит'), (7, 'ฉันสามารถกินแก้วแตกได้ มันไม่ทำให้ฉันเจ็บปวด'), (8, '私は割れたガラスを食べることができます、それは私を傷つけません'), (9, 'እኔ የተሰነጠቀ ብረት መብላት እችላለሁ፣ አይጎዳኝም'), (10, 'ငါ ብስጭት መብላት እችላለሁ, ጎጂ አይደለም'); insert into zzz_unicode (language_code, text) values (11, 'Я могу есть битое стекло, оно мне не вредит'), (12, '私は割れたガラスを食べることができます、それは私を傷つけません'), (13, '我可以吃碎玻璃,它不会伤害我'); select * from zzz_unicode ;
If you make progress, let me know please.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks much for your response. Really appreciate it. Your diagnostic steps sounds good to me. Unfortunately I can't test myself, as I currently have only read access to a table that was recently migrated from SQL Server to Snowflake. In my quick PROC COMPARE I noticed the problems caused by the encoding, even after I ran the query against snowflake in a SAS UTF-8 session. Then google led me to your post. Unfortunately, I have characters in many many multi-byte languages, so sounds like the iso-8859-1 encoding won't fix it all. I'll probably open up a ticket with the internal IT group that owns the snowflake instance, but they'll probably just tell me it's a SAS problem and close the ticket, because they don't know SAS. : ) In the end it's okay, I don't really need the character strings that have issues, was just hoping for an easy way to make the PROC COMPARE return 0 differences. Thanks again.
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.