psgard
Fluorite | Level 6

Hello,

 

I work with SAS Grid 9.4 on a Red Hat 7 Linux platform.

I want to access Snowflake data with the SAS/ACCESS to ODBC connector.

Snowflake Encoding : UTF-8

Linux encoding (echo $LANG) : en_US.UTF-8

SAS platform encoding (proc options group=languagecontrol; run;): ENCODING=LATIN1 / LOCALE=EN_US

SAS/ACCESS to ODBC does not support an ENCODING= option.

When I read a table containing accented UTF-8 characters (é, è, à, ...), I receive garbled data, for example:

"Contrat Ã  durÃ©e indÃ©terminÃ©e"

instead of the expected "Contrat à durée indéterminée".

How do I configure my ODBC connection so that all users on my SAS platform get correct results, please?

 

Thanks,

Best regards.

 

 

 

 

8 REPLIES
MargaretC
SAS Employee

When using the SAS/ACCESS to ODBC product, you use an ODBC driver from Snowflake.  If the data is not coming back correctly via the ODBC driver, you may want to reach out to Snowflake to make sure you have the correct driver.

 

Margaret 

psgard
Fluorite | Level 6

Hello MargaretC,

 

I have already checked this part with isql directly on the Linux server.

I think that the trouble is "between" SAS and the Snowflake Linux driver.

 

result of the isql query:

 

SQL> select CLBR_CTRA_TYPE_LB from BPFR_MEDIUM.BPFR_MEDIUM_VIEW.DIM_CLBR where clbr_sk = 11036 and tec_current_in = 1;
+------------------------------+
| CLBR_CTRA_TYPE_LB            |
+------------------------------+
| Contrat à durée indéterminée |
+------------------------------+
SQLRowCount returns 1
1 rows fetched

 

 

 

 

Tom
Super User Tom
Super User

If your SAS session is using LATIN1 and you are trying to read UTF-8 characters, you are really stuck. LATIN1 is a single-byte encoding, so there are only 256 possible character codes. UTF-8 can use up to 4 bytes per character. So unless your input data only uses the original 7-bit ASCII codes, there is a very good probability that you will encounter some UTF-8 character that cannot be represented in LATIN1 encoding.
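A quick way to see the mismatch is to feed the raw UTF-8 bytes of "é" (hex C3 A9) to a LATIN1 session. A small sketch, assuming a LATIN1 SAS session:

```
/* Sketch, assuming a LATIN1 SAS session: the two UTF-8 bytes of "é"
   (hex C3 A9) are interpreted as two separate LATIN1 characters. */
data _null_;
    raw = 'C3A9'x;        /* raw UTF-8 byte sequence for "é" */
    n = length(raw);      /* LENGTH counts bytes */
    put raw= n=;          /* raw displays as two characters (Ã©), n=2 */
run;
```

That two-characters-for-one effect is exactly the "Ã©"-style garbling described in the original post.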

 

Change how you start SAS so that it is using UTF-8 encoding and re-test your connection.
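One common way to do that is via the startup configuration; this is a sketch (paths and option placement depend on your deployment), and SAS also ships a ready-made UTF-8 config under !SASROOT/nls/u8/:

```
/* sasv9.cfg fragment -- a sketch; adjust for your deployment.
   Alternatively, point the session at the shipped UTF-8 config,
   e.g. !SASROOT/nls/u8/sasv9.cfg. */
-ENCODING UTF-8
-LOCALE en_US
```

Session encoding cannot be changed mid-session, so these options must take effect at startup.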

SASKiwi
PROC Star

I'm pretty sure you will need to change your SAS session encoding to match the database, and that has to be done at startup, usually in a SAS config file. Depending on your SAS client interface, further locale/encoding tweaking might be required. Enterprise Guide allows you to align to the server locale, which might be helpful.

psgard
Fluorite | Level 6

Hello,

 

All SAS datasets on our platform are in LATIN1. If I modify the SAS session encoding in the file /sas/software/SASFoundation/9.4/bin/sasenv_local,

won't there be other problems reading my existing SAS datasets, or joining/merging SAS and Snowflake tables (in PROC SQL, for example)?

SASKiwi
PROC Star

AFAIK, UTF-8 covers all of the LATIN1 character set. Why don't you test it out temporarily?

 

There are other NLS settings that may help: refer to the NLS Guide 

jklaverstijn
Rhodochrosite | Level 12

Don't assume that switching from LATIN1 to UTF-8 is a matter of simply changing the encoding of your SAS session!

 

Your existing datasets would still be LATIN1. This is almost as if your data came from a different operating system. You would run into all sorts of limitations and issues. Some examples:

 

Copying a table in which the value "André" is stored in a CHAR(5) column would no longer fit in the UTF-8 world (the "é" takes two bytes), leading to truncation and error messages (a real-world example).

 

The function call LENGTH("André") gives different results between LATIN1 and UTF-8. That is why the KLENGTH() function (and, more generally, a whole slew of K*() functions) is available to deal with the discrepancy between length in bytes and length in characters. A code overhaul could be in order.
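The bytes-versus-characters split is easy to demonstrate; a sketch, assuming a UTF-8 SAS session:

```
/* Sketch, assuming a UTF-8 SAS session:
   LENGTH counts bytes, KLENGTH counts characters. */
data _null_;
    s = "André";          /* 5 characters, 6 bytes in UTF-8 ("é" = 2 bytes) */
    bytes = length(s);
    chars = klength(s);
    put bytes= chars=;    /* bytes=6 chars=5 */
run;
```

In a LATIN1 session both functions would return 5 for this string, which is why code that relies on LENGTH for character counts needs review after a migration.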

 

RDBMSs can differ in how they calculate string length (bytes or characters). For example, if you extract data from Teradata you can get triple the length in bytes for character columns, so a Teradata VARCHAR(3) would yield a SAS $9. Loading that back into Teradata would give a VARCHAR(9), which can lead to a $27. Do that a couple of times and you can see the problem escalate.

 

So, the gist of my response is: don't underestimate the impact of changing your encoding. It is at best a migration project, not a mere configuration change. That migration may also be your best approach, depending on how much you rely on UTF-8 encoded data.

 

Hope this helps,

-- Jan.

SASKiwi
PROC Star

One option worth considering would be to create a new SASApp server instance configured for UTF-8 so comprehensive testing can be done. As I already mentioned, there are other NLS options that may help.

