psgard
Fluorite | Level 6

Hello,

 

I work with SAS Grid 9.4 on a Red Hat 7 Linux platform.

I want to access Snowflake data with the SAS/ACCESS to ODBC connector.

Snowflake Encoding : UTF-8

Linux encoding (echo $LANG) : en_US.UTF-8

SAS platform encoding (proc options group=languagecontrol; run;): ENCODING=LATIN1 / LOCALE=EN_US

SAS/ACCESS to ODBC does not support an ENCODING= option.

When I read a table containing accented UTF-8 characters (é, è, à, ...), I receive garbled data, for example:

"Contrat Ã  durÃ©e indÃ©terminÃ©e"

instead of the expected "Contrat à durée indéterminée".

How do I configure my ODBC connection so that all users on my SAS platform get correct results, please?

 

Thanks,

Best regards.

 

 

 

 

8 REPLIES
MargaretC
SAS Employee

When using the SAS/ACCESS to ODBC product, you use an ODBC driver from Snowflake.  If the data is not coming back correctly via the ODBC driver, you may want to reach out to Snowflake to make sure you have the correct driver.

 

Margaret 

psgard
Fluorite | Level 6

Hello MargaretC,

 

I have already checked this part with isql directly on the Linux server.

I think that the trouble is "between" SAS and the Snowflake Linux driver.

 

result of the isql query:

 

SQL> select CLBR_CTRA_TYPE_LB from BPFR_MEDIUM.BPFR_MEDIUM_VIEW.DIM_CLBR where clbr_sk = 11036 and tec_current_in = 1;
+------------------------------+
| CLBR_CTRA_TYPE_LB            |
+------------------------------+
| Contrat à durée indéterminée |
+------------------------------+
SQLRowCount returns 1
1 rows fetched

 

 

 

 

Tom
Super User Tom
Super User

If your SAS session is using LATIN1 and you are trying to read UTF-8 characters, you are really stuck. LATIN1 is a single-byte encoding, so there are only 256 possible character codes. UTF-8 can use up to 4 bytes per character. So unless your input data only uses the original 7-bit ASCII codes, there is a very good probability that you will encounter some UTF-8 character that cannot be represented in LATIN1 encoding.
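A quick way to see the mismatch is to feed the raw UTF-8 bytes of "é" (hex C3 A9) to a LATIN1 session. A small sketch, assuming a LATIN1 SAS session:

```
/* Sketch, assuming a LATIN1 SAS session: the two UTF-8 bytes of "é"
   (hex C3 A9) are interpreted as two separate LATIN1 characters. */
data _null_;
    raw = 'C3A9'x;        /* raw UTF-8 byte sequence for "é" */
    n = length(raw);      /* LENGTH counts bytes */
    put raw= n=;          /* raw displays as two characters (Ã©), n=2 */
run;
```

That two-characters-for-one effect is exactly the "Ã©"-style garbling described in the original post.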

 

Change how you start SAS so that it is using UTF-8 encoding and re-test your connection.
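One common way to do that is via the startup configuration; this is a sketch (paths and option placement depend on your deployment), and SAS also ships a ready-made UTF-8 config under !SASROOT/nls/u8/:

```
/* sasv9.cfg fragment -- a sketch; adjust for your deployment.
   Alternatively, point the session at the shipped UTF-8 config,
   e.g. !SASROOT/nls/u8/sasv9.cfg. */
-ENCODING UTF-8
-LOCALE en_US
```

Session encoding cannot be changed mid-session, so these options must take effect at startup.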

SASKiwi
PROC Star

I'm pretty sure you will need to change your SAS session encoding to match the database, and that has to be done at startup, usually in a SAS config file. Depending on your SAS client interface, further locale/encoding tweaking might be required. Enterprise Guide allows you to align to the server locale, which might be helpful.

psgard
Fluorite | Level 6

Hello,

 

All SAS datasets on our platform are in LATIN1. If I modify the SAS session encoding in the file /sas/software/SASFoundation/9.4/bin/sasenv_local,

won't there be other problems reading my existing SAS datasets, or joining/merging SAS and Snowflake tables (in PROC SQL, for example)?

SASKiwi
PROC Star

AFAIK, UTF-8 covers all of the LATIN1 character set. Why don't you test it out temporarily?

 

There are other NLS settings that may help: refer to the NLS Guide 

jklaverstijn
Rhodochrosite | Level 12

Don't assume that switching from LATIN1 to UTF-8 is a matter of simply changing the encoding of your SAS session!

 

Your existing datasets would still be LATIN1. This is almost as if your data came from a different operating system. You would run into all sorts of limitations and issues. Some examples:

 

Copying a table in which the value "André" is stored in a CHAR(5) column would no longer fit in the UTF-8 world (the "é" takes two bytes), leading to truncation and error messages (a real-world example).

 

The function call LENGTH("André") gives different results between LATIN1 and UTF-8. That is why the KLENGTH() function (and, more generally, a whole slew of K*() functions) is available to deal with the discrepancy between length in bytes and length in characters. A code overhaul could be in order.
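The bytes-versus-characters split is easy to demonstrate; a sketch, assuming a UTF-8 SAS session:

```
/* Sketch, assuming a UTF-8 SAS session:
   LENGTH counts bytes, KLENGTH counts characters. */
data _null_;
    s = "André";          /* 5 characters, 6 bytes in UTF-8 ("é" = 2 bytes) */
    bytes = length(s);
    chars = klength(s);
    put bytes= chars=;    /* bytes=6 chars=5 */
run;
```

In a LATIN1 session both functions would return 5 for this string, which is why code that relies on LENGTH for character counts needs review after a migration.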

 

RDBMSs can differ in how they calculate string length (bytes or characters). For example, if you extract data from Teradata you can get triple the length in bytes for character columns, so a Teradata VARCHAR(3) would yield a SAS $9. Loading that back into Teradata would give a VARCHAR(9), which can lead to a $27. Do that a couple of times and you can see the problem escalate.

 

So, the gist of my response is: don't underestimate the impact of changing your encoding. It is at best a migration project, not a mere configuration change. That migration may also be your best approach, depending on how much you rely on UTF-8 encoded data.

 

Hope this helps,

-- Jan.

SASKiwi
PROC Star

One option worth considering would be to create a new SASApp server instance configured for UTF-8 so comprehensive testing can be done. As I already mentioned, there are other NLS options that may help.

