Re: SAS Encoding conversion

freshstarter · Posted 03-13-2025 01:46 PM

Hello,

We have only one SAS 9.4 M7 environment ( Lev1) where we have configured to connect to SQL Server database. SQL Server database is getting migrated to Snowflake and we have configured the same enviornment to use both SQL and snowflake until full migration gets completed.
SQL Server encoding standard is Latin-1 & but snowflake is in UTF-8. We have configured encoding as below in sasv9_usermods.cfg file as SQL server is still in live

-ENCODING WLATIN1

Now we are performing the comparison testing between SQL server and snowflake data and We are seeing lot of data difference because of this encoding differnce.We will not able to change this encoding to UTF-8 until the migration gets fully completed. As we have only one environment, I dont know how to overcome this issue.

Are there any efficient way to solve the problem? Please let me know.

Currently we are performing the conversion using the sas code as below in all our SAS program. This is causing us more time to run as well as we have to remove the lines of code after the migration is completed ( i.e. will define UTF-8 in sasv9_usermods.cfg after SQL server decommissioned )

data new_dataset;
set old_dataset(encoding='utf-8');
run;

Thanks

JuanS_OCS · Posted 03-13-2025 02:41 PM

Hi there,

You have 2 options (more, but I’ll simplify):

The code approach and the configuration approach.

In the code approach, and since your SAS session run on wlatin, you would need to use the transcoding facilities SAS provides, quite intensive, for all the activities related to snowflake/Unicode/UTF8.

On the configuration approach you could have an extra SASApp, but running SAS on UTF8, for those operations. As side option, you could just call SAS executable, but passing a sas9.cfg file defining UTF8 encoding, just for that migration.

Ksharp · Posted 03-13-2025 10:04 PM

Do you settle the environment variable encoding of DB for SQLServer and snowflake connection in the "sasenv_local.cfg“ ？

freshstarter · Posted 03-14-2025 03:47 AM

Hi @Ksharp,

No we didnot define anything in sasenv_local for encoding which is under sashome/SASFoundation/9.4/bin.

But we defined both Sql server and snowflake odbc path under LD_LIBRARY_PATH.

How to define for encoding in this file ? Please let me know.

AhmedAl_Attar · Posted 03-14-2025 08:03 AM

@freshstarter

I wonder why you are trying to compare data in SAS!?

You Data Migration team is responsible for ensuring the integrity of the data contents are being preserved between the two underlying systems (SQL Server vs. Snowflake).

SAS software is a consuming Third-party application, no different than any other software that can consume data via ODBC driver provided by SAS Institute or some other vendor.

One other factor to keep in mind - https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/nlsref/p00fhxufzmc274n19xax2hllvin4.htm

"If your data set encoding does not match the encoding of your SAS session, the character data in your data set might need more space in the new encoding. For example, if your SAS session is using a UTF-8 session encoding, and you are reading a data set with an encoding of Windows cp 1252 (WLATIN1), some of the characters that require one byte in WLATIN1 might require 2 or 3 bytes in UTF-8. To avoid data truncation, your character variables must expand to a width that is 1.5 times greater than the width of the original data."

It's very important to understand the difference in how SAS represents character lengths vs. how a Database does.

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/acreldb/p02d1rki3en24vn105nmjsa6k786.htm

- Typically Snowflake uses character length, where SAS would use Byte length to properly store and present character values.

i.e. if the length of a column in Snowflake table is defined as varchar(3). SAS might need to define it as char(9) in order to ensure Multi-Byte characters are properly preserved.

These differences in Column length definitions and how data is extracted and stored in SAS data sets will cause lots of inconsistencies in SAS Table definitions/structure, as well as actual values stored within the columns (trailing spaces and padding)

With all that said, if you must use SAS software to compare, then I would recommend

Use SAS UTF-8 Binaries <SASHome>/SASFoundation/9.4/bin/sas_u8
Setup your Snowflake & SQLServer Libname statements with the correct NLS related : LIBNAME and Other Options for NLS
IF you are extracting data from the database and storing as SAS data sets for comparison, then store the extracts in different paths/directories
1. Use CVP engine for your Latin1 extracts (SQLServer) https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsglobal/p1ml1gmtqm3h72n17yjxbyagw0mm...
2. Ignore column lengths differences and focus on comparing Columns Types and Stripped character values.

Sorry for the lengthy reply, but I wanted to provide a context that may not be familiar to you or other readers. Hope this helps

AhmedAl_Attar · Posted 03-14-2025 08:12 AM

Check these papers for additional info
https://support.sas.com/resources/papers/Multilingual_Computing_with_SAS_94.pdf

https://support.sas.com/resources/papers/proceedings20/4103-2020.pdf

Plus, I would highly recommend you ask your SAS System Administrator to download and install the latest Hot fixes for your SAS software.
Example: The SAS/ACCESS Interface to Snowflake has had several hot fixes https://tshf.sas.com/techsup/download/hotfix/HF2/L9B_lax.html

freshstarter · Posted 03-17-2025 11:28 AM

Thanks for the suggestion. What I did now , I changed entirely the SAS server encoding to utf-8 as per below

1. Changed encoding to utf-8 in sasv9_usermods.cfg under /sas/config/Lev1/SASApp

-ENCODING UTF-8

2. Replaced the softlink to utf-8 folder in sas/sashome/SASFoundation/9.4

sas -> bin/sas_u8

previoudly it was sas-> bin/sas_en

then I restarted the object spawner and ran below command

proc options option=encoding;run;

I got ouput as encoding=UTF-8

BUt the problem is now with snowflake query from SAS. Below is my SAS code

proc sql;

connect using snow as snow;

create table work.test as

select * from connection to snow( select current_time);

disconnect from snow;

run;

Im getting error as

ERROR: CLI prepare error: SQL compliation error:Syntax error line 1 at postion 0 unexpected '��'.

SQL statement : select current_time

I repointed all the configuration changes back to WLATIN1 , but the same code worked without any failure. I'm confused now.. Why the snowflake code is not query with UTF-8 encoding in SAS session? Please let me know. Thanks

AhmedAl_Attar · Posted 03-17-2025 02:14 PM

hmm,
at this point, I would highly recommend working with SAS Tech Support to resolve this issue!

AhmedAl_Attar · Posted 03-17-2025 03:27 PM

@freshstarter

Beside changing the SAS session encoding and the binary, Are you sure your SAS session is using the correct Snowflake ODBC Driver?

Note: SAS 9.4 comes with it's own ODBC Driver (Web Infrastructure Platform -- which is a Postgres Database)

Check out this SAS Communities for an example on how you can use two (2) different ODBC drivers within your SAS session

Configure sasenv_local for two different ODBC drivers

freshstarter · Posted 03-17-2025 04:49 PM

@AhmedAl_Attar Thanks for your response. Yes, I have installed a correct ODBC driver which is downloaded from snowflake . We are in SAS 9.4M7 where SAS states that we need to download on our own and only from M8, SAS provides own ODBC driver for snowflake.

I have not configured export SIMBAINI file in my sasenv_local file and Im not sure whether that is causing the problem during encoding conversion.

I have raised a track with SAS and let you know if i get a solution on this.

freshstarter · Posted 03-19-2025 12:46 PM

I have edited the simba.snowflake.ini file now with the below variable and then it worked for me

DriverManagerEncoding=UTF-16