Hello,
We have only one SAS 9.4 M7 environment ( Lev1) where we have configured to connect to SQL Server database. SQL Server database is getting migrated to Snowflake and we have configured the same enviornment to use both SQL and snowflake until full migration gets completed.
SQL Server encoding standard is Latin-1 & but snowflake is in UTF-8. We have configured encoding as below in sasv9_usermods.cfg file as SQL server is still in live
-ENCODING WLATIN1
Now we are performing the comparison testing between SQL server and snowflake data and We are seeing lot of data difference because of this encoding differnce.We will not able to change this encoding to UTF-8 until the migration gets fully completed. As we have only one environment, I dont know how to overcome this issue.
Are there any efficient way to solve the problem? Please let me know.
Currently we are performing the conversion using the sas code as below in all our SAS program. This is causing us more time to run as well as we have to remove the lines of code after the migration is completed ( i.e. will define UTF-8 in sasv9_usermods.cfg after SQL server decommissioned )
data new_dataset;
set old_dataset(encoding='utf-8');
run;
Thanks
I wonder why you are trying to compare data in SAS!?
You Data Migration team is responsible for ensuring the integrity of the data contents are being preserved between the two underlying systems (SQL Server vs. Snowflake).
SAS software is a consuming Third-party application, no different than any other software that can consume data via ODBC driver provided by SAS Institute or some other vendor.
One other factor to keep in mind - https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/nlsref/p00fhxufzmc274n19xax2hllvin4.htm
"If your data set encoding does not match the encoding of your SAS session, the character data in your data set might need more space in the new encoding. For example, if your SAS session is using a UTF-8 session encoding, and you are reading a data set with an encoding of Windows cp 1252 (WLATIN1), some of the characters that require one byte in WLATIN1 might require 2 or 3 bytes in UTF-8. To avoid data truncation, your character variables must expand to a width that is 1.5 times greater than the width of the original data."
It's very important to understand the difference in how SAS represents character lengths vs. how a Database does.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/acreldb/p02d1rki3en24vn105nmjsa6k786.htm
- Typically Snowflake uses character length, where SAS would use Byte length to properly store and present character values.
i.e. if the length of a column in Snowflake table is defined as varchar(3). SAS might need to define it as char(9) in order to ensure Multi-Byte characters are properly preserved.
These differences in Column length definitions and how data is extracted and stored in SAS data sets will cause lots of inconsistencies in SAS Table definitions/structure, as well as actual values stored within the columns (trailing spaces and padding)
With all that said, if you must use SAS software to compare, then I would recommend
Sorry for the lengthy reply, but I wanted to provide a context that may not be familiar to you or other readers. Hope this helps
Check these papers for additional info
https://support.sas.com/resources/papers/Multilingual_Computing_with_SAS_94.pdf
https://support.sas.com/resources/papers/proceedings20/4103-2020.pdf
Plus, I would highly recommend you ask your SAS System Administrator to download and install the latest Hot fixes for your SAS software.
Example: The SAS/ACCESS Interface to Snowflake has had several hot fixes https://tshf.sas.com/techsup/download/hotfix/HF2/L9B_lax.html
Thanks for the suggestion. What I did now , I changed entirely the SAS server encoding to utf-8 as per below
1. Changed encoding to utf-8 in sasv9_usermods.cfg under /sas/config/Lev1/SASApp
-ENCODING UTF-8
2. Replaced the softlink to utf-8 folder in sas/sashome/SASFoundation/9.4
sas -> bin/sas_u8
previoudly it was sas-> bin/sas_en
then I restarted the object spawner and ran below command
proc options option=encoding;run;
I got ouput as encoding=UTF-8
BUt the problem is now with snowflake query from SAS. Below is my SAS code
proc sql;
connect using snow as snow;
create table work.test as
select * from connection to snow( select current_time);
disconnect from snow;
run;
Im getting error as
ERROR: CLI prepare error: SQL compliation error:Syntax error line 1 at postion 0 unexpected '��'.
SQL statement : select current_time
I repointed all the configuration changes back to WLATIN1 , but the same code worked without any failure. I'm confused now.. Why the snowflake code is not query with UTF-8 encoding in SAS session? Please let me know. Thanks
Beside changing the SAS session encoding and the binary, Are you sure your SAS session is using the correct Snowflake ODBC Driver?
Note: SAS 9.4 comes with it's own ODBC Driver (Web Infrastructure Platform -- which is a Postgres Database)
Check out this SAS Communities for an example on how you can use two (2) different ODBC drivers within your SAS session
@AhmedAl_Attar Thanks for your response. Yes, I have installed a correct ODBC driver which is downloaded from snowflake . We are in SAS 9.4M7 where SAS states that we need to download on our own and only from M8, SAS provides own ODBC driver for snowflake.
I have not configured export SIMBAINI file in my sasenv_local file and Im not sure whether that is causing the problem during encoding conversion.
I have raised a track with SAS and let you know if i get a solution on this.
I have edited the simba.snowflake.ini file now with the below variable and then it worked for me
DriverManagerEncoding=UTF-16
The SAS Users Group for Administrators (SUGA) is open to all SAS administrators and architects who install, update, manage or maintain a SAS deployment.
SAS technical trainer Erin Winters shows you how to explore assets, create new data discovery agents, schedule data discovery agents, and much more.
Find more tutorials on the SAS Users YouTube channel.