Hi,
We are migrating SAS 9.4 from Windows to Linux.
For one of the DI jobs we see the difference below in the job's output. The data is pulled from a SQL Server instance running on Windows; we use SAS/ACCESS to ODBC to read it.
On Windows,
2/13/18
Sarah’s been running around like crazy; discussed the billboard next to Exit 90A, feature in the Sport & Leisure Research Group, and article in Forbes; she’s been busy putting simulators in the Mercedes Benz Stadium due to Arthur Blank’s dem
On Unix,
This is the result of taking a 250-character substring of the record. I think the garbage characters are caused by an encoding mismatch.
Is there a configuration setting that I am missing? The major concern here is data loss due to the encoding mismatch.
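The interaction between the 250-character substring and a multi-byte encoding can be reproduced directly. A minimal sketch in Python (not SAS, purely to illustrate the byte mechanics; the sample text is made up from the record above):

```python
# A string whose last visible character before the cut is a curly
# apostrophe (U+2019), which UTF-8 encodes as the 3 bytes E2 80 99.
text = "Arthur Blank\u2019s demands"
raw = text.encode("utf-8")

# "Arthur Blank" is 12 ASCII bytes; bytes 12..14 hold the apostrophe.
assert raw[12:15] == b"\xe2\x80\x99"

# Truncating the *bytes* mid-character (here, one byte into the 3-byte
# sequence) leaves an incomplete UTF-8 sequence behind.
cut = raw[:13]

# Decoding the truncated bytes as UTF-8 fails outright; decoding them
# as a single-byte encoding "succeeds" but shows garbage at the cut.
try:
    cut.decode("utf-8")
except UnicodeDecodeError:
    print("truncated mid-character")
print(cut.decode("latin-1"))   # the stray byte renders as 'â'
```

So a substring applied to byte lengths after a WLATIN1-to-UTF-8 conversion can both truncate data and leave partial-character garbage at the end, which matches the symptom described.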
Run PROC OPTIONS on both servers and compare encoding options. Are they the same or different?
It's WLATIN1 on Windows and LATIN1 on Red Hat.
On Red Hat we changed the encoding to UTF-8, but that did not help much.
It would be worth asking the SQL Server DBA what the encoding of the database is. Is it running on Windows? It would also be worth opening a SAS Tech Support track.
Yes, it's running on Windows.
Latin1_General_CI_AI is the collation on the table from which we are reading the data.
As @Kurt_Bremser observed, the fact that you end up with multiple garbled bytes per character clearly indicates that a conversion to UTF-8 must be happening somewhere along the line (it could just be for the WORK tables).
UTF-8 encoding for a right single quotation mark uses 3 bytes: 0xE2 0x80 0x99 (e28099)
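Those byte values can be verified directly. A quick Python check (not SAS, just to confirm the byte counts on both sides of the conversion):

```python
# U+2019 RIGHT SINGLE QUOTATION MARK
ch = "\u2019"

# In Windows-1252 (the code page behind WLATIN1) it is one byte, 0x92...
assert ch.encode("cp1252") == b"\x92"

# ...but in UTF-8 it takes three bytes: E2 80 99.
assert ch.encode("utf-8") == b"\xe2\x80\x99"
print(ch.encode("utf-8").hex())   # e28099
```

One character in the source thus becomes three bytes after a WLATIN1-to-UTF-8 conversion, so any fixed-byte length or substring applied afterwards will be off.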
Are the characters already garbled in the very first SAS table directly extracted from the DB or do they only get garbled when inserted to the target table?
Do you fully recreate the target table every single time or have you migrated it from your Windows environment? If migrated - did you use Proc Migrate or did you just copy the physical files? What's the encoding of your target table with the garbled characters?
Does the SAS log tell you anything about encoding changes/problems?
@Kurt_Bremser @Patrick The SQL Server table is converted into a csv file and saved on the Red Hat (compute) server. I think this is where it gets converted to UTF-8. echo $LANG on the Linux box gives en_US.UTF-8.
The target table is in Snowflake, but I think the characters are already garbled when they are written to the csv file on the Unix server.
We are using a FILE statement to write the csv on the Linux server. Below is the FILE statement.
file "/sasdata/Snowflake_temp/&sas_libname./&temp_table..csv" delimiter=',' DSD DROPOVER lrecl=32767 ENCODING='wlatin1';
We have tried both wlatin1 and latin1, but we still see the garbled characters in the csv.
When I run file -bi on the csv file, the output is text/plain; charset=unknown-8bit, which makes me think the ENCODING option on the FILE statement is not being picked up.
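If the WORK table is already UTF-8 by the time the FILE statement runs, one workaround is to transcode the finished CSV content before loading it. A Python sketch of the idea (not SAS; the sample line stands in for the job's real CSV data):

```python
# Transcode UTF-8 CSV content to Windows-1252 (the code page behind
# WLATIN1). errors="replace" turns any character that WLATIN1 cannot
# hold into '?', so out-of-range data is flagged rather than mangled.
utf8_bytes = "2/13/18,Arthur Blank\u2019s demands\n".encode("utf-8")

text = utf8_bytes.decode("utf-8")
cp1252_bytes = text.encode("cp1252", errors="replace")

# The 3-byte UTF-8 apostrophe collapses back to the single byte 0x92.
assert b"\xe2\x80\x99" in utf8_bytes
assert b"\x92" in cp1252_bytes
assert b"\xe2\x80\x99" not in cp1252_bytes
```

On Linux the same transcode can be done on the whole file with iconv before the S3 upload; either way, the fix only works if the bytes are valid UTF-8 to begin with, which the file -bi output suggests they currently are not for every record.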
Have you tried to simply create a WORK dataset from the SQL table, and look at its contents?
This is how the data looks.
I tried to use OUTENCODING option in the libname statement but it seems it is not supported in SAS/Access to ODBC.
26 LIBNAME srccrm_1 ODBC READ_LOCK_TYPE=NOLOCK READBUFF=1000 DATAsrc=LEGENDS SCHEMA=dbo USER=sassql
26 ! PASSWORD=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX OUTENCODING ='WLATIN1';
_________
278
NOTE: Libref SRCCRM_1 was successfully assigned as follows:
Engine: ODBC
Physical Name: LEGENDS
WARNING 278-63: The option OUTENCODING is not implemented in the ODBC engine.
Pull the data out of Snowflake with some other tool and check if that character is a normal ASCII single quote or one of those typography slanted quotes.
@Tom the character is Windows-1252 code 146 (hex 92), Unicode U+2019, UTF-8 bytes E2 80 99: ’ (Right Single Quotation Mark).
https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html
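The classic symptom that table documents can be reproduced directly: the UTF-8 bytes for U+2019, misread through Windows-1252, produce the familiar three-character mojibake. A Python illustration (not SAS):

```python
# The UTF-8 bytes for U+2019 (right single quotation mark)...
raw = "\u2019".encode("utf-8")                        # b'\xe2\x80\x99'

# ...misinterpreted as Windows-1252 become three visible characters:
# 0xE2 -> 'â', 0x80 -> '€', 0x99 -> '™'.
mojibake = raw.decode("cp1252")
assert mojibake == "\u00e2\u20ac\u2122"               # 'â€™'
print(mojibake)
```

Seeing that exact pattern in the CSV is strong evidence the bytes are UTF-8 but are being labeled or read as a single-byte code page.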
@Kurt_Bremser @Patrick Is there any other way I can set the encoding in ODBC so that the data is written as WLATIN1?
Is this available in SAS/Access to SQL Server?
Did you change the database that you are querying in addition to changing your SAS server? Somehow the 3-byte UTF-8 sequence for the curly apostrophe got stuffed into your database in place of the simple ASCII single quote. You should check how that table was created in the database.
Anyway, it looks like Snowflake only supports Unicode (stored as UTF-8).
You might try transcoding the string after copying it from the database. But since WLATIN1/LATIN1 are single-byte encodings, they will NOT be able to represent every possible Unicode character that might appear in your database.
Try the UNICODE() function.
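That lossiness is easy to demonstrate. A Python sketch (not SAS; the sample value is invented to mix characters inside and outside the WLATIN1 code page):

```python
# é and the curly apostrophe fit in Windows-1252; ā (U+0101) does not.
value = "caf\u00e9 \u2019 \u0101"

# Strict transcoding raises on the first unrepresentable character...
try:
    value.encode("cp1252")
except UnicodeEncodeError as e:
    print("not representable:", repr(e.object[e.start]))

# ...while a lossy transcode silently substitutes '?' for it.
lossy = value.encode("cp1252", errors="replace")
assert lossy == b"caf\xe9 \x92 ?"
```

So any transcode back to WLATIN1 needs a decision up front about what should happen to characters the code page cannot hold.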
@Tom We are reading from SQL Server using SAS/ACCESS to ODBC. We extract the data into a temporary WORK table on the Linux server, and I think that is where the data gets converted from WLATIN1 to UTF-8. We then use this temporary WORK dataset to create a csv, move it to an S3 bucket, and load it into Snowflake.
We are looking for an option to keep the temporary dataset in the same encoding as the SQL Server data.