BookmarkSubscribeRSS Feed
sid3284
Calcite | Level 5

Hi,

 

We are migrating from SAS 9.4 on Windows to Linux operating system.

 

For one of the DI jobs we see the below difference in the output of the job. This data is being pulled from a SQL server which is on Windows. We are using SAS/Access to ODBC to access data in sql server.

 

On Windows,

2/13/18
Sarah’s been running around like crazy; discussed the billboard next to Exit 90A, feature in the Sport & Leisure Research Group, and article in Forbes; she’s been busy putting simulators in the Mercedes Benz Stadium due to Arthur Blank’s dem

 

On Unix,

2/13/18
Sarah�s been running around like crazy; discussed the billboard next to Exit 90A, feature in the Sport & Leisure Research Group, and article in Forbes; she�s been busy putting simulators in the Mercedes Benz Stadium due to Arthur B

 

This is as a result of substring of 250 character of the record. I think the garbage characters are because of encoding mismatch.

 

Is there any configuration settings that I am missing. Major concern here is data loss because of encoding mismatch.

 

19 REPLIES 19
Reeza
Super User
The config file specifies the encoding for each system. But...Unix files usually have a different encoding than Windows.
SASKiwi
PROC Star

Run PROC OPTIONS on both servers and compare encoding options. Are they the same or different?

sid3284
Calcite | Level 5

Its WLATIN1 for Windows and LATIN1 for Redhat.

 

On Redhat we changed the encoding to UTF-8 but that also did not help much.

 

SASKiwi
PROC Star

It would be worth asking the SQL Server DBA what the encoding is for the database. Is it running on Windows? It would be worth opening a SAS Tech support track as well.

sid3284
Calcite | Level 5

Yes. Its running on windows.

 

Latin1_General_CI_AI is the collation on the table from which we are reading the data.

Patrick
Opal | Level 21

Like @Kurt_Bremser observed that you end-up with multiple garbled bytes clearly indicates that somewhere along the line a conversion to UTF-8 must be happening (could just be for a WORK tables).

UTF-8 encoding for a right single quotation mark uses 3 bytes: 0xE2 0x80 0x99 (e28099)

 

Are the characters already garbled in the very first SAS table directly extracted from the DB or do they only get garbled when inserted to the target table?

Do you fully recreate the target table every single time or have you migrated it from your Windows environment? If migrated - did you use Proc Migrate or did you just copy the physical files? What's the encoding of your target table with the garbled characters?

Does the SAS log tell you anything about encoding changes/problems?

sid3284
Calcite | Level 5

@Kurt_Bremser @Patrick The SQL server table is converted into a csv file and saved on the Redhat (compute) server. I think this is where it gets converted to UTF-8. echo $LANG on linux box gives en_US.UTF-8.

The target table is in snowflake. But i think the characters are garbled when they are written in the csv file on the unix server.

 

We are using file statement to write the csv on linux server. Below is the file statement.

 

file "/sasdata/Snowflake_temp/&sas_libname./&temp_table..csv" delimiter=',' DSD DROPOVER lrecl=32767 ENCODING='wlatin1';

 

We have tried both wlatin1 and latin1. But we still see the garbled character in the csv.

When I run file -bi on the csv file I get output as

text/plain; charset=unknown-8bit. which makes me think that the ENCODING option in file statement is not picked up.

 

 

 

 

 

sid3284
Calcite | Level 5
2/13/18
Sarahâ??s been running around like crazy; discussed the billboard next to Exit 90A, feature in the Sport & Leisure Research Group, and article in Forbes; sheâ??s been busy putting simulators in the Mercedes Benz Stadium due to Arthur Blankâ??s 
Description

 

This is how the data looks.

 

I tried to use OUTENCODING option in the libname statement but it seems it is not supported in SAS/Access to ODBC.

 

26 LIBNAME srccrm_1 ODBC READ_LOCK_TYPE=NOLOCK READBUFF=1000 DATAsrc=LEGENDS SCHEMA=dbo USER=sassql
26 ! PASSWORD=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX OUTENCODING ='WLATIN1';
_________
278
NOTE: Libref SRCCRM_1 was successfully assigned as follows:
Engine: ODBC
Physical Name: LEGENDS
WARNING 278-63: The option OUTENCODING is not implemented in the ODBC engine.

 

Tom
Super User Tom
Super User

Pull the data out of Snowflake with some other tool and check if that character is a normal ASCII single quote or one of those typography slanted quotes.

sid3284
Calcite | Level 5

@Tom the character is

 

146922019E28099’’

Right Single Quotation Mark

 

https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html 

 

 

 

sid3284
Calcite | Level 5

@Kurt_Bremser @Patrick Is there any other way I can set encoding in ODBC to write data as WLATIN1.

 

Is this available in SAS/Access to SQL Server?

Tom
Super User Tom
Super User

Did you change the database that you are querying in addition to changing your SAS server? Somehow the 3 byte UTF-8 sequence for cute curly apostrophe got stuff into your database in place of the simple ASCII single quote. You should check how that table was created in the database.

 

Anyway it looks like Snowflake only supports Unicode (aka UTF-8). 

 

You might try transcoding the string after copying from the database.  But since it is a single byte encoding WLATIN1/LATIN1 will NOT be able to represent every possible Unicode character that might appear in your database.

 

Try the UNICODE() function.

https://documentation.sas.com/?docsetId=nlsref&docsetTarget=p1s61kmqg61m29n109i6dz4ivkbw.htm&docsetV...

sid3284
Calcite | Level 5

@Tom We are reading from SQL server using SAS/Access to ODBC. We extract that data into a temporary work table on linux server and I think the data fetched in this dataset got the data converted from WLATIN1 to UTF-8. We use this temporary work dataset to create a csv and move that to s3 bucket and load it into snowflake. 

 

We are looking for an option to maintain the same encoding on the temporary dataset from SQL server. 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 19 replies
  • 3484 views
  • 1 like
  • 6 in conversation