BookmarkSubscribeRSS Feed
Jade_SAS
Pyrite | Level 9

Hi All,

 

    I am using PROC SQL to read a table from Oracle server, and I find the special characters in one column is outputting different from what I see from Oracle server.

   I wonder whether anyone else has this issue and how did you work through it. Any advice is welcome. Thank you!

   For example:

   The character string was "John's grade" in Oracle server, what I see from SAS output is "John?s Grade".

   The coding I am using is listed below:

  

PROC SQL;

CONNECT TO oracle(user=*** password=*** path=***);

CREATE TABLE TEST AS

SELECT * FROM CONNECTION TO ORACLE

(SELECT distinct *

FROM Cla.table1 C);

DISCONNECT FROM ORACLE;

QUIT;

 

 

Jade

10 REPLIES 10
Jade_SAS
Pyrite | Level 9

Thank you.

 

I got this from SAS EG:

ENCODING=UTF-8 Specifies the default character-set encoding for the SAS session.

 

Also I got these from the Oracle server:

PARAMETER                                VALUE

NLS_CHARACTERSET                WE8MSWIN1252

 

What I should do next? Thanks again for your quick reply!

 

Jade

Jade_SAS
Pyrite | Level 9

Anyone has ideas the next step should be?

 

Should I contact the Oracle and SAS admin to make to Encoding in these two systems consistent (both should be UTF-8) or do we have a work around for this situation? Thank you!

Miracle
Barite | Level 11

Hi Jade. I think you have to contact your SAS admin or Oracle database admin as far as I have read below. Good luck.

Encoding: helping SAS speak your language

Special char from Oracle in DI Studio

Sven111
Pyrite | Level 9

I've had some similar problems.  In my case it was really only one or two symbols that were problematic and wasn't dealing with a ton of data so I could just use a REPLACE or REGEXP_REPLACE function inside the explicit pass-through SQL portion so Oracle would automatically translate the offending characters into something readable for SAS.  This may not be a scalable solution however.

Jade_SAS
Pyrite | Level 9

Thank you Steve.

 

The data I am using is really big and I will wait and see if I have other options.

 

Actually I am using SQL explicit PASS Through to get two tables from Oracle and Teradata server, then use SAS PROC compare to validate whether the two tables contains the same value or not.

 

I investigate and found that the SAS I am using has the encoding UTF-8, while the Oracle server has the

WE8MSWIN1252, and the Teradata server has UTF-8.

One thing I want to mention that, the special character in Teradata first looks different from the Oracle server, after I change the charset from UTF8 to cp1252 at the Teradata studio, then the special characters looks the same as the oracles ones.I wonder what we can do in SAS to solve this issue.

 

 For people who work with Oracle, Teradata and SAS, what's the best practice for the encoding? Should we have the admins of Oracle, Teradata and SAS to make the default encoding to UTF-8 and be consistent?   Any suggestions on this issue is welcome!

 

Jade

 

Jade_SAS
Pyrite | Level 9
One more thing to add, for Teradata , at the server level, the charset is UTF-8, but when developer created the table (which I pass through to SAS), the charset is LATIN for the columns I am comparing with Oracle.
Sven111
Pyrite | Level 9

Character encodings get to be a pain real quick, if you can get your Oracle instance to be UTF-8 without too much difficulty it'd be worth it my opinion, but good luck convincing your DBA's to do it.

 

If you're just trying to see if certain values match, you could try having both Teradata and Oracle hash the values internally via MD5 or SHA and then compare the message digests.  I'm actually not sure if this would still work if everything is stored using different encodings, but it may be worth looking into.  The outputs should at least be directly comparable and shouldn't have as many issues with encodings since they'd be in hexadecimal. 

Jade_SAS
Pyrite | Level 9

Thank you, I will try hash the values first

Tom
Super User Tom
Super User

Looks like you are a victim of Microsoft's "stupid" quotes. 

I typed John's grade into a MS Office app and then also pasted in the text from your message and Office decided that I didn't really want the simple ASCII character for a single quote character and replaced it with some other character that it interprets as looking something like what you might see in a type setting program instead in a computer program.

image.png

So the normal single quote is ASCII code '27'x. In the WLATIN1 (or codepage 1252) the "stupid" quote is a "right single quotation mark" and is coded as '92'x.  IN UTF-8 is should appear as the multi-byte string 'E28099'x.

 

Not sure why SAS/Oracle are not automatically translating the character for you.  Hopefully one the links that others have provided will show how to get it to work.

 

If the only non 7bit ASCII codes your data has are these stupid quotes then I would recommend changing them all to normal single ('27'x) or double ('22'x) quotes.  However if your text fields also include more complex characters, like accented letters etc, then you need to get the encoding right.

 

You could try running SAS with the same LATIN1 (wlatin1, 1252) encoding and see if the characters come over from Oracle correctly.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 4890 views
  • 3 likes
  • 4 in conversation