SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

How SAS output the special character in string (from Oracle server) correctly?

Reply
Regular Contributor
Posts: 223

How SAS output the special character in string (from Oracle server) correctly?

[ Edited ]

Hi All,

 

    I am using PROC SQL to read a table from Oracle server, and I find the special characters in one column is outputting different from what I see from Oracle server.

   I wonder whether anyone else has this issue and how did you work through it. Any advice is welcome. Thank you!

   For example:

   The character string was "John's grade" in Oracle server, what I see from SAS output is "John?s Grade".

   The coding I am using is listed below:

  

PROC SQL;

CONNECT TO oracle(user=*** password=*** path=***);

CREATE TABLE TEST AS

SELECT * FROM CONNECTION TO ORACLE

(SELECT distinct *

FROM Cla.table1 C);

DISCONNECT FROM ORACLE;

QUIT;

 

 

Jade

Regular Contributor
Posts: 249

Re: How SAS output the special charater in string (from Oracle server) correctly?

Regular Contributor
Posts: 223

Re: How SAS output the special charater in string (from Oracle server) correctly?

Thank you.

 

I got this from SAS EG:

ENCODING=UTF-8 Specifies the default character-set encoding for the SAS session.

 

Also I got these from the Oracle server:

PARAMETER                                VALUE

NLS_CHARACTERSET                WE8MSWIN1252

 

What I should do next? Thanks again for your quick reply!

 

Jade

Regular Contributor
Posts: 223

Re: How SAS output the special charater in string (from Oracle server) correctly?

Anyone has ideas the next step should be?

 

Should I contact the Oracle and SAS admin to make to Encoding in these two systems consistent (both should be UTF-8) or do we have a work around for this situation? Thank you!

Regular Contributor
Posts: 249

Re: How SAS output the special charater in string (from Oracle server) correctly?

Hi Jade. I think you have to contact your SAS admin or Oracle database admin as far as I have read below. Good luck.

Encoding: helping SAS speak your language

Special char from Oracle in DI Studio

Frequent Contributor
Posts: 99

Re: How SAS output the special charater in string (from Oracle server) correctly?

I've had some similar problems.  In my case it was really only one or two symbols that were problematic and wasn't dealing with a ton of data so I could just use a REPLACE or REGEXP_REPLACE function inside the explicit pass-through SQL portion so Oracle would automatically translate the offending characters into something readable for SAS.  This may not be a scalable solution however.

Regular Contributor
Posts: 223

Re: How SAS output the special charater in string (from Oracle server) correctly?

Thank you Steve.

 

The data I am using is really big and I will wait and see if I have other options.

 

Actually I am using SQL explicit PASS Through to get two tables from Oracle and Teradata server, then use SAS PROC compare to validate whether the two tables contains the same value or not.

 

I investigate and found that the SAS I am using has the encoding UTF-8, while the Oracle server has the

WE8MSWIN1252, and the Teradata server has UTF-8.

One thing I want to mention that, the special character in Teradata first looks different from the Oracle server, after I change the charset from UTF8 to cp1252 at the Teradata studio, then the special characters looks the same as the oracles ones.I wonder what we can do in SAS to solve this issue.

 

 For people who work with Oracle, Teradata and SAS, what's the best practice for the encoding? Should we have the admins of Oracle, Teradata and SAS to make the default encoding to UTF-8 and be consistent?   Any suggestions on this issue is welcome!

 

Jade

 

Regular Contributor
Posts: 223

Re: How SAS output the special charater in string (from Oracle server) correctly?

One more thing to add, for Teradata , at the server level, the charset is UTF-8, but when developer created the table (which I pass through to SAS), the charset is LATIN for the columns I am comparing with Oracle.
Frequent Contributor
Posts: 99

Re: How SAS output the special charater in string (from Oracle server) correctly?

Character encodings get to be a pain real quick, if you can get your Oracle instance to be UTF-8 without too much difficulty it'd be worth it my opinion, but good luck convincing your DBA's to do it.

 

If you're just trying to see if certain values match, you could try having both Teradata and Oracle hash the values internally via MD5 or SHA and then compare the message digests.  I'm actually not sure if this would still work if everything is stored using different encodings, but it may be worth looking into.  The outputs should at least be directly comparable and shouldn't have as many issues with encodings since they'd be in hexadecimal. 

Regular Contributor
Posts: 223

Re: How SAS output the special charater in string (from Oracle server) correctly?

Thank you, I will try hash the values first

Super User
Super User
Posts: 7,941

Re: How SAS output the special character in string (from Oracle server) correctly?

Looks like you are a victim of Microsoft's "stupid" quotes. 

I typed John's grade into a MS Office app and then also pasted in the text from your message and Office decided that I didn't really want the simple ASCII character for a single quote character and replaced it with some other character that it interprets as looking something like what you might see in a type setting program instead in a computer program.

image.png

So the normal single quote is ASCII code '27'x. In the WLATIN1 (or codepage 1252) the "stupid" quote is a "right single quotation mark" and is coded as '92'x.  IN UTF-8 is should appear as the multi-byte string 'E28099'x.

 

Not sure why SAS/Oracle are not automatically translating the character for you.  Hopefully one the links that others have provided will show how to get it to work.

 

If the only non 7bit ASCII codes your data has are these stupid quotes then I would recommend changing them all to normal single ('27'x) or double ('22'x) quotes.  However if your text fields also include more complex characters, like accented letters etc, then you need to get the encoding right.

 

You could try running SAS with the same LATIN1 (wlatin1, 1252) encoding and see if the characters come over from Oracle correctly.

Ask a Question
Discussion stats
  • 10 replies
  • 254 views
  • 3 likes
  • 4 in conversation