04-18-2018 11:00 AM - last edited on 04-19-2018 04:46 PM by ChrisHemedinger
I am using PROC SQL to read a table from Oracle server, and I find the special characters in one column is outputting different from what I see from Oracle server.
I wonder whether anyone else has this issue and how did you work through it. Any advice is welcome. Thank you!
The character string was "John's grade" in Oracle server, what I see from SAS output is "John?s Grade".
The coding I am using is listed below:
CONNECT TO oracle(user=*** password=*** path=***);
CREATE TABLE TEST AS
SELECT * FROM CONNECTION TO ORACLE
(SELECT distinct *
FROM Cla.table1 C);
DISCONNECT FROM ORACLE;
04-18-2018 11:16 AM
04-18-2018 11:40 AM
I got this from SAS EG:
ENCODING=UTF-8 Specifies the default character-set encoding for the SAS session.
Also I got these from the Oracle server:
What I should do next? Thanks again for your quick reply!
04-18-2018 03:09 PM
Anyone has ideas the next step should be?
Should I contact the Oracle and SAS admin to make to Encoding in these two systems consistent (both should be UTF-8) or do we have a work around for this situation? Thank you!
04-18-2018 10:56 PM
04-19-2018 04:30 PM
I've had some similar problems. In my case it was really only one or two symbols that were problematic and wasn't dealing with a ton of data so I could just use a REPLACE or REGEXP_REPLACE function inside the explicit pass-through SQL portion so Oracle would automatically translate the offending characters into something readable for SAS. This may not be a scalable solution however.
04-20-2018 04:19 PM
Thank you Steve.
The data I am using is really big and I will wait and see if I have other options.
Actually I am using SQL explicit PASS Through to get two tables from Oracle and Teradata server, then use SAS PROC compare to validate whether the two tables contains the same value or not.
I investigate and found that the SAS I am using has the encoding UTF-8, while the Oracle server has the
WE8MSWIN1252, and the Teradata server has UTF-8.
One thing I want to mention that, the special character in Teradata first looks different from the Oracle server, after I change the charset from UTF8 to cp1252 at the Teradata studio, then the special characters looks the same as the oracles ones.I wonder what we can do in SAS to solve this issue.
For people who work with Oracle, Teradata and SAS, what's the best practice for the encoding? Should we have the admins of Oracle, Teradata and SAS to make the default encoding to UTF-8 and be consistent? Any suggestions on this issue is welcome!
04-20-2018 04:23 PM
04-20-2018 04:34 PM
Character encodings get to be a pain real quick, if you can get your Oracle instance to be UTF-8 without too much difficulty it'd be worth it my opinion, but good luck convincing your DBA's to do it.
If you're just trying to see if certain values match, you could try having both Teradata and Oracle hash the values internally via MD5 or SHA and then compare the message digests. I'm actually not sure if this would still work if everything is stored using different encodings, but it may be worth looking into. The outputs should at least be directly comparable and shouldn't have as many issues with encodings since they'd be in hexadecimal.
04-23-2018 09:31 AM
Looks like you are a victim of Microsoft's "stupid" quotes.
I typed John's grade into a MS Office app and then also pasted in the text from your message and Office decided that I didn't really want the simple ASCII character for a single quote character and replaced it with some other character that it interprets as looking something like what you might see in a type setting program instead in a computer program.
So the normal single quote is ASCII code '27'x. In the WLATIN1 (or codepage 1252) the "stupid" quote is a "right single quotation mark" and is coded as '92'x. IN UTF-8 is should appear as the multi-byte string 'E28099'x.
Not sure why SAS/Oracle are not automatically translating the character for you. Hopefully one the links that others have provided will show how to get it to work.
If the only non 7bit ASCII codes your data has are these stupid quotes then I would recommend changing them all to normal single ('27'x) or double ('22'x) quotes. However if your text fields also include more complex characters, like accented letters etc, then you need to get the encoding right.
You could try running SAS with the same LATIN1 (wlatin1, 1252) encoding and see if the characters come over from Oracle correctly.