Dear all,
I'm trying to understand and solve a problem I have with SAS, desktop version, 9.4
I accessing a Postgres database (db encoding is UTF-8) with SAS with Proc SQL through the ODBC engine. I use SAS in Unicode and the datafiles that are being created are in UTF-8 unicode format.
One of the lines in the result set I get back show the white question mark in a black diamond shape instead of bullet points.
This is the code
libname sql_read odbc dsn=datasource schema=schemaAB;
PROC SQL;
CREATE TABLE work.test (encoding='UTF-8') as
SELECT id, result
FROM sql_read.tableA
where id = '123'
;
QUIT;
The log is clean, it doesn't see anything about truncation
For trouble shooting I checked other tools to see what the result is there:
Python, and MS Access using the same Odbc data source as SAS and show the bullet points.
So I think this is something SAS specific, but I don't know what.
I tried to use several encoding settings I found but nothing helps.
Does anyone have an idea what this might be?
The bullet point in this case might just be one example, I have other tables that I just didn't check yet.
Thanks!
I did try that, but I get the following error:
libname sql_read odbc dsn=datasource schema=schemaAB client_encoding='utf-8';
ERROR: Libref SQL_READ is not assigned.
ERROR: Error in the LIBNAME statement.
ERROR 22-7: Invalid option name CLIENT_ENCODING.
I assume this is because I use the ODBC engine, not the Postgres engine. My SAS installation does not include the Postgres engine.
What is your SAS session encoding? That will likely influence viewing the contents of a SAS dataset regardless of what encoding it is stored with.
proc options option=encoding; run;
shows:
SAS (r) Proprietary Software Release 9.4 TS1M5
ENCODING=UTF-8 Specifies the default character-set encoding for the SAS session.
I also tried to manually create a data file with the string the has the bullet points that I see in MS Access or Python when I extract the same data as with the SAS Proc SQL statement:
data test2;
infile datalines delimiter='|';
informat id $20. result $200. ;
input id result ;
datalines;
123 | • TEST TEST TEST
;
run;
This is what I see when I look at table test2:
Compared to the file the Proc SQL statement is creating:
So my SAS session can display the bullet point character, but when retrieving the data is doesn't get it, even though other programs that use the same ODBC data connection are displaying the bullet point.
Are you just talking about how it looks in the SAS Display Manager interface? I don't think that interface is very UTF-8 compatible. What does is look like you you write it to an RTF or HTML file using ODS?
Thank you for this suggestion!
I did an ODS export to PDF of the data I get from the Postgres database and now I do see this:
So as you thought, the Display Manager is not showing it correctly.
But, as a next step I did run a proc compare between the file I downloaded and the one I manually created by pasting the value in there that I got with MS Access, and I see this:
I would have expected now that the underlying value is the same. Is there anything I can do here?
Do you have Enterprise Guide or SAS Studio available? They are likely to be more UTF-8 aware than Display Manager.
I just ran the same code in Enterprise Guide and did a proc compare.
So now I'm just surprised to see that EG and normal SAS handle UTF-8 data that differently in some cases.
Thank you for the suggestion.
Looks to me like you pasted UTF-8 text into an editor that doesn't support UTF-8 code. I am not surprised that the result is different.
Hi Tom,
I copied text the UFT-8 character (the bullet point) into the SAS program code in SAS 9.4 to create a sample dataset that I compared with proc compare to the information that I download form the Postgres database. And I execute the same program one time in SAS 9.4 and after that in EG.
I use proc compare for both data sets:
SAS 9.4 displays the bullet point when I output the information with ODS to PDF but not in the Data Viewer or in Proc Compare.
EG shows the bullet point everywhere.
So I find this behavior just strange but maybe I'm missing something here.
I tested more since yesterday and I realized that the reason I see different results in SAS 9.4 and EG is due to the fact that the session encoding in EG was set to wlatin1 instead of UTF-8.
When I set the session encoding to wlatin1 in SAS 9.4 or EG I get better data quality than with utf-8, even though the Postgres shows this:
Server Encoding: UTF-8, Client Encoding: UNICODE
When I use SAS in wlatin1 characters like bullet points • and É are displayed correctly.
When I use SAS in UFT-8 I get in both cases this: �
I don't understand that.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.