BookmarkSubscribeRSS Feed
mcdooyledoo
Fluorite | Level 6

Dear all, 

 

I'm trying to understand and solve a problem I have with SAS, desktop version, 9.4

I accessing a Postgres database (db encoding is UTF-8) with SAS with Proc SQL through the ODBC engine. I use SAS in Unicode and the datafiles that are being created are in UTF-8 unicode format.

 

One of the lines in the result set I get back show the white question mark in a black diamond shape instead of bullet points. 

 

This is the code

libname sql_read odbc dsn=datasource schema=schemaAB;

PROC SQL;
CREATE TABLE work.test (encoding='UTF-8') as
SELECT id, result
FROM sql_read.tableA 
where id = '123'
;
QUIT;

The log is clean, it doesn't see anything about truncation

 

For trouble shooting I checked other tools to see what the result is there:

Python, and MS Access using the same Odbc data source as SAS and show the bullet points.

So I think this is something SAS specific, but I don't know what.

 

I tried to use several encoding settings I found but nothing helps.

 

Does anyone have an idea what this might be?

The bullet point in this case might just be one example, I have other tables that I just didn't check yet.

 

Thanks!

  

12 REPLIES 12
mcdooyledoo
Fluorite | Level 6

I did try that, but I get the following error:

libname sql_read odbc dsn=datasource schema=schemaAB client_encoding='utf-8';

ERROR: Libref SQL_READ is not assigned.
ERROR: Error in the LIBNAME statement.
ERROR 22-7: Invalid option name CLIENT_ENCODING.

I assume this is because I use the ODBC engine, not the Postgres engine. My SAS installation does not include the Postgres engine.

 

SASKiwi
PROC Star

What is your SAS session encoding? That will likely influence viewing the contents of a SAS dataset regardless of what encoding it is stored with.

mcdooyledoo
Fluorite | Level 6
proc options option=encoding;
run;

shows:

SAS (r) Proprietary Software Release 9.4 TS1M5

ENCODING=UTF-8 Specifies the default character-set encoding for the SAS session.

mcdooyledoo
Fluorite | Level 6

I also tried to manually create a data file with the string the has the bullet points that I see in MS Access or Python when I extract the same data as with the SAS Proc SQL statement:

data test2;
infile datalines delimiter='|';
informat id $20. result $200. ;
input id result ;
datalines;
123 | • TEST TEST TEST
;
run;

This is what I see when I look at table test2:

2021-03-11_14-45-34.png

 

 

Compared to the file the Proc SQL statement is creating:

2021-03-11_14-52-05.png

 

 

 

 

So my SAS session can display the bullet point character, but when retrieving the data is doesn't get it, even though other programs that use the same ODBC data connection are displaying the bullet point.

Tom
Super User Tom
Super User

Are you just talking about how it looks in the SAS Display Manager interface?  I don't think that interface is very UTF-8 compatible.  What does is look like you you write it to an RTF or HTML file using ODS?

mcdooyledoo
Fluorite | Level 6

Thank you for this suggestion!

I did an ODS export to PDF of the data I get from the Postgres database and now I do see this:

2021-03-11_17-13-01.png

 

 

So as you thought, the Display Manager is not showing it correctly.

But, as a next step I did run a proc compare between the file I downloaded and the one I manually created by pasting the value in there that I got with MS Access, and I see this:

2021-03-11_17-22-06.png

 

 

 

 

 

 

 

 

 

I would have expected now that the underlying value is the same. Is there anything I can do here?

SASKiwi
PROC Star

Do you have Enterprise Guide or SAS Studio available? They are likely to be more UTF-8 aware than Display Manager.

mcdooyledoo
Fluorite | Level 6

I just ran the same code in Enterprise Guide and did a proc compare.

mcdooyledoo_0-1615516354811.png

 

 

So now I'm just surprised to see that EG and normal SAS handle UTF-8 data that differently in some cases. 

 

Thank you for the suggestion.

 

Tom
Super User Tom
Super User

Looks to me like you pasted UTF-8 text into an editor that doesn't support UTF-8 code.  I am not surprised that the result is different.

mcdooyledoo
Fluorite | Level 6

Hi Tom,

 

I copied text the UFT-8 character (the bullet point) into the SAS program code in SAS 9.4 to create a sample dataset that I compared with proc compare to the information that I download form the Postgres database. And I execute the same program one time in SAS 9.4 and after that in EG.

I use proc compare for both data sets: 

  • SAS 9.4 shows a difference between the downloaded information and the sample data set.
  • EG doesn't show that difference, the information is the same.

SAS 9.4 displays the bullet point when I output the information with ODS to PDF but not in the Data Viewer or in Proc Compare.

EG shows the bullet point everywhere.

 

So I find this behavior just strange but maybe I'm missing something here. 

 

mcdooyledoo
Fluorite | Level 6

I tested more since yesterday and I realized that the reason I see different results in SAS 9.4 and EG is due to the fact that the session encoding in EG was set to wlatin1 instead of UTF-8.

 

When I set the session encoding to wlatin1 in SAS 9.4 or EG I get better data quality than with utf-8, even though the Postgres shows this:

Server Encoding: UTF-8, Client Encoding: UNICODE 

 

When I use SAS in wlatin1 characters like bullet points • and É are displayed correctly.

When I use SAS in UFT-8 I get in both cases this: �

 

I don't understand that. 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 3529 views
  • 4 likes
  • 4 in conversation