BookmarkSubscribeRSS Feed
SASGeek
Quartz | Level 8

Hello,

I need your help understanding the why behind the following issue and suggestions on how to fix it.

 

I have an XLSX file I download from a vendor site (no back-end access allowed), save it to a .csv, move it to our SAS Server, and use SAS EG to import the .csv file as a dataset using a DATA STEP with Infile, Length, Label, Format, Informat, and Input all specified for full control.

 

Here is my INFILE statement:

 

INFILE &newfile.
        LRECL=32767
        TERMSTR=CRLF
        FIRSTOBS=2
        MISSOVER
        TRUNCOVER
        ENCODING="UTF-8"
        DLM='2C'X
        DSD ; 

 

Some of the larger text fields do not import all characters properly such as a colon (:) or an apostrophe (') and I get a weird character box () instead of the colon or apostrophe. Does not seem to be happening with letters or numbers.  The issue is not consistent - sometimes it imports fine, other times it does not.

 

I suspect it's the UTF-8 statement, but what else can I use?

 

Thank you

 

Paula

 

5 REPLIES 5
SASKiwi
PROC Star

Did you save it as a UTF-8 CSV to begin with?

Screenshot 2026-02-24 125113.jpg

 

Otherwise using the ENCODING option is unlikely to work.

 

 

Tom
Super User Tom
Super User

Why not just move the XLSX file to the SAS server and read it using the XLSX engine?

 

Either with PROC IMPORT using DBMS=XLSX or by using a LIBNAME statement with the XLSX engine.

Tom
Super User Tom
Super User

What is the setting for the system option ENCODING for the SAS session where you ran the data step?

%put %sysfunc(getoption(encoding));

If it is not UTF-8 then you are going to have trouble reading an UTF-8 file that uses anything that is not available in the encoding that you are using.

 

Note that the ENCODING option can only be set when the SAS session starts.  So if you are using Enterprise Guide to submit your SAS code you will need to make sure to connect to a SAS session that is setup using UTF-8.

 

There are many characters in UTF-8 that look like a single quote but are really a different character. 

https://www2.cs.sfu.ca/~ggbaker/reference/characters/#single

 

If you really cannot control what encoding you are running in then read in the file using ENCODING=ANY on the INFILE statement.  You will then need to make your own code to translate any bytes strings representing UTF-8 characters into the proper bytes for your encoding.

 

For example:

* Change left single quote to normal single quote ;
charvar = tranwrd(charvar,'E28098'x,"'");

 

 

Ksharp
Super User

Agree with Tom. It seems the problem is coming from ENCODING="UTF-8" , make sure the encoding of CSV is also "UTF-8" . You could check it with Notepad++ 

 

 

屏幕截图 2025-09-09 175017.png

Tom
Super User Tom
Super User

That would not help if the SAS session is not using UTF-8 encoding.

Catch up on SAS Innovate 2026

Nearly 200 sessions are now available on demand with the SAS Innovate Digital Pass.

Explore Now →
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 352 views
  • 0 likes
  • 4 in conversation