Hello,
I need your help understanding the why behind the following issue and suggestions on how to fix it.
I have an XLSX file I download from a vendor site (no back-end access allowed), save it to a .csv, move it to our SAS Server, and use SAS EG to import the .csv file as a dataset using a DATA STEP with Infile, Length, Label, Format, Informat, and Input all specified for full control.
Here is my INFILE statement:
Some of the larger text fields do not import all characters properly such as a colon (:) or an apostrophe (') and I get a weird character box () instead of the colon or apostrophe. Does not seem to be happening with letters or numbers. The issue is not consistent - sometimes it imports fine, other times it does not.
I suspect it's the UTF-8 statement, but what else can I use?
Thank you
Paula
Did you save it as a UTF-8 CSV to begin with?
Otherwise using the ENCODING option is unlikely to work.
Why not just move the XLSX file to the SAS server and read it using the XLSX engine?
Either with PROC IMPORT using DBMS=XLSX or by using a LIBNAME statement with the XLSX engine.
What is the setting for the system option ENCODING for the SAS session where you ran the data step?
%put %sysfunc(getoption(encoding));
If it is not UTF-8 then you are going to have trouble reading an UTF-8 file that uses anything that is not available in the encoding that you are using.
Note that the ENCODING option can only be set when the SAS session starts. So if you are using Enterprise Guide to submit your SAS code you will need to make sure to connect to a SAS session that is setup using UTF-8.
There are many characters in UTF-8 that look like a single quote but are really a different character.
https://www2.cs.sfu.ca/~ggbaker/reference/characters/#single
If you really cannot control what encoding you are running in then read in the file using ENCODING=ANY on the INFILE statement. You will then need to make your own code to translate any bytes strings representing UTF-8 characters into the proper bytes for your encoding.
For example:
* Change left single quote to normal single quote ;
charvar = tranwrd(charvar,'E28098'x,"'");
Agree with Tom. It seems the problem is coming from ENCODING="UTF-8" , make sure the encoding of CSV is also "UTF-8" . You could check it with Notepad++
That would not help if the SAS session is not using UTF-8 encoding.
Nearly 200 sessions are now available on demand with the SAS Innovate Digital Pass.
Explore Now →Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.