Dear community,
I am here with another unknown problem, I am facing this first time.
Hope your wise advice could help me.
Here is the part of xlsx file I am importing
Here is the import syntax I am using
proc import out =aexls datafile='xxxxxxxxxxxxxxxxxxxxxxxx/AESI.xlsx'
dbms=xlsx replace;
getnames = no;
run;
Here is the aexls dataset.
I have no idea how these characters are appearing.
I don't want to remove them with hardcoding.
Could you please suggest any turnaround here?
And here is the log.
Which is fine I think.
Thank you all in advance.
Kind Regards
Artur
What encoding is your SAS session using? Check the automatic macro variable SYSENCODING or the system option ENCODING.
Note that you cannot change the encoding once SAS is running. You need to specify that setting when SAS starts. Typically by using a different command to start SAS. (Or if you are running SAS via some user interface like Enterprise Guide or SAS/Studio then connecting to a different "application server" that is using the encoding you want.)
Thanks for your reply Tom.
Value of automatic macro variable &SYSENCODING. is latin1
A single byte encoding, like LATIN1, can only represent 256 distinct characters.
Try importing the file using a SAS session that is using UTF-8 encoding.
If you are using a Windows icon to run SAS then you should have a version that launches SAS with "UNICODE" support. That will be using UTF-8 encoding. If you are running SAS from the command line check if your local installation team has created a different command to use that starts SAS with unicode support. For example instead of sas as the command you might need to use sas_u8. If you are using Enterprise Guide or SAS/Studio then connect to a different application server that has SAS configured to use UTF-8.
Do those other values in the XLSX with the "funny characters" appear not to be on the first (top) line of the cell when examined using the formula window? Your 187 appears to be pushed down a line. Which makes me suspect that odd stuff is the result of something equivalent to the "alt-Enter" key behavior of inserting line breaks in the cell or similar.
Do a File-Save As to a CSV format and then examine the file with a TEXT editor like Notepad to see if the random characters appear there as well.
Welcome to discovering yet another reason why XLSX files are often poor choice for data interchange between systems.
Thank you all for your suggestions.
The solution(temporary) was to convert .xlsx to .xls format.
After that using the same import procedure the excess characters were gone.
Kind Regards
Artur
> I have no idea how these characters are appearing.
They are here because they are in Excel before the 187.
Either clean them in Excel or clean them in SAS after reading them (only keep digits for instance for the cell shown). For example remove the string �x00a0; Or remove anything starting with & and ending with ;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.