I know this topic has been covered before, but here's my question
I am reading this file, and Notepad++ indicates that it is a UTF-8 BOM file.
While reading the file with encoding utf-8 I get all my Danish characters æøå correctly, but there are some characters that cannot be transcoded.
Reading the CSV gives me this warning:
WARNING: A character that could not be transcoded was encountered.
I am looking for a way to locate those characters so I can modify the file.
How do I locate and write out the characters that cannot be transcoded?
filename sim 'L:\Work\SIJ\releaseinfo\releaseinfo\releaseinfo7.csv';
filename out 'L:\Work\SIJ\releaseinfo\releaseinfo\releaseinfo7-out.csv';
data test;
infile sim lrecl=2000 encoding="utf-8";
file out lrecl=2000 encoding='utf-8';
input;
put _infile_;
run;
Running SAS 9.2 from SAS EG 4.3.
THX
I assume your SAS session is not using UTF-8 encoding, which means that some of the characters in the file cannot be converted to the Danish 8-bit character set your session uses. One way to find them may be to read the file into two datasets, one of them with UTF-8 encoding:
data standard utf8(encoding='utf-8');
infile sim dsd delimiter=';' encoding='utf-8';
/* here go your statements to read the variables from the .csv file */
run;
You should then have two datasets, one in your normal encoding and the other in UTF-8. Running PROC COMPARE on them may then show you which variables could not be translated.
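A different technique, offered as a minimal sketch rather than a tested recipe: read the file without an ENCODING= option, so SAS does not transcode it and _infile_ holds the raw UTF-8 bytes, then list every multi-byte sequence whose code point lies above U+00FF. This assumes the session uses a single-byte Latin-1-family encoding (for example WLATIN1 or LATIN9). æøå are encoded as C3 A6, C3 B8 and C3 A5, so they are not flagged. Because WLATIN1 and LATIN9 can hold a few characters above U+00FF (the euro sign, for instance), treat the result as a list of candidates. The dataset name suspects and the variables line, pos and bytes are illustrative names, and the sim fileref is the one from the question.

```sas
/* Sketch: list UTF-8 sequences for code points above U+00FF.          */
/* Assumes a single-byte Latin-1-family session encoding.              */
/* No ENCODING= option: SAS does not transcode, so _infile_ holds the  */
/* raw UTF-8 bytes of each record.                                     */
data suspects(keep=line pos bytes);
   infile sim lrecl=2000 length=reclen;
   input;
   length bytes $4;
   line = _n_;
   i = 1;
   do while (i <= reclen);
      b = rank(substr(_infile_, i, 1));
      /* UTF-8 lead bytes: C2-DF start 2 bytes, E0-EF 3, F0-F4 4.       */
      /* C2/C3 (194-195) cover U+0080-U+00FF, including æøå, and are    */
      /* skipped; C4-F4 (196-244) start a code point above U+00FF.      */
      if 196 <= b <= 244 then do;
         if b <= 223 then n = 2;
         else if b <= 239 then n = 3;
         else n = 4;
         pos = i;
         bytes = substr(_infile_, i, min(n, reclen - i + 1));
         output;
         i = i + n;   /* jump past the whole multi-byte sequence */
      end;
      else i = i + 1;
   end;
run;

/* Show the flagged bytes in hex so they can be looked up in a UTF-8 table */
proc print data=suspects;
   format bytes $hex8.;
run;
```

On the first line the BOM (EF BB BF) will most likely be listed at position 1; it is a file marker, not a data character, and can be ignored. The remaining hex sequences tell you which characters to fix in the file.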