I know this topic has been covered before, but here is my question.
I am reading a file that Notepad++ reports as UTF-8 with BOM.
When I read the file with encoding utf-8, my Danish characters æøå come through correctly, but there are some characters that cannot be transcoded.
Reading the CSV gives me this warning:
WARNING: A character that could not be transcoded was encountered.
I am looking for a way to locate those characters so I can modify the file.
How do I locate and write out the characters that cannot be transcoded?
filename sim 'L:\Work\SIJ\releaseinfo\releaseinfo\releaseinfo7.csv';
filename out 'L:\Work\SIJ\releaseinfo\releaseinfo\releaseinfo7-out.csv';
data test;
infile sim lrecl=2000 encoding="utf-8";
file out lrecl=2000 encoding='utf-8';
input;
put _infile_;
run;
I am running SAS 9.2 from SAS Enterprise Guide 4.3.
THX
I assume your SAS session is not using UTF-8 encoding, meaning that some of the characters in the file cannot be converted to the Danish 8-bit character set you are using. One way to find them may be to read the file into two datasets, one of them having utf-8 encoding:
data standard utf8(encoding='utf-8');
infile sim dsd delimiter=';' encoding='utf-8';
/* here go your statements to read the variables from the .csv file */
run;
You should then have two datasets, one of them in your normal encoding, and the other in UTF-8. Running a PROC COMPARE may then show you which variables were not translatable.
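A minimal sketch of that comparison, assuming the two datasets are named standard and utf8 as in the data step above. The output dataset name diffs is just a placeholder I picked, and I have not run this against your file, so treat it as a starting point rather than a guaranteed fix:

/* Compare the session-encoded dataset with the UTF-8 one. */
/* diffs is a hypothetical name for the output dataset.    */
proc compare base=standard compare=utf8
             out=diffs      /* write the comparison results to a dataset    */
             outnoequal     /* keep only observations where a value differs */
             outbase        /* include the row from the BASE= dataset       */
             outcomp;       /* include the row from the COMPARE= dataset    */
run;

If any values differ, diffs should contain the base and compare rows for those observations, identified by the _TYPE_ and _OBS_ variables, which points you at the records to inspect in the .csv file. If the procedure reports no differences, the two datasets were probably transcoded the same way when read, and this approach will not isolate the characters.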