Here's a follow-up to my post, explaining why some datasets with Wlatin1 encoding can produce an error message when read from a SAS configuration that uses the UTF-8 encoding.
First, some background. ASCII is a well-known encoding standard, but strictly speaking it covers only the first 128 characters. "Wlatin1", referred to on Wikipedia as "Windows-1252", was devised by Microsoft and extends ASCII to 256 characters. Windows-1252 is itself derived from ISO/IEC 8859-1, except for the range 128 to 159 (hex 80 to 9F). And that's the key to the problem I encountered.
You see, Unicode, which UTF-8 encodes, uses the same first 256 code points as ISO 8859-1, so ISO 8859-1 characters map directly to UTF-8. That means Wlatin1 is mostly compatible with UTF-8 as well. If you look at the chart of character codes halfway down the page at https://en.wikipedia.org/wiki/Windows-1252, you can see the differences highlighted with a green outline. Characters such as the euro sign "€", dagger "†", and trademark "™" sit in the range where Windows-1252 and ISO 8859-1 differ, so they don't carry over to UTF-8 the way the rest of Wlatin1 does.
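For anyone who wants to check the range without reading the chart, here is a minimal sketch in Python (not SAS; it only illustrates the two code pages and says nothing about how SAS transcodes). It assumes Python's built-in `cp1252` and `latin-1` codecs as stand-ins for Wlatin1 and ISO 8859-1, and the name `differing` is just mine for illustration.

```python
# Compare how Windows-1252 and ISO 8859-1 interpret every single-byte value.
differing = []
for value in range(256):
    raw = bytes([value])
    latin1_char = raw.decode("latin-1")   # ISO 8859-1: byte N is code point N
    try:
        cp1252_char = raw.decode("cp1252")
    except UnicodeDecodeError:
        cp1252_char = None                # a few bytes are unassigned in Python's cp1252 codec
    if cp1252_char != latin1_char:
        differing.append(value)

print(f"first differing byte: 0x{differing[0]:02X}")
print(f"last differing byte:  0x{differing[-1]:02X}")
print(f"byte 0x80 in Windows-1252 is U+{ord(bytes([0x80]).decode('cp1252')):04X}")
```

The list of differing bytes should come out as exactly 128 to 159 (hex 80 to 9F), with byte 0x80 being the euro sign in Windows-1252.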
To test this, I created a simple permanent dataset with a single character.
data adam.char;
thischar="€";
run;
I didn't have any trouble using 'thischar="µ";'. The UTF-8 configuration reported that transcoding had taken place, but there was no error message. With the euro sign, however, I got the error message reported in my original post. The euro sign isn't a valid ISO 8859-1 character, and it therefore can't be read by the UTF-8 configuration without further encoding instructions.
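To see how the two test characters differ at the byte level, here is a rough sketch in Python (again not SAS, and it does not reproduce the SAS error). It assumes the µ is the micro sign U+00B5 and uses Python's `cp1252` and `latin-1` codecs for Wlatin1 and ISO 8859-1; the `summary` dictionary is a hypothetical helper for this illustration only.

```python
import unicodedata

# Micro sign and euro sign, written as escapes so the file encoding doesn't matter.
test_chars = ("\u00b5", "\u20ac")

summary = {}
for char in test_chars:
    try:
        char.encode("latin-1")
        in_latin1 = True
    except UnicodeEncodeError:
        in_latin1 = False                 # no ISO 8859-1 position for this character
    summary[char] = {
        "cp1252": char.encode("cp1252").hex(),   # the single Wlatin1 byte
        "utf8": char.encode("utf-8").hex(),      # the UTF-8 byte sequence
        "in_latin1": in_latin1,
    }

for char, info in summary.items():
    print(f"U+{ord(char):04X} {unicodedata.name(char)}: "
          f"cp1252={info['cp1252']} utf8={info['utf8']} "
          f"in ISO 8859-1: {info['in_latin1']}")
```

The micro sign has an ISO 8859-1 position (byte B5), while the euro sign sits at byte 80 in Windows-1252 and has no ISO 8859-1 equivalent. Both take more than one byte in UTF-8.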
I hope this is clear enough. The bottom line is that the Wlatin1 encoding can be read using a UTF-8 configuration, except for characters in the range 128 to 159 (hex 80 to 9F).