I'm on SAS Enterprise Guide 7.11 HF5 (7.100.1.2856) (64-bit), SAS 9.4.
When importing a csv file, I get the warning "WARNING: A character that could not be transcoded has been replaced in record 1726." in some rows (the number at the end changes according to the row it is reading).
I was able to determine that the character causing the problem is the Unicode Character U+0099.
According to Notepad++, the file is encoded in "UCS-2 LE BOM", and I'm importing the file in SAS using the encoding "UTF-16".
Can't SAS read this character? Am I not using the correct encoding option in SAS?
Am I doing something wrong? If I'm not, how can I get SAS to ignore this warning?
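To see exactly which records and columns trigger the warning, a short script can scan the file for characters that have no WLATIN1 equivalent. This is a minimal Python sketch, not part of SAS or Enterprise Guide. It assumes one record per physical line (quoted fields containing line breaks would shift the numbering) and treats Python's `cp1252` codec as the counterpart of SAS's WLATIN1; the function and script names are hypothetical.

```python
import sys


def find_untranscodable(lines, target_encoding="cp1252"):
    """Return (record_number, column, code_point) for every character that
    cannot be represented in target_encoding (cp1252 ~ SAS WLATIN1)."""
    problems = []
    for record_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            try:
                ch.encode(target_encoding)
            except UnicodeEncodeError:
                problems.append((record_no, col, f"U+{ord(ch):04X}"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python find_bad_chars.py data.csv
    # The "utf-16" codec reads the BOM and picks the byte order from it.
    with open(sys.argv[1], encoding="utf-16") as f:
        for record_no, col, cp in find_untranscodable(f):
            print(f"record {record_no}, column {col}: {cp}")
```

The record numbers it prints may differ by one from the SAS log if the import counts the header row differently.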
Connect to a SAS server process that is using UTF-8 as the encoding.
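The reason a UTF-8 session helps is that UTF-8 can encode every Unicode character, including U+0099, whereas WLATIN1 has only 256 positions. The Python sketch below illustrates only that encoding difference; it does not show how SAS itself stores the value in a data set, and it again uses Python's `cp1252` codec as a stand-in for WLATIN1. The helper name `fits` is hypothetical.

```python
def fits(text, encoding):
    """True if every character of text can be represented in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "x\u0099y"
print(fits(sample, "cp1252"))  # False: no WLATIN1 equivalent for U+0099
print(fits(sample, "utf-8"))   # True
print(sample.encode("utf-8"))  # b'x\xc2\x99y' (U+0099 becomes two bytes)
```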
Do you have a SAS administrator to help you with this?
Maybe @ChrisHemedinger can tell us how to change the ENCODING setting that the LOCAL server uses?
Hi @Autotelic,
Could you tell me what encoding of your local server is running with?
That can be found by going to the Servers list in Enterprise Guide, making sure that you are connected to your local server and then right-click on the local server node and displaying the Properties dialog for the server. On the Software tab in the Properties dialog, you will see "SAS Session Encoding". That value is the server's encoding.
Thanks, David.
Hi, David.
It's wlatin1.
Thanks @Autotelic.
OK, so here is what's happening... in order for us to be able to import the file into SAS, it has to be in an encoding that the SAS System will understand. If the SAS System encounters any characters that are not within its current encoding then it will throw a Transcoding Error and your job will stop. It kind of treats those as serious errors.
In order to prevent that from happening, the Import Data task reads the file in whatever encoding it's been told the file is in (either through information in the file or by you, the user, specifying the encoding to use) and checks to see if each character in the file maps directly to a matching character in the server's encoding. If there is no matching character, then it replaces it with a space character and puts that message you are seeing in the log.
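The replace-and-warn behaviour described above can be modelled in a few lines. This is a simplified Python sketch of the idea, not SAS's actual implementation: it assumes the server encoding is WLATIN1 (approximated by Python's `cp1252` codec), substitutes a space for each character with no equivalent, and records one warning per affected record using the wording from the log. The function name is hypothetical.

```python
def transcode_records(records, target_encoding="cp1252"):
    """Replace characters that have no equivalent in target_encoding with a
    space and collect one warning per affected record (simplified model)."""
    output, warnings = [], []
    for record_no, record in enumerate(records, start=1):
        chars, replaced = [], False
        for ch in record:
            try:
                ch.encode(target_encoding)  # strict check: raises if unmapped
                chars.append(ch)
            except UnicodeEncodeError:
                chars.append(" ")           # substitute instead of failing
                replaced = True
        if replaced:
            warnings.append(
                "WARNING: A character that could not be transcoded has been "
                f"replaced in record {record_no}."
            )
        output.append("".join(chars))
    return output, warnings


rows, log = transcode_records(["a,b", "x\u0099y,z"])
print(rows)  # ['a,b', 'x y,z']
print(log)
```

Calling `"x\u0099y".encode("cp1252")` directly, without the `try`, raises `UnicodeEncodeError`, which is the Python analogue of the hard transcoding error the task is avoiding.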
I've looked up the UTF-16 character U+0099 in a document on the unicode.org website and found that it is simply described as a control code (it sits in the C1 control range, U+0080 to U+009F) and, unlike most recognized Unicode characters, it is not even given a name. Normally, we'd be able to look up a character's Unicode name, find out its purpose and then match it with a similarly named character in the server's encoding (WLATIN1 in your case), but we can't do that here. So I'm pretty certain that there will be no one-to-one match for that 'control code' character in WLATIN1 (or probably in any other single-byte encoding, for that matter).
So I'm pretty sure that the Import Data task was doing the right thing in replacing that character, which would otherwise have caused a transcoding error.
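Python's `unicodedata` module can confirm the lookup described above. The sketch also checks one possible origin of the character, offered as an assumption to verify against the real data rather than something established in this thread: in Windows-1252 (WLATIN1) the byte 0x99 is the TRADE MARK SIGN, while ISO-8859-1 maps the same byte straight to U+0099, so text that was decoded with the wrong one of those two encodings somewhere upstream would end up containing U+0099. The sample string `"Brand\u0099"` and the repair step are hypothetical.

```python
import unicodedata

ch = "\u0099"

# Unicode classifies U+0099 as a control character and assigns it no name.
category = unicodedata.category(ch)  # 'Cc'
char_name = unicodedata.name(ch, None)  # None

# WLATIN1 (Python codec cp1252) has no position for it, so strict encoding fails.
try:
    ch.encode("cp1252")
    has_wlatin1_equivalent = True
except UnicodeEncodeError:
    has_wlatin1_equivalent = False

# The same byte 0x99 means different things in the two 8-bit encodings.
as_cp1252 = b"\x99".decode("cp1252")  # U+2122 TRADE MARK SIGN
as_latin1 = b"\x99".decode("latin-1")  # U+0099, the control code

# Hypothetical repair, valid only if the data really suffered that mix-up.
repaired = "Brand\u0099".encode("latin-1").decode("cp1252")

print(category, char_name, has_wlatin1_equivalent)  # Cc None False
print(unicodedata.name(as_cp1252), f"U+{ord(as_latin1):04X}")
print(ascii(repaired))
```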
I hope this explanation helps.
David.
