
Importing text file with Unicode U+0099

Contributor
Posts: 40

I'm on SAS Enterprise Guide 7.11 HF5 (7.100.1.2856) (64-bit), SAS 9.4.

 

When importing a csv file, I get the warning "WARNING: A character that could not be transcoded has been replaced in record 1726." in some rows (the number at the end changes according to the row it is reading).

 

I was able to determine that the character causing the problem is the Unicode Character U+0099.

According to Notepad++, the file is encoded in "UCS-2 LE BOM", and I'm importing the file in SAS using the encoding "UTF-16".

 

Can't SAS read this character? Am I not using the correct encoding option in SAS?

Am I doing something wrong? If I'm not, how can I get SAS to ignore this warning?
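
As a rough cross-check outside SAS, the encoding and the affected records can be confirmed with a few lines of Python. This is only a sketch: the helper names `sniff_bom` and `records_containing` are made up for illustration, and it assumes one record per line, which may not hold if quoted fields contain line breaks.

```python
import codecs

# Byte-order marks to test, longest first: the UTF-32 LE BOM begins with
# the same two bytes as the UTF-16 LE BOM, so it has to be checked earlier.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes):
    """Return a Python codec name based on the BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def records_containing(data: bytes, char: str = "\u0099"):
    """Return the 1-based numbers of the lines that contain `char`."""
    encoding = sniff_bom(data)
    if encoding is None:
        raise ValueError("no byte-order mark found; specify the encoding yourself")
    text = data.decode(encoding)  # these codecs consume the BOM
    # Split on "\n" only; str.splitlines() would also break on some control codes.
    return [n for n, line in enumerate(text.split("\n"), start=1) if char in line]
```

Reading the file with `open(path, "rb").read()` and passing the bytes to `records_containing` gives line numbers that can be compared with the record numbers in the SAS warnings.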



All Replies
Super User
Posts: 8,272

Re: Importing text file with Unicode U+0099

Posted in reply to Autotelic

Connect to a SAS server process that is using UTF-8 as the encoding.

Do you have a SAS administrator to help you with this?

Contributor
Posts: 40

Re: Importing text file with Unicode U+0099

I'm on a local server.
Alas, I do not have access to an administrator.
Super User
Posts: 8,272

Re: Importing text file with Unicode U+0099

Posted in reply to Autotelic

Maybe @ChrisHemedinger can tell us how to change the ENCODING setting that the LOCAL server uses?

SAS Employee
Posts: 5

Re: Importing text file with Unicode U+0099

Posted in reply to Autotelic

Hi @Autotelic,

Could you tell me what encoding your local server is running with?
You can find that by going to the Servers list in Enterprise Guide, making sure that you are connected to your local server, then right-clicking the local server node and opening the Properties dialog for the server. On the Software tab in the Properties dialog, you will see "SAS Session Encoding". That value is the server's encoding.

Thanks, David.

Contributor
Posts: 40

Re: Importing text file with Unicode U+0099

Posted in reply to David_McNamara

Hi, David.

It's wlatin1.


Solution
11-03-2017 10:44 AM
SAS Employee
Posts: 5

Re: Importing text file with Unicode U+0099

Posted in reply to Autotelic

Thanks @Autotelic.

 

OK, so here is what's happening. For us to be able to import the file into SAS, its contents have to end up in an encoding that the SAS session understands. If SAS encounters any characters that are not in its session encoding, it throws a transcoding error and your job stops. It treats those as serious errors.

 

In order to prevent that from happening, the Import Data task reads the file in whatever encoding it's been told the file is in (either through information in the file or by you, the user, specifying the encoding to use) and checks to see if each character in the file maps directly to a matching character in the server's encoding. If there is no matching character, then it replaces it with a space character and puts that message you are seeing in the log.
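
The behaviour described above can be imitated in a few lines of Python. This is a sketch of the idea only, not the Import Data task's actual implementation; `transcode_records` is a hypothetical helper, and Python's `cp1252` codec stands in for WLATIN1.

```python
def transcode_records(records, target="cp1252", replacement=" "):
    """Replace characters that `target` cannot represent.

    Returns the cleaned records plus the 1-based numbers of the records
    in which at least one character was replaced.
    """
    cleaned, replaced_in = [], []
    for number, record in enumerate(records, start=1):
        out = []
        replaced = False
        for ch in record:
            try:
                ch.encode(target)        # does the target encoding have this character?
                out.append(ch)
            except UnicodeEncodeError:
                out.append(replacement)  # no match: substitute a space
                replaced = True
        cleaned.append("".join(out))
        if replaced:
            replaced_in.append(number)
    return cleaned, replaced_in
```

Each number in the second return value corresponds to one "could not be transcoded" style warning: the record is kept, but the caller is told that its data was changed.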

 

I've looked up the Unicode code point U+0099 in a document on the unicode.org website and found that it is simply described as a control code; it is not even named, as most of the recognized Unicode characters are. Normally, we'd be able to look up a character, find out its purpose and then match it with the equivalent character in the server's encoding (WLATIN1 in your case), but we can't do that here. So I'm pretty certain that there is no one-to-one match for that 'control code' character in WLATIN1.
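
The same lookup can be reproduced with Python's standard `unicodedata` module. This is a small sketch; the `describe` helper is hypothetical, and `cp1252` is again used as a stand-in for WLATIN1.

```python
import unicodedata


def describe(ch, target="cp1252"):
    """Summarise a character and whether `target` can represent it."""
    try:
        ch.encode(target)
        encodable = True
    except UnicodeEncodeError:
        encodable = False
    return {
        "code_point": f"U+{ord(ch):04X}",
        "category": unicodedata.category(ch),  # "Cc" means control character
        "name": unicodedata.name(ch, None),    # None when the character has no name
        "encodable": encodable,
    }


info = describe("\u0099")
# info["category"] is "Cc", info["name"] is None, info["encodable"] is False
```

For comparison, byte 0x99 in Windows-1252 is the trademark sign (U+2122), so a stray U+0099 sometimes points to text that was decoded as Latin-1 at some earlier step. Whether that applies to this particular file is not known from the thread.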

 

So I'm pretty sure that the Import Data task was doing the right thing in removing what would otherwise have been a transcoding error from the file.

 

I hope this explanation helps.

 

David.

Contributor
Posts: 40

Re: Importing text file with Unicode U+0099

Posted in reply to David_McNamara
Thanks, David. I understood everything.
Is there a way to stop this particular character from producing the warning?
SAS Employee
Posts: 5

Re: Importing text file with Unicode U+0099

Posted in reply to Autotelic
Unfortunately, no, there isn't a way to suppress the message. Because we are changing your data, we want to make sure that you are aware that it has been done, so we always produce that message.
The only thing I could suggest is using Notepad++ to search for the character in your data file and change it to a space before importing the file with EG.
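
If the file is too large to edit comfortably by hand, the same replacement could be scripted. The following is a minimal Python sketch under a few assumptions: the file starts with a UTF-16 byte-order mark, as Notepad++ reported, and the output should again be UTF-16 LE with a BOM. The function name `replace_control_char` and its parameters are hypothetical.

```python
from pathlib import Path


def replace_control_char(src, dst, bad="\u0099", replacement=" "):
    """Copy `src` to `dst`, replacing every `bad` character.

    Returns how many characters were replaced.
    """
    # Decode from bytes so that line endings are preserved exactly;
    # the "utf-16" codec uses the BOM to pick the byte order.
    text = Path(src).read_bytes().decode("utf-16")
    count = text.count(bad)
    cleaned = text.replace(bad, replacement)
    # Write UTF-16 little-endian with an explicit BOM.
    Path(dst).write_bytes(b"\xff\xfe" + cleaned.encode("utf-16-le"))
    return count
```

Writing to a separate output file keeps the original untouched, and the returned count records how many characters were changed.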
☑ This topic is solved.
