
How to handle DBCS Character in datasets ?

Contributor
Posts: 72


Dear forum experts

In our data we are getting Unicode codes for some DBCS characters. For example, the symbol for mu (μ) arrives as &#956;. We have a lot of such characters in our RDE data.

We are not sure how to handle this text. These characters are important, and we do not know how to process them.

Please let me know.

Please also check the screenshot.

Thanks for your help.

Anand


screenshot.jpg
Super User
Posts: 19,772

Re: How to handle DBCS Character in datasets ?

Posted in reply to anandbillava

How do you want them handled? Do you want them stripped out? Read in and displayed as mu?

Contributor
Posts: 72

Re: How to handle DBCS Character in datasets ?

Thanks, Reeza. I did strip those characters, but we have since learned that they are required and we have to convert them back to their actual values.

Contributor
Posts: 65

Re: How to handle DBCS Character in datasets ?

Posted in reply to anandbillava

Well, I'm a little confused by the representation of the Unicode characters that you're seeing, but I'll offer my 2 cents. The format "&#n;" is known as a "numeric character reference" or NCR (a notation from HTML and XML), where "n" is a number and the other characters are literal. In your screenshot, I'm afraid I don't know what the leading "/" or the trailing "l" are for. In any event, you should be able to strip out those characters and then convert what's left with the SAS UNICODE() function. Here's an example:

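data one;
  /* wbcoth_uni holds the raw NCR text, e.g. &#956; */
  input wbc wbcoth_uni $;
  /* UNICODE() with 'ncr' converts the reference to the session encoding */
  wbcoth = unicode(wbcoth_uni, 'ncr');
/* DATALINES4 is needed because the data lines contain semicolons */
datalines4;
3690 &#956;
;;;;
run;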

When I open the table "one" in ViewTable, I see a mu in the wbcoth column. Please note that you do need to be running the Unicode version of SAS (a session with UTF-8 encoding), which may not be the default at your institution. On my Windows system, it's under Start menu --> All Programs --> SAS --> Additional Languages --> SAS 9.3 (unicode support).
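For the strip-then-convert step, a minimal sketch follows. It assumes a UTF-8 SAS session and values shaped like the screenshot: one &#n; reference with a "/" before it and an "l" after it. The sample value "/&#956;l", the data set name two, the variables raw, ncr, ch and unit, and the regular expression are all illustrative assumptions. PRXCHANGE keeps only the first &#n; sequence, UNICODE converts it, and TRANWRD puts the converted character back in place of the reference so the surrounding text is preserved.

```sas
/* Sketch: assumes a UTF-8 SAS session; the sample value is hypothetical. */
%put NOTE: Session encoding is &sysencoding;

data two;
  length raw ncr unit $20 ch $8;
  input raw $;
  /* Keep only the first &#n; sequence, dropping the text around it */
  ncr = prxchange('s/^[^&]*(&#\d+;).*$/$1/', 1, strip(raw));
  /* Convert the numeric character reference to the session encoding */
  ch = unicode(ncr, 'ncr');
  /* Put the converted character back in place of the reference */
  unit = tranwrd(raw, strip(ncr), strip(ch));
datalines4;
/&#956;l
;;;;
run;
```

Only the first reference in each value is handled; a value containing several &#n; sequences would need a loop over PRXCHANGE or a different pattern.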

HTH

Karl

Super User
Posts: 10,023

Re: How to handle DBCS Character in datasets ?

Posted in reply to anandbillava

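Check some INFILE statement options:

infile x  encoding=<file-encoding>  recfm=  termstr=  ;

ENCODING= tells SAS which encoding the external file was written in, so the text can be transcoded into the session encoding; RECFM= and TERMSTR= control the record format and where each line ends.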
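A sketch of how those options fit together when reading the raw export directly, assuming it is a UTF-8 text file with Windows (CRLF) line endings. The path, the data set name rde and the lengths are hypothetical, and the ENCODING= value must match how the file was actually written (for a file in a DBCS code page, that code page's name instead of utf-8).

```sas
/* Sketch only: the file path is hypothetical, and ENCODING= and TERMSTR=
   must describe the real file. Run in a UTF-8 SAS session. */
data rde;
  infile "C:\temp\rde_export.txt"  /* hypothetical path */
         encoding="utf-8"          /* encoding the file was written in */
         termstr=crlf              /* Windows-style line endings */
         truncover;
  length wbcoth_uni $40;
  input wbc wbcoth_uni $;
  /* Convert the &#n; reference to a character in the session encoding */
  wbcoth = unicode(wbcoth_uni, 'ncr');
run;
```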
