05-15-2013 11:10 AM
Dear forum experts
Our data contains Unicode escape codes in place of some DBCS characters. For example, the symbol for mu (μ) shows up as a numeric code such as &#956; rather than as the character itself. We have a lot of such characters in our RDE data.
We are not sure how to handle this text. These characters are important, and we do not know how to process them.
Please let me know.
Please check the screenshot also.
Thanks for your help.
05-15-2013 01:22 PM
Thanks, Reeza. I did strip those characters, but we have since learned that they are required, so we need to convert them back to their actual values.
05-15-2013 04:47 PM
Well, I'm a little confused by the representation of the Unicode characters that you're seeing, but I'll offer my 2 cents. The format "&#n;" is, in the Unicode world, called the "numeric character representation" or NCR, where "n" is a number and the other characters are literal. In your screenshot, I'm afraid I don't know what the leading "/" or the trailing "l" are for. In any event, you should be able to strip out those characters and then convert what's left with the SAS unicode() function. Here's an example:
input wbc wbcoth_uni $;
wbcoth = unicode(wbcoth_uni,'ncr');
When I open the table "one" in ViewTable, I see a mu in the wbcoth column. Please note that you do need to be running the Unicode version of SAS, which may not be the default at your institution. On my Windows system, it's under Start menu > All Programs > SAS > Additional Languages > SAS 9.3 (Unicode support).
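For anyone who wants to try this end to end, a fuller version of the example might look like the sketch below. The sample data row, the variable lengths, and the use of DATALINES4 are my own assumptions and not part of the original post; DATALINES4 is there only because the NCR itself contains a semicolon, which would end an ordinary DATALINES block early. It also assumes the session is running with UTF-8 encoding.

```sas
/* Check the session encoding first; the conversion needs a Unicode (UTF-8) session. */
proc options option=encoding;
run;

/* Sketch only: the data row below is invented for illustration. */
data one;
   length wbcoth_uni $ 20 wbcoth $ 20;
   input wbc wbcoth_uni $;
   /* Convert the numeric character representation to the actual character. */
   wbcoth = unicode(wbcoth_uni, 'ncr');
   /* DATALINES4 is used because the data value contains a semicolon. */
   datalines4;
4.5 &#956;
;;;;
run;
```

If the session encoding is UTF-8, wbcoth should display as a mu when the table is viewed. In a non-Unicode session the character may not be representable, so check the PROC OPTIONS output in the log before trusting the result.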