Hello,
I'm using SAS Studio to read in a .DAT file. The file contains special characters such as the em dash (—) and the en dash (–), and I would like to keep them in my SAS dataset. However, when I look at the imported dataset, they appear as a black diamond with a question mark, as in the image below.
Please can you help? I've tried different methods of importing the data, such as PROC IMPORT and a DATA step with INFILE. I've also tried the UTF-8 Unicode option.
I look forward to hearing from you.
Many thanks,
Kriss Harrris
Hi @djrisks
The em dash is not part of the Latin-1 (ISO 8859-1) character set. In the Windows code page (wlatin1) the en dash is stored as hex 96 and the em dash as hex 97; these bytes lie outside 7-bit ASCII and are not valid UTF-8 on their own. They are rendered differently in different SAS products, so it is preferable to get rid of them. I have encountered this on a few occasions, and I solved the problem by translating the character to a normal dash.
Given the data set from your post with a single-character column, create a second column that holds the character written with the $hex2. format. If you get the value 96, apply this translate to your input data: value = translate(value,'-','96'x);
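The two steps above could look roughly like the sketch below. The data set names HAVE and WANT, the column name VALUE, and the new column HEXVAL are assumptions for illustration, since the original code was not posted. It also assumes a single-byte session encoding such as wlatin1, where TRANSLATE works byte by byte.

```sas
/* Sketch only: HAVE, WANT, VALUE and HEXVAL are assumed names. */
data want;
    set have;
    length hexval $2;

    /* Show the hex code of the first byte of VALUE, e.g. 96 or 97 */
    hexval = put(value, $hex2.);

    /* Replace every hex 96 byte in VALUE with a normal hyphen */
    value = translate(value, '-', '96'x);
run;
```

HEXVAL is filled before the TRANSLATE call so that it still shows the byte as it was read from the file.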
Hi @djrisks
Sorry, but I don't quite understand the problem. The translation to hex was just to identify the hex value, so why do you want to translate it back? It might help if you posted the code you are using.
As long as you work in Windows/latin1, you will probably get the desired result if you translate hex 97 to hex 96 like this: value = translate(value,'96'x,'97'x);. But you are in dire straits with the em dash in Linux/latin9, so I still think it would be better to get rid of it altogether and translate it to a normal dash. In my experience the long dash only occurs in strings copy-pasted into a text field from Word.
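As a rough sketch of both options mentioned here, the step below writes each result to its own column so they can be compared. The names HAVE, WANT, VALUE, VALUE_EN and VALUE_HYPHEN are assumptions, and the code assumes a single-byte Windows session encoding (wlatin1), in which hex 96 is the en dash and hex 97 is the em dash.

```sas
/* Sketch only: all data set and variable names are assumed. */
data want;
    set have;

    /* Option 1: turn the em dash (hex 97) into an en dash (hex 96) */
    value_en = translate(value, '96'x, '97'x);

    /* Option 2: turn both long dashes into a normal hyphen.        */
    /* Each byte in the third argument is replaced by the byte in   */
    /* the same position of the second argument.                    */
    value_hyphen = translate(value, '--', '9697'x);
run;
```

Option 2 is the safer choice if the data may later be read in a Linux/latin9 session, because the plain hyphen is ordinary ASCII.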