djrisks
Barite | Level 11

Hello,

 

I'm using SAS Studio to read in a .DAT file. The .DAT file has special characters such as the em dash and en dash, i.e. — and –. I would like to keep these special characters in my SAS dataset. However, when I look at the imported dataset, I see a black diamond with a question mark in place of each one, as in the image below.

 

djrisks_0-1586450751574.jpeg

Please can you help me? I've tried different methods to import the data, such as PROC IMPORT and a DATA step with INFILE. I've also tried the utf-8 encoding option.

 

I look forward to hearing from you.

 

Many thanks,

 

Kriss Harrris

Tom
Super User
Please show an example data file and the data step that is not working. Also make sure that your SAS session is running with the system option ENCODING set to utf-8. If you are trying to use a single-byte encoding, like WLATIN1, then you will have trouble reading any character outside of the 256 characters that a single byte can represent.
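A minimal sketch of that check, assuming a hypothetical file path and assuming the .DAT file was saved as Windows-1252 (WLATIN1). Both are guesses, so adjust them to match your file:

```sas
/* Print the session encoding to the log, e.g. ENCODING=UTF-8 or ENCODING=WLATIN1 */
proc options option=encoding;
run;

/* Hypothetical path. ENCODING= tells SAS how the bytes in the file are encoded, */
/* so they can be transcoded to the session encoding while reading.             */
filename rawdat '/path/to/file.dat' encoding='wlatin1';

data work.want;
    infile rawdat truncover;
    input text $char200.;   /* assumed layout: one text field per record */
run;
```

If the log shows a single-byte session encoding, the session itself would need to be started with ENCODING=UTF-8 to hold characters outside that code page.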
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @djrisks 

 

The em dash is not an official part of the Latin character set. It is often represented with the hex value 96, which is an "undocumented" ASCII character and not a Unicode character. It is rendered differently in different SAS products, so it is preferable to get rid of it. I have encountered it on a few occasions, and I have solved the problem by translating it to a normal dash.

 

Given your data set from the post with a single-character column, try making a second column that holds the character written with the format $hex2. If you get the value 96, then use this translate on your input data: value = translate(value,'-','96'x);
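The two steps could look roughly like this. It is only a sketch: work.have and the variable name value are hypothetical, and it assumes a single-byte session encoding such as WLATIN1, where each dash is stored as one byte.

```sas
data work.check;
    set work.have;                    /* hypothetical input data set */
    length hexval $2;
    hexval = put(value, $hex2.);      /* hex code of the first byte of value */

    /* Replace the long dashes with a normal hyphen; both 96 and 97 are */
    /* covered here because the code you see may be either one.         */
    if hexval in ('96', '97') then
        value = translate(value, '--', '9697'x);
run;

proc print data=work.check;
run;
```

The hexval column is only there for inspection and can be dropped once you know which byte your file contains.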

 

 

djrisks
Barite | Level 11
Thank you for this! It helps me to identify the em dash. Although my value is 97. For some reason I can't seem to translate it back to the em-dash though, it's still using the hex format.
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @djrisks 

 

Sorry, but I don't quite understand the problem. The translation to hex was just to identify the hex value, so why do you want to translate it back? It might help if you posted the code you are using.

 

As long as you work in Windows/Latin1, you will probably get the desired result if you translate hex 97 to hex 96 like this: value = translate(value,'96'x,'97'x). But you are in dire straits with the em dash in Linux/Latin9, so I still think it would be better to get rid of it altogether and translate it to a normal dash. In my experience the long dash only occurs in strings copy-pasted into a text field from Word.

 

 
