rdum96
Calcite | Level 5

I'm importing a csv using the following code:

 

data finaldata;
  infile inputfile encoding='utf-8' truncover;
  input  mainline $varylen30000. ;
run;

However, some characters, such as the em dash and en dash, are being converted to unusual characters, as shown below:

 

image.png

I've had success converting the en dash to a hyphen using the code below:

mainline = tranwrd(mainline, '96'x, '-');

but this is very specific to one of the cases I found. Is there some way I could handle all of these special characters?

6 REPLIES
Tom
Super User

If your SAS session is not also using UTF-8 encoding, it might not be possible to transcode every character in your UTF-8 text file into a single-byte encoding.

Also, with single-byte encodings, the glyph displayed for any particular byte depends on the font you are using.
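
To see which encoding the session is actually using, something like the following could be run first. This is only a minimal sketch; it relies on the ENCODING system option and the SYSENCODING automatic macro variable and writes the result to the log.

```sas
/* Write the current value of the ENCODING system option to the log. */
proc options option=encoding;
run;

/* The same information from the automatic macro variable SYSENCODING. */
%put NOTE: Session encoding is &sysencoding;
```

If the value reported is a single-byte encoding such as latin1 or wlatin1, characters in the file that have no equivalent in that encoding cannot be represented after transcoding.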

rdum96
Calcite | Level 5
Ah, gotcha! I believe I can't modify the SAS session 😞 I looked into the encoding and it looks like it's 'latin1'. I was hoping the encoding option in the infile statement would help! I'll try to hard-code everything for the time being!
Tom
Super User

Read the file using ENCODING=ANY, so that no transcoding is applied, and then manually translate the non-ASCII byte sequences into either single-byte characters or some other replacement text.

For example, you could use code like this to convert the plus/minus symbol into the three-character string +/- instead.

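data want;
  /* ENCODING='ANY' reads the bytes as-is, with no transcoding. */
  /* TRUNCOVER keeps short lines from flowing over into the next record. */
  infile 'textfile' encoding='any' truncover;
  input line $char100.;
  /* 'C2B1'x is the UTF-8 byte sequence for the plus/minus sign. */
  line = tranwrd(line,'C2B1'x,'+/-');
run;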
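
The same idea can be extended to the dashes and curly quotes mentioned in the question. The following is a rough sketch rather than a complete solution: the list of characters is only an assumption about what the file contains, 'textfile' is a placeholder path as in the example above, and the ASCII replacements are one possible choice. The hex literals are the UTF-8 byte sequences for each character.

```sas
data want;
  /* Read the raw UTF-8 bytes without transcoding. */
  infile 'textfile' encoding='any' truncover;
  input line $char100.;

  /* Replace multi-byte UTF-8 sequences with ASCII stand-ins. */
  line = tranwrd(line, 'E28093'x, '-');    /* en dash (U+2013)            */
  line = tranwrd(line, 'E28094'x, '-');    /* em dash (U+2014)            */
  line = tranwrd(line, 'E28098'x, "'");    /* left single quote (U+2018)  */
  line = tranwrd(line, 'E28099'x, "'");    /* right single quote (U+2019) */
  line = tranwrd(line, 'E2809C'x, '"');    /* left double quote (U+201C)  */
  line = tranwrd(line, 'E2809D'x, '"');    /* right double quote (U+201D) */
  line = tranwrd(line, 'E280A6'x, '...');  /* ellipsis (U+2026)           */
  line = tranwrd(line, 'C2B1'x,   '+/-');  /* plus/minus sign (U+00B1)    */
run;
```

Any non-ASCII sequence that is not in the list is left untouched, so it is worth printing a few problem rows with a $HEX format to find out which other sequences need a rule. Note also that a fixed-width read of raw bytes can cut a multi-byte character in half at the end of the field, so the informat width should be comfortably larger than the longest line.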
Patrick
Opal | Level 21

To test whether this is a read/transcode issue or a write/display issue, you could run the code below and see what gets printed.

data finaldata;
  em_dash='E28094'x;
  infile inp encoding='utf-8' truncover;
  input  mainline $10. ;
run;
proc print data=finaldata(obs=1);
run;

In my environment things are working:

Patrick_0-1704939505122.png

 

 

rdum96
Calcite | Level 5

Thank you! I just checked, and I get other special characters when I run this 😞

Image 1-10-24 at 10.04 PM.jpeg

Patrick
Opal | Level 21

For me, in a single-byte session with WLATIN1 encoding, the hex value of the em dash is 97.

filename inp "c:\temp\test_emdash_u8.csv";
data finaldata;
	em_dash='97'x;
	encoding="%sysfunc(getoption(encoding,keyexpand))";
	infile inp encoding='utf-8' truncover;
	input  mainline $10. ;
	hex=put(mainline,$hex20.);
run;
proc print data=finaldata(obs=1);
run;

Patrick_0-1704953248404.png
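
Once the HEX column shows which single bytes the special characters were transcoded to, they can be swapped for plain ASCII in one pass. The sketch below assumes a WLATIN1 (Windows-1252) session, where 96 and 97 are the en and em dash, 91 and 92 the curly single quotes, and 93 and 94 the curly double quotes. The byte values in a latin1 session may differ, so check them against the hex output first. The output dataset name cleaned is made up for illustration.

```sas
data cleaned;
  set finaldata;
  /* TRANSLATE(source, to, from): each byte listed in FROM is replaced by
     the byte at the same position in TO.
     FROM: 96 97 91 92 93 94 (en dash, em dash, curly quotes)
     TO:   2D 2D 27 27 22 22 (hyphen, hyphen, ', ', ", ")             */
  mainline = translate(mainline, '2D2D27272222'x, '969791929394'x);
  /* Recompute the hex dump so the result can be checked. */
  hex = put(mainline, $hex20.);
run;

proc print data=cleaned(obs=1);
run;
```

TRANSLATE works byte for byte, so it only fits the case where each special character has become a single byte. For multi-byte UTF-8 sequences read with ENCODING=ANY, TRANWRD is the better fit.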

 

 
