SAS Programming

DATA Step, Macro, Functions and more
rdum96
Calcite | Level 5

I'm importing a CSV file using the following code:

 

data finaldata;
  infile inputfile encoding='utf-8' truncover;
  input  mainline $varylen30000. ;
run;

However, some characters, such as the em dash and en dash, are being converted to unusual characters, as shown below:

 

image.png

I've had success replacing the en dash with a hyphen using the code below:

mainline = tranwrd(mainline,'96'x,'-');

but this is very specific to one of the cases I found. Is there some way I could tackle all of these special characters?

6 REPLIES
Tom
Super User

If your SAS session is not also using UTF-8 encoding, then it might not be possible to transcode every character in your UTF-8 text file into a single-byte encoding.

 

Also, with single-byte encodings, the glyph displayed for any particular byte will depend on the font you are using.
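
A minimal sketch for confirming which encoding the session is using. Both statements only report the current value and change nothing. The ENCODING system option is set when SAS starts, so it cannot be switched in the middle of a session.

```sas
/* Write the current value of the ENCODING system option to the log. */
proc options option=encoding;
run;

/* The same value is available in the automatic macro variable SYSENCODING. */
%put NOTE: Session encoding is &sysencoding;
```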

rdum96
Calcite | Level 5
Ah, gotcha! I believe I can't modify the SAS session 😞 I looked into the encoding and it looks like it's 'latin1'. I was hoping the encoding option in the INFILE statement would help! I'll try to hard-code everything for the time being!
Tom
Super User

Read the file using ENCODING=ANY and then manually translate the non-ASCII byte sequences into either single-byte characters or some other replacement text.

 

For example, you could use code like this to translate the plus/minus symbol into the three-character string +/- instead.

data want;
  infile 'textfile' encoding='any' truncover;
  input line $char100.;
  line = tranwrd(line,'C2B1'x,'+/-');
run;
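
The same approach could be extended to the dashes and quotes from the original question. The following is only a sketch under a few assumptions: the file is read with ENCODING='any' so the UTF-8 bytes arrive untouched, 'textfile' and the 100-byte line length are the placeholders from the example above, and the list of characters is an illustrative selection of common "smart" punctuation rather than a complete mapping. TRUNCOVER is included so that short lines do not flow onto the next record.

```sas
data want;
  infile 'textfile' encoding='any' truncover;
  input line $char100.;
  /* Replace UTF-8 byte sequences for common punctuation with ASCII text. */
  line = tranwrd(line, 'E28093'x, '-');    /* en dash (U+2013)            */
  line = tranwrd(line, 'E28094'x, '-');    /* em dash (U+2014)            */
  line = tranwrd(line, 'E28098'x, '27'x);  /* left single quote (U+2018)  */
  line = tranwrd(line, 'E28099'x, '27'x);  /* right single quote (U+2019) */
  line = tranwrd(line, 'E2809C'x, '22'x);  /* left double quote (U+201C)  */
  line = tranwrd(line, 'E2809D'x, '22'x);  /* right double quote (U+201D) */
  line = tranwrd(line, 'E280A6'x, '...');  /* ellipsis (U+2026)           */
run;
```

Any byte above '7F'x that remains in LINE afterwards belongs to a character this list does not cover, so the list would need to grow as new cases turn up in the data.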
Patrick
Opal | Level 21

To test whether this is a read/transcode issue or a write/display issue, you could run the code below and see what gets printed.

data finaldata;
  em_dash='E28094'x;
  infile inp encoding='utf-8' truncover;
  input  mainline $10. ;
run;
proc print data=finaldata(obs=1);
run;

In my environment things are working:

Patrick_0-1704939505122.png

 

 

rdum96
Calcite | Level 5

Thank you! I just checked and I get other special characters upon performing this 😞 

Image 1-10-24 at 10.04 PM.jpeg

Patrick
Opal | Level 21

For me, in a single-byte session with WLATIN1, the hex value is 97.

filename inp "c:\temp\test_emdash_u8.csv";
data finaldata;
	em_dash='97'x;
	encoding="%sysfunc(getoption(encoding,keyexpand))";
	infile inp encoding='utf-8' truncover;
	input  mainline $10. ;
	hex=put(mainline,$hex20.);
run;
proc print data=finaldata(obs=1);
run;

Patrick_0-1704953248404.png
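
If the file is transcoded on input as above and the session encoding is WLATIN1 (Windows-1252), the dashes and curly quotes arrive as single bytes, so one TRANSLATE call could map them all to ASCII. This is a sketch under that assumption: the byte values below are the Windows-1252 code points, the output data set name CLEANED is hypothetical, and other single-byte encodings may place these characters elsewhere or not contain them at all.

```sas
data cleaned;
  set finaldata;
  /* TRANSLATE(source, to, from): each byte listed in FROM is replaced by
     the byte at the same position in TO.                                  */
  /* FROM: 96 en dash, 97 em dash, 91 and 92 single curly quotes,
           93 and 94 double curly quotes (Windows-1252 code points).       */
  /* TO:   2D hyphen, 27 apostrophe, 22 straight double quote.             */
  mainline = translate(mainline,
                       '2D2D27272222'x,
                       '969791929394'x);
run;
```

Because every replacement is one byte for one byte, the length of MAINLINE does not change. A character that should expand into several characters, such as an ellipsis into three periods, would still need TRANWRD.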

 

 

