I'm importing a CSV file using the following code:
data finaldata;
infile inputfile encoding='utf-8' truncover;
/* $CHAR30000. reads up to 30000 bytes of each line, keeping blanks */
input mainline $char30000. ;
run;
However, some characters, such as em dashes and en dashes, are being converted into unusual characters.
I've managed to replace the en dash with a hyphen using the code below:
mainline=tranwrd(mainline,'96'x,'-');
But this is very specific to one of the cases I found. Is there some way I could tackle all of these special characters?
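The hex codes in this thread come from two different encodings. In WLATIN1 (Windows code page 1252) the en dash is the single byte 96 and the em dash is 97, while in UTF-8 each of them takes three bytes. The short sketch below is Python, not SAS, and only illustrates the byte values; it assumes Python's `cp1252` codec as a stand-in for WLATIN1, and `byte_forms` is a hypothetical helper.

```python
# Compare how the same dash characters are stored in UTF-8 and in
# Windows-1252 (assumed here as a stand-in for the SAS WLATIN1 encoding).
DASHES = {
    "en dash": "\u2013",
    "em dash": "\u2014",
}


def byte_forms(ch: str) -> tuple[str, str]:
    """Return the uppercase hex bytes of ch in UTF-8 and in cp1252."""
    return ch.encode("utf-8").hex().upper(), ch.encode("cp1252").hex().upper()


for name, ch in DASHES.items():
    utf8_hex, cp1252_hex = byte_forms(ch)
    print(f"{name}: UTF-8 {utf8_hex}  cp1252/WLATIN1 {cp1252_hex}")
```

This suggests why `TRANWRD` on `'96'x` matched: presumably the session is WLATIN1, so the UTF-8 input had already been transcoded to single bytes. In a UTF-8 session the en dash would instead be `'E28093'x`.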
If your SAS session does not also use UTF-8 encoding, then it might not be possible to transcode every character in your UTF-8 text file into a single-byte encoding.
Also, with single-byte encodings, which glyph is displayed for a particular byte will depend on the FONT you are using.
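To see in advance which characters would be lost, you can test-encode each one in the target single-byte encoding. This is a minimal Python sketch of the idea, not SAS code. It assumes cp1252 as a stand-in for WLATIN1, and both the `untranscodable` helper and the sample string are hypothetical.

```python
# List the characters of a Unicode string that have no equivalent in a
# single-byte encoding (cp1252 here, standing in for WLATIN1).
def untranscodable(text: str, encoding: str = "cp1252") -> list[str]:
    """Return the characters of text that cannot be encoded in `encoding`."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


# Plus/minus and em dash exist in cp1252; the arrow and check mark do not.
sample = "5 \u00b1 1 \u2014 see \u2192 page 2 \u2713"
print([f"U+{ord(c):04X}" for c in untranscodable(sample)])
```

Characters reported by a check like this cannot survive a transcode into the single-byte session. They have to be replaced with some other text.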
Read the file using ENCODING=ANY and then manually translate the non-ASCII codes into either single-byte characters or some other replacement text.
For example, you could use code like this to translate the plus/minus symbol into the three-character string +/- instead.
data want;
infile 'textfile' encoding='any' truncover;
input line $char100.;
/* replace the UTF-8 bytes of the plus/minus sign with +/- */
line = tranwrd(line,'C2B1'x,'+/-');
run;
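To handle many characters rather than one, the same idea extends to a lookup table of UTF-8 byte sequences applied in turn, much like a chain of `TRANWRD` calls, followed by a check for any non-ASCII bytes the table missed. The sketch below is Python rather than SAS. The `REPLACEMENTS` table and the helper names are hypothetical examples to extend for your own data.

```python
# Sketch of the ENCODING=ANY approach: work on raw bytes and replace known
# UTF-8 sequences with ASCII text. The table below is a hypothetical example.
REPLACEMENTS = {
    b"\xc2\xb1": b"+/-",      # plus/minus sign (U+00B1)
    b"\xe2\x80\x93": b"-",    # en dash (U+2013)
    b"\xe2\x80\x94": b"--",   # em dash (U+2014)
    b"\xe2\x80\x98": b"'",    # left single quotation mark (U+2018)
    b"\xe2\x80\x99": b"'",    # right single quotation mark (U+2019)
    b"\xe2\x80\x9c": b'"',    # left double quotation mark (U+201C)
    b"\xe2\x80\x9d": b'"',    # right double quotation mark (U+201D)
}


def to_ascii(raw: bytes) -> bytes:
    """Apply every replacement in REPLACEMENTS to a raw UTF-8 line."""
    for utf8_seq, ascii_text in REPLACEMENTS.items():
        raw = raw.replace(utf8_seq, ascii_text)
    return raw


def leftover_non_ascii(raw: bytes) -> set[int]:
    """Return the byte values >= 0x80 that the table did not handle."""
    return {b for b in raw if b >= 0x80}


line = "Range 5 \u00b1 1 \u2013 \u201cok\u201d".encode("utf-8")
cleaned = to_ascii(line)
print(cleaned)
print(leftover_non_ascii(cleaned))
```

Because UTF-8 multi-byte sequences cannot start in the middle of another character, the byte-level replacements do not produce false matches. Any bytes still reported as leftovers point to the next entries to add to the table.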
To test whether this is a read/transcode issue or a write/display issue, you could run the code below and see what gets printed.
data finaldata;
em_dash='E28094'x;
infile inp encoding='utf-8' truncover;
input mainline $10. ;
run;
proc print data=finaldata(obs=1);
run;
In my environment this works as expected.
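One likely source of the "unusual characters" is UTF-8 bytes being interpreted as a single-byte encoding without transcoding. This Python sketch, not SAS code, assumes cp1252 as that single-byte encoding and shows what a few characters turn into. `misread` is a hypothetical helper.

```python
# Show the mojibake produced when UTF-8 bytes are decoded as cp1252,
# i.e. when UTF-8 text is treated as single-byte text without transcoding.
def misread(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as cp1252."""
    return text.encode("utf-8").decode("cp1252")


for ch in ["\u2013", "\u2014", "\u00b1"]:
    print(f"U+{ord(ch):04X} -> {misread(ch)}")
```

Under this assumption the en dash appears as `â€“`, the em dash as `â€”` and the plus/minus sign as `Â±`. Seeing patterns like these points to a read/transcode problem rather than a font or display problem.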
Thank you! I just checked, and I still get other special characters when I run this 😞
For me, in a single-byte session using WLATIN1, the hex value of the em dash is 97.
filename inp "c:\temp\test_emdash_u8.csv";
data finaldata;
em_dash='97'x;
encoding="%sysfunc(getoption(encoding,keyexpand))";
infile inp encoding='utf-8' truncover;
input mainline $10. ;
hex=put(mainline,$hex20.);
run;
proc print data=finaldata(obs=1);
run;