I'm importing a csv using the following code:
data finaldata;
infile inputfile encoding='utf-8' truncover;
input mainline $varylen30000. ;
run;
However, some characters, such as the em dash and en dash, are being converted to unusual characters.
I've had success converting the en dash to a hyphen using the code below
mainline=tranwrd(mainline,'96'x,'-');
but this is very specific to one of the cases I found. Is there some way I could tackle all of these special characters?
If your SAS session is not also using UTF-8 encoding, then it might not be possible to transcode every character in your UTF-8 text file into a single-byte encoding.
Also, with single-byte encodings, the glyph displayed for any particular byte will depend on the font you are using.
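A quick way to see which encoding the session is using, as a minimal sketch. The GETOPTION function and PROC OPTIONS are standard SAS; the label text in the %PUT line is only illustrative.

```sas
/* Write the session encoding (for example WLATIN1 or UTF-8) to the log */
%put Session encoding: %sysfunc(getoption(encoding));

/* Same information, reported by PROC OPTIONS */
proc options option=encoding;
run;
```

If this reports a single-byte encoding such as WLATIN1, characters in the UTF-8 file that have no equivalent in that encoding cannot be represented after transcoding.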
Read the file using ENCODING=ANY and then manually translate the non-ASCII byte sequences into either single-byte characters or some other replacement text.
For example, you could use code like this to convert the plus/minus symbol into the three-character string +/- instead.
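data want;
  /* ENCODING='ANY' keeps the bytes as read; TRUNCOVER handles lines shorter than 100 bytes */
  infile 'textfile' encoding='any' truncover;
  input line $char100.;
  /* 'C2B1'x is the UTF-8 byte sequence for the plus/minus sign */
  line = tranwrd(line,'C2B1'x,'+/-');
run;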
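The same idea can be extended to the dashes and curly quotes mentioned in the question. This is only a sketch under the same assumptions as the example above (the file name 'textfile' and the 100-byte line length are placeholders); the hex values are the standard UTF-8 byte sequences for U+2013, U+2014, U+2018, U+2019, U+201C and U+201D.

```sas
data want;
  /* Read the raw UTF-8 bytes without transcoding */
  infile 'textfile' encoding='any' truncover;
  input line $char100.;
  line = tranwrd(line,'E28093'x,'-');   /* en dash            */
  line = tranwrd(line,'E28094'x,'-');   /* em dash            */
  line = tranwrd(line,'E28098'x,"'");   /* left single quote  */
  line = tranwrd(line,'E28099'x,"'");   /* right single quote */
  line = tranwrd(line,'E2809C'x,'"');   /* left double quote  */
  line = tranwrd(line,'E2809D'x,'"');   /* right double quote */
run;
```

Each replacement is shorter than the sequence it replaces, so the result always fits in the original variable. Any characters not listed here are left as their raw UTF-8 bytes, so the list has to be extended for whatever else appears in the file.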
To test whether this is a read/transcode issue or a write/display issue, you could run the code below and see what gets printed.
data finaldata;
em_dash='E28094'x;
infile inp encoding='utf-8' truncover;
input mainline $10. ;
run;
proc print data=finaldata(obs=1);
run;
In my environment this works as expected.
Thank you! I just checked and I get other special characters upon performing this 😞
For me, in a single-byte session using WLATIN1, the hex value of the em dash is 97.
filename inp "c:\temp\test_emdash_u8.csv";
data finaldata;
em_dash='97'x;
encoding="%sysfunc(getoption(encoding,keyexpand))";
infile inp encoding='utf-8' truncover;
input mainline $10. ;
hex=put(mainline,$hex20.);
run;
proc print data=finaldata(obs=1);
run;
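If the hex output confirms that the session is WLATIN1 and the characters were transcoded to single bytes, one possible cleanup is a single TRANSLATE call. This is a sketch that assumes a WLATIN1 session, where '96'x and '97'x are the en and em dash and '91'x to '94'x are the curly quotes; the output data set name cleaned is hypothetical.

```sas
data cleaned;
  set finaldata;
  /* TRANSLATE(source, to, from): the "to" list comes before the "from" list. */
  /* to:   hyphen, hyphen, apostrophe, apostrophe, double quote, double quote */
  /* from: en dash, em dash, left/right single quote, left/right double quote */
  mainline = translate(mainline, '2D2D27272222'x, '969791929394'x);
run;
```

TRANSLATE replaces byte for byte, so it is only suitable once every character occupies a single byte. In a UTF-8 session, or when reading with ENCODING=ANY, the multi-byte sequences need TRANWRD instead.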