Posted 10-24-2022 12:30 AM
(1526 views)
My string variable contains apostrophes and foreign characters in a few observations. For example,
data have;
input var $30.;
datalines;
Let's go for dinner
pi�ata dance collective
Another 인천
The 4th line
;
run;
I want to remove the apostrophe (') in the first observation and delete the second and third observations because they contain characters that are neither letters nor digits. Is there a good way to do this in SAS?
4 REPLIES 4
You can use the COMPRESS function to remove certain characters.
There are also many functions that identify different types of characters in a string.
I would try a combination of the functions whose names start with NOT (for example, NOTALPHA, NOTDIGIT, or NOTALNUM).
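As a rough, untested sketch of that idea, assuming the `have` data set from the question: NOTALNUM returns the position of the first character that is not a letter or digit, or 0 if there is none. Blanks are stripped before the check so that spaces between words and trailing padding do not count as invalid characters.

```sas
data want;
  set have;
  /* Remove apostrophes from the string. */
  var = compress(var, "'");
  /* Strip blanks, then look for anything that is not a letter or digit. */
  /* NOTALNUM returns 0 when every remaining character is alphanumeric.  */
  if notalnum(compress(var, ' ')) > 0 then delete;
run;
```

Which characters count as letters can depend on the session encoding and locale, so an accented letter such as ñ may or may not be flagged. It is worth checking the result on a few known rows.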
Data never sleeps
The weird characters you're getting are likely caused by a single-byte encoding in your editor or SAS session being used for multibyte (e.g. UTF-8) characters.
The code below should do what you're asking for.
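data want;
  set have;
  /* Remove apostrophes. */
  var=compress(var,"'");
  /* k = search for characters NOT in the list; f, n and p add letters, underscore, digits and punctuation to the list. */
  /* So the row is deleted if it contains anything other than a blank, letter, digit, underscore or punctuation mark. */
  if findc(var,' ','kfnp') then delete;
run;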
Thanks! Do you have any suggestions for fixing the editor's encoding issues? I am using SAS 9.4 (English). When I add encoding='utf-8' to the import code, I get:
ERROR: Invalid string.
FATAL: Unrecoverable I/O error detected in the execution of the DATA step program.
Aborted during the EXECUTION phase.
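One thing that may be worth checking first is the encoding of the SAS session itself. An "Invalid string" error can occur when text read as UTF-8 contains characters that cannot be represented in the session encoding. That is an assumption about the cause here, not something confirmed in this thread. A minimal sketch for inspecting the session encoding:

```sas
/* Show the encoding of the current SAS session.           */
/* ENCODING can only be set when SAS starts, not mid-run.  */
proc options option=encoding;
run;

/* The same value is available in the automatic macro variable SYSENCODING. */
%put NOTE: Session encoding is &sysencoding;
```

If this reports a single-byte encoding such as WLATIN1, Korean text probably cannot be read without starting SAS with a UTF-8 session encoding.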
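data want;
  set have;
  /* Remove apostrophes. */
  var=compress(var,"'");
  /* Delete the row if it contains anything other than a letter (a-z, case-insensitive), a digit, or whitespace. */
  if prxmatch('/[^a-z\d\s]/i',var) then delete;
run;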
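Before deleting anything, it may help to flag the affected rows and inspect them. The sketch below is an illustration only: it assumes the `have` data set from the question, and the names `check`, `cleaned`, `bad_pos`, and `drop_row` are made up for this example. PRXMATCH returns the position of the first match, or 0 when the pattern does not match.

```sas
data check;
  set have;
  /* Remove apostrophes but keep the original value for comparison. */
  cleaned = compress(var, "'");
  /* Position of the first character that is not a-z, a digit, or whitespace (0 if none). */
  bad_pos = prxmatch('/[^a-z\d\s]/i', cleaned);
  /* 1 if the row would be deleted by the filter, 0 otherwise. */
  drop_row = (bad_pos > 0);
run;

proc print data=check;
run;
```

Once the flagged rows look right, the `if ... then delete;` form above gives the final data set.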