I am having difficulty removing special characters from a string in SAS8. How can I remove them?
For example,
If I have the string abëd34Ý90$#$%a
and would like to remove ë and Ý - how can I do this?
Please note that I would like to keep only the following characters (ignoring case):
ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!@#$%^&*()-_=+\|[]{};:',."<>?/
as well as whitespace characters such as blank, tab, and newline.
I was able to accomplish this in SAS9 by defining an escape character and using the COMPRESS function with the 'k' modifier, but we are still a few months away from migrating to SAS9 in our production environment, so I have to program in SAS8.
Accepted Solutions
Since the original question was posted in 2011, it is assumed that SAS®9 has been installed by now; therefore, this solution is written for customers using SAS®9 technology.
data a_;
   x='abëd34Ý90$#$%a';
   /* "" inside the double-quoted keep list stands for one double quote character */
   new=compress(x,"ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!@#$%^&*()-_=+\|[]{};:',.""<>?/ " , "kis");
run;
The COMPRESS function is typically used to remove unwanted characters from a variable, but in this example it is given the characters to keep instead.
The second argument of COMPRESS lists the characters of X that you want to keep, and the third argument lists any modifiers. In this example, the modifiers used are:
'k' keeps the characters in the list instead of removing them.
'i' ignores the case of the characters to be kept or removed.
's' adds space characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed) to the list of characters.
Although there are different ways to solve this problem, I chose this approach for simplicity.
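A quick way to check the call is to run it on more than one string. This is a minimal sketch that assumes SAS 9 (for the modifiers); the data set CHECK, the variable SAMPLE, and the second test string are hypothetical. It includes the double quote in the keep list, written as "" inside the double-quoted string, since the question asks to keep that character.

```sas
/* Sketch: apply the keep-list COMPRESS call to two sample strings.      */
/* CHECK, SAMPLE, and the second string are hypothetical example names.  */
data check;
   length x new $ 40;
   do sample = 1 to 2;
      if sample = 1 then x = 'abëd34Ý90$#$%a';
      else x = 'A' || '09'x || 'b"c';   /* contains a tab and a double quote */
      /* k = keep the listed characters, i = ignore case, s = also keep whitespace */
      new = compress(x, "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!@#$%^&*()-_=+\|[]{};:',.""<>?/", "kis");
      put sample= x= new=;
      output;
   end;
run;
```

The second sample should pass through unchanged, because the tab is kept by the 's' modifier and the double quote is in the list.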
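I called the variable holding the string LONG_DESCRIPTION. (I had also put ODS escapechar='\'; at the top of my code, but ODS ESCAPECHAR only affects ODS inline formatting, not string literals in a DATA step, so \n and \t inside a quoted string are just a backslash followed by a letter. Tab, line feed, and carriage return can be written as the hex constants '09'x, '0A'x, and '0D'x instead.)
Here is the code: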
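/* Without this, THECHAR inherits the full length of LONG_DESCRIPTION from SUBSTR, */
/* and its padding blanks would make COMPRESS strip the blanks as well.            */
LENGTH THECHAR $1;
/* Allowed: letters (after UPCASE), digits, the listed punctuation, blank, */
/* tab ('09'x), line feed ('0A'x), and carriage return ('0D'x).            */
SPECIALCHAR = VERIFY(UPCASE(LONG_DESCRIPTION),
   "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!@#$%¬&*()-_+={}[];:<>./?",
   ",\|'", '"', ' ', '09'x, '0A'x, '0D'x);
DO WHILE (SPECIALCHAR NE 0);
   /* Take the first disallowed character and remove every occurrence of it */
   THECHAR = SUBSTR(LONG_DESCRIPTION, SPECIALCHAR, 1);
   LONG_DESCRIPTION = COMPRESS(LONG_DESCRIPTION, THECHAR);
   SPECIALCHAR = VERIFY(UPCASE(LONG_DESCRIPTION),
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!@#$%¬&*()-_+={}[];:<>./?",
      ",\|'", '"', ' ', '09'x, '0A'x, '0D'x);
END;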
[pre]
target_string = translate(original_string,'','ëï') ;
[/pre]
Note, however, that TRANSLATE does not shorten the string: each listed character is replaced by a blank in the same position, and target_string inherits its length from original_string. The resulting string therefore contains blanks where the unwanted characters used to be.
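To see the difference between blanking characters out and deleting them, a minimal sketch can run TRANSLATE and COMPRESS side by side. The data set DEMO, the variable REMOVED, and the sample value are hypothetical. TRANSLATE works byte by byte, so in a UTF-8 session (where each accented letter takes two bytes) it may leave two blanks per letter rather than one.

```sas
/* Sketch comparing TRANSLATE and COMPRESS; DEMO and REMOVED are */
/* hypothetical names for this example.                          */
data demo;
   length original_string target_string removed $ 20;
   original_string = 'abëdïe';
   /* TRANSLATE: each listed character becomes a blank in the same position */
   target_string = translate(original_string, '', 'ëï');
   /* COMPRESS: each listed character is deleted and the rest closes up */
   removed = compress(original_string, 'ëï');
   put target_string= / removed=;
run;
```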
Supposing you want to keep only the numeric and alphabetic characters:
[pre]reduced = compress( original
                  , compress( lowcase(original)
                            , 'qwertyuiopasdfghjklzxcvbnm1234567890'
                            /* these are the alphabetic and numeric characters on my keyboard;
                               once these are removed, what is left are the ones I don't want,
                               so these are the ones I want to compress out of the original string */
                            )
                  ) ;[/pre]
Hope that helps.
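The same double-COMPRESS idea can be extended to the full keep list from the question, which avoids both the VERIFY loop and the SAS 9 modifiers because it only uses the two-argument form of COMPRESS. This is a sketch under assumptions, not a solution tested on SAS 8: the data set CLEANED and the variables ALLOWED and UNWANTED are hypothetical names, and the keep list spells out both letter cases instead of using LOWCASE, then adds blank, tab ('09'x), line feed ('0A'x), and carriage return ('0D'x).

```sas
/* Sketch: double COMPRESS with the question's keep list, no modifiers. */
/* CLEANED, ALLOWED, and UNWANTED are hypothetical example names.       */
data cleaned;
   length x unwanted $ 40 allowed $ 200;
   x = 'abëd34Ý90$#$%a';
   /* Both letter cases, digits, the punctuation from the question      */
   /* ("" is one double quote), then blank, tab, line feed, and CR      */
   allowed = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"
          || "`~!@#$%^&*()-_=+\|[]{};:',.""<>?/"
          || ' ' || '09'x || '0A'x || '0D'x;
   /* Step 1: strip every allowed character; only unwanted ones remain */
   unwanted = compress(x, allowed);
   /* Step 2: remove exactly those characters from X. TRIM drops the    */
   /* padding blanks of UNWANTED so that blanks in X are left alone.    */
   if unwanted ne ' ' then x = compress(x, trim(unwanted));
   put x=;
run;
```

The IF guard skips the second COMPRESS when nothing unwanted was found; otherwise TRIM of an all-blank value would still pass a single blank, and COMPRESS would strip the blanks from X.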