mkhan2010
Calcite | Level 5
Hi,

I am having difficulty removing special characters from a string in SAS8. How can I remove them?

For example,

If I have the string abëd34Ý90$#$%a
and would like to remove ë and Ý - how can I do this?

Please note that I would only like to keep the following characters (ignoring the case):

ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!@#$%^&*()-_=+\|[]{};:',."<>?/
and the space characters such as blankspace, tab character and new line character.

I was able to accomplish this in SAS9 by defining an escape character and using the COMPRESS function with the 'k' flag, but we are still a few months away from migrating to SAS9 in our production environment, so I have to program in SAS8.
Accepted Solutions
kmw
SAS Employee

Since the original question was posted in 2011, I assume that SAS®9 has been installed by now, so this solution is written for customers using SAS®9 technology.

 

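data a_;
  x='abëd34Ý90$#$%a';
  /* "" inside the double-quoted list is one literal double quote, which the question also asks to keep */
  new=compress(x,"ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!@#$%^&*()-_=+\|[]{};:',.""<>?/ " , "kis");
run;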

The COMPRESS function is typically used to remove unwanted characters from a variable, but in this example, the characters to keep are specified.

 

In the second argument of the COMPRESS function, specify characters that you want to keep in X, and specify in the third argument any modifiers. In this example, the modifiers used are:

 

      'k' keeps the characters in the list instead of removing them.

      'i' ignores the case of the characters to be kept or removed.

      's' adds space characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed) to the list of characters.

 

Although there are different ways to solve this problem, I chose this approach for simplicity.
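
As a quick check of what the three modifiers do, here is a minimal sketch that extends the sample string with a tab, an uppercase letter, and a blank. The `data _null_` step and the variable `has_tab` are illustrative assumptions and are not part of the original answer.

```sas
data _null_;
  length x new $ 40;
  /* the thread's sample string plus a tab ('09'x) and the text 'Y z' */
  x = 'abëd34Ý90$#$%a' || '09'x || 'Y z';

  /* k = keep the listed characters, i = ignore case, s = also keep white space */
  /* "" inside the double-quoted list is one literal double quote               */
  new = compress(x, "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!@#$%^&*()-_=+\|[]{};:',.""<>?/", 'kis');

  has_tab = (index(new, '09'x) > 0);  /* 1 when the tab survived the cleanup */
  put new= has_tab=;
run;
```

With this list, ë and Ý are dropped, while the lowercase letters (through 'i') and the tab and blank (through 's') remain.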

 

 

mkhan2010
Calcite | Level 5
I was able to solve this problem; however, my solution is not very elegant. If you have a better solution, please share.

First, I defined an escape character at the top of my code:

ODS escapechar='\';

The variable holding the string is named LONG_DESCRIPTION.

Here is the code:

SPECIALCHAR = VERIFY(UPCASE(LONG_DESCRIPTION),
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!@#$%^&*()-_+={}[];:<>./?",
    ",\\|'\n\t\_", '"');

DO WHILE (SPECIALCHAR NE 0);
    THECHAR = SUBSTR(LONG_DESCRIPTION, SPECIALCHAR, 1);
    LONG_DESCRIPTION = COMPRESS(LONG_DESCRIPTION, THECHAR);

    SPECIALCHAR = VERIFY(UPCASE(LONG_DESCRIPTION),
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!@#$%^&*()-_+={}[];:<>./?",
        ",\\|'\n\t\_", '"');
END;

Message was edited by: mkhan2010
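
For readers who want to try the VERIFY loop in isolation, the following is a hedged sketch of the same idea as a complete DATA step. It is a variation, not the poster's exact code, and has not been tested on SAS8. Two assumptions are worth calling out: in a DATA step string literal, `\n` and `\t` are ordinary characters, so the sketch uses hex literals for tab, line feed, and carriage return; and THECHAR is declared with length 1 so that COMPRESS receives no trailing blanks. The names `fixed` and `keep_list` and the sample value are hypothetical.

```sas
data fixed;
  length long_description $ 40 thechar $ 1 keep_list $ 80;
  long_description = 'abëd34Ý90$#$%a b';

  /* Characters to keep (uppercase only, because the string is upcased before checking). */
  /* The list contains a blank, so padding blanks are never flagged as special.          */
  keep_list = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890 '
           || '`~!@#$%^&*()-_=+\|[]{};:'',."<>?/'
           || '09'x || '0A'x || '0D'x;   /* tab, line feed, carriage return */

  /* VERIFY returns the position of the first character not in keep_list, or 0. */
  specialchar = verify(upcase(long_description), keep_list);
  do while (specialchar ne 0);
    thechar = substr(long_description, specialchar, 1);      /* the offending character  */
    long_description = compress(long_description, thechar);  /* remove every occurrence  */
    specialchar = verify(upcase(long_description), keep_list);
  end;

  put long_description=;
  drop thechar keep_list specialchar;
run;
```

Each pass removes at least one character, so the loop ends once only listed characters remain.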
Robert_Bardos
Fluorite | Level 6
I used to do this by means of the translate function (if memory serves me well I should add since I don't have SAS at hand). Somewhat like
[pre]
target_string = translate(original_string,'','ëï') ;
[/pre]
Note, however, that target_string inherits its length from original_string, and the unwanted characters are not deleted: each one is replaced in place by a blank.
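
To make that caveat concrete, here is a small sketch, assuming a single-byte session encoding such as Latin1 (in a UTF-8 session each accented letter occupies two bytes and would leave two blanks). The sample value and the variable `squeezed` are hypothetical.

```sas
data _null_;
  length original_string target_string squeezed $ 20;
  original_string = 'abëd ïf';

  /* TRANSLATE(source, to, from): 'from' characters with no partner in 'to' become blanks, */
  /* so both ë and ï turn into a blank here.                                               */
  target_string = translate(original_string, ' ', 'ëï');

  /* One-argument COMPRESS closes the gaps, but it also deletes the genuine blank. */
  squeezed = compress(target_string);

  put target_string= squeezed=;
run;
```

The second step shows why TRANSLATE alone is awkward when legitimate blanks must survive: the blanks it introduces cannot be told apart from the ones that were already there.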
Peter_C
Rhodochrosite | Level 12
In SAS8, removing an uncertain list of symbols was simplified by defining what you need to keep. That is now a feature in SAS9, but for SAS8 the approach was like this, supposing you want only the numeric and alphabetic characters to be kept:
[pre]
reduced = compress( original
                  , compress( lowcase(original)
                            , 'qwertyuiopasdfghjklzxcvbnm1234567890'
                              /* these are the alpha and numeric characters on my keyboard
                                 once these are removed what is left are the ones I don't want
                                 so these are the ones I want to compress out of the original string */
                            )
                  ) ;
[/pre]
Hope that helps.
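
The same double-COMPRESS idea can be sketched for the full keep list from the question. This is a variation under stated assumptions rather than the code above, and it has not been tested on SAS8: both letter cases are listed explicitly, which sidesteps any question of how LOWCASE handles accented letters; the list includes a blank, tab, line feed, and carriage return; and the intermediate variable `unwanted` is trimmed so that its padding blanks do not cause blanks to be removed. The names `cleaned` and `unwanted` and the sample value are hypothetical.

```sas
data cleaned;
  length original reduced unwanted $ 40;
  original = 'abëd34Ý90$#$%a b';

  /* Pass 1: strip everything that should be kept; the leftovers are the unwanted characters. */
  unwanted = compress(original,
                'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890 '
             || '`~!@#$%^&*()-_=+\|[]{};:'',."<>?/'
             || '09'x || '0A'x || '0D'x);   /* tab, line feed, carriage return */

  /* Pass 2: strip those leftovers from the original.                    */
  /* Skip the call when nothing is unwanted, so blanks are not removed.  */
  if unwanted ne ' ' then reduced = compress(original, trim(unwanted));
  else reduced = original;

  put reduced=;
run;
```

Only the two-argument form of COMPRESS is used, so no SAS9 modifiers are required.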