BRKS
Quartz | Level 8

I've seen a lot of posts about how to use COMPRESS to remove one or more predefined 'special characters' in their data.

Usually these are normal characters such as an asterisk or a slash mark.

 

I'm using search strings from a public website, and it has a huge variety of special characters (as in non-ascii HEX characters, like arrows) in the strings where I have hundreds of thousands of unique 'values'.

 

Is there a way to use the COMPRESS or any other function to 'search and destroy' any special characters without me having to specify them all?

 

I would like any normal ascii characters in a given string to remain, but to have all special characters stripped out (without me having to define the characters).

 

Any thoughts?  Thanks for your help!

 

Barb 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @BRKS, have you looked at the modifiers of the COMPRESS function, such as

K - keep the listed characters instead of removing them

A - add alphabetic characters to the list

 

compress(string,,'ka');   like that?
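
As a rough sketch of how the modifiers combine (the variable names and the sample string here are made up for illustration): 'ka' on its own keeps only letters, so digits, blanks, and punctuation are dropped as well. Adding D keeps digits too, and W keeps printable characters, which may be closer to "strip everything that isn't normal text". What counts as printable above the plain ASCII range can depend on the session encoding, so this is something to test on your own data rather than a guaranteed recipe.

```sas
data _null_;
  length raw $40;
  /* Hypothetical sample: '09'x is a tab, '1B'x is ESC (both control characters) */
  raw = 'Red' || '09'x || 'shoes' || '1B'x || ' #42!';

  letters_only = compress(raw, , 'ka');     /* keep alphabetic characters only  */
  alnum_blank  = compress(raw, ' ', 'kad'); /* keep letters, digits, and blanks */
  printable    = compress(raw, , 'kw');     /* keep printable characters        */

  put letters_only= / alnum_blank= / printable=;
run;
```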


7 REPLIES 7
BRKS
Quartz | Level 8

Thank you so much!  This ended up being a simple elegant solution!  🙂

 

For anyone in the future who has the same problem, and might be looking at this post, here is what I finally ended up with:

 

search_term_lower = compress(search_term_temp, "abcdefghijklmnopqrstuvwxyz0123456789 ,/()'-#!&+*:;<>", 'k');

 

This is the essential part of the COMPRESS function to do the job:

 

search_term_lower = COMPRESS(search_term_temp, " ", 'k');

 

* Put all of the stuff you want to keep between the double quotes above. 
  Note:  If you want to keep spaces, make sure there is a space character between the double quotes too!

 

* Normally COMPRESS deletes the listed characters from a string. 
  It's the last part with the       , 'k'       that tells SAS to keep those characters instead of removing them.

 

* This removes all types of HEX and other characters, and was even smart enough to figure out the HEX end of line character, and to parse the records like it should!!  VERY IMPRESSED!!
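
To make the effect of 'k' concrete, here is a minimal sketch with a made-up string (the variable names are only for illustration). The same character list either removes or keeps, depending on whether the K modifier is present:

```sas
data _null_;
  s = 'a-b c/d';

  removed   = compress(s, '-/');      /* default: delete the listed characters -> ab cd */
  kept      = compress(s, '-/', 'k'); /* K: keep only the listed characters    -> -/    */
  no_blanks = compress(s);            /* no list at all: delete blanks         -> a-bc/d */

  put removed= / kept= / no_blanks=;
run;
```

So in the statement above, everything between the double quotes acts as a whitelist, and anything not in it is dropped.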

 

The first file imported just fine, and this function worked like a dream!

 

Unfortunately I had an importing issue on the second file that I haven't figured out yet.  There seems to be some special character that SAS can't import, and I'm getting this error in my log:

            WARNING: A character that could not be transcoded has been replaced in record 160.
            ERROR: Invalid string.
            FATAL: Unrecoverable I/O error detected in the execution of the DATA step program. Aborted during the EXECUTION phase.
            NOTE: 190 records were read from the infile "/sasem/gbmkuser/search/2Q_2018_Report.csv"

 

I've seen that warning in the first file that imported successfully, so I'm not worried about that part (it's replaced a character I wouldn't have wanted to keep, so no big deal on those).

 

It's the ERROR / FATAL stuff that I need to figure out.

 

Still trying to figure that one out!

 

Thanks everyone for your suggestions!!!

 

Barb

BRKS
Quartz | Level 8

I had to monkey around (copy and paste each one) to get all of these special language characters for another part of the project.

I'll just paste them here, so that anyone who wants these (in addition to the regular alphabet letters) can grab them easily:

 

áéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕàèìòùÀÈÌÒÙäëïöüÿÄËÏÖÜŸåÅæÆœŒçÇðÐøØß@

kiranv_
Rhodochrosite | Level 12

Regex can easily help here, as it has character class groupings.

 

[[:^cntrl:]]+   matches one or more characters that are not control characters, so you can use it to keep anything other than control characters

[[:print:]]+    matches visible characters and spaces (anything except control characters)

 

Check the link below for more character class groups:

 

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a003288497.htm

 

Another link, which explains them in a little more detail:

 

https://www.regular-expressions.info/posixbrackets.html
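
A minimal sketch of how these classes could be used with PRXCHANGE, assuming the goal is to delete the unwanted characters rather than just match them. The sample string and variable names are hypothetical. The second pattern is not a POSIX class but a plain byte range; it is included as one possible way to keep only printable ASCII:

```sas
data _null_;
  length raw cleaned_cntrl cleaned_ascii $40;
  /* Hypothetical sample: '09'x is a tab, '1B'x is ESC */
  raw = 'Red' || '09'x || 'shoes' || '1B'x || ' #42!';

  /* Delete every run of control characters; -1 means replace all matches */
  cleaned_cntrl = prxchange('s/[[:cntrl:]]+//', -1, raw);

  /* Delete every byte outside the printable ASCII range (space through tilde) */
  cleaned_ascii = prxchange('s/[^\x20-\x7E]+//', -1, raw);

  put cleaned_cntrl= / cleaned_ascii=;
run;
```

On hundreds of thousands of values, COMPRESS with modifiers is likely to be cheaper than a regex, so the PRX route is probably most useful when the rule can't be expressed as a simple character list.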

 

BRKS
Quartz | Level 8
Thank you so much, kiranv_ !
FreelanceReinh
Jade | Level 19

@BRKS wrote:

 

I'm using search strings from a public website, and it has a huge variety of special characters (as in non-ascii HEX characters, like arrows) in the strings where I have hundreds of thousands of unique 'values'.

 


I think you'll need to resort to KCOMPRESS for this type of special character. See Internationalization Compatibility for SAS String Functions.

BRKS
Quartz | Level 8
Thank you so much, FreelanceReinhard !
