I've seen a lot of posts about how to use COMPRESS to remove one or more predefined 'special characters' in their data.
Usually these are normal characters such as an asterisk or a slash mark.
I'm using search strings from a public website, and it has a huge variety of special characters (as in non-ascii HEX characters, like arrows) in the strings where I have hundreds of thousands of unique 'values'.
Is there a way to use the COMPRESS or any other function to 'search and destroy' any special characters without me having to specify them all?
I would like any normal ascii characters in a given string to remain, but to have all special characters stripped out (without me having to define the characters).
Any thoughts? Thanks for your help!
Barb
Hi @BRKS, have you looked at the modifiers of the COMPRESS function, such as
k - keep the listed characters instead of removing them
a - add alphabetic characters to the list
compress(string,,'ka'); like that?
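A minimal sketch of how the modifiers combine, in case it helps. The sample string and variable names are made up for illustration; the 'd' (digits) and 's' (space characters) modifiers are additional documented COMPRESS modifiers that go beyond the 'ka' suggestion.

```sas
/* Hypothetical sample data: variable names are illustrative only */
data _null_;
    length raw cleaned_alpha cleaned_mixed $40;
    raw = 'red*shoes/size 9';
    /* 'k' keeps the listed characters; 'a' adds alphabetic characters to the list */
    cleaned_alpha = compress(raw, , 'ka');
    /* 'd' adds digits and 's' adds space characters to the kept list */
    cleaned_mixed = compress(raw, , 'kads');
    put cleaned_alpha= cleaned_mixed=;
run;
```

With only 'ka', digits, blanks and punctuation are all dropped, so add further modifiers (or an explicit list in the second argument) for anything else you want to survive.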
Thank you so much! This ended up being a simple elegant solution! 🙂
For anyone in the future who has the same problem, and might be looking at this post, here is what I finally ended up with:
search_term_lower = compress(search_term_temp, "abcdefghijklmnopqrstuvwxyz0123456789 ,/()'-#!&+*:;<>", 'k');
This is the essential part of the COMPRESS function to do the job:
search_term_lower = COMPRESS(search_term_temp, " ", 'k');
* Put all of the stuff you want to keep between the double quotes above.
Note: If you want to keep spaces, make sure there is a space character between the double quotes too!
* Normally COMPRESS deletes the characters you list from a string.
It's the last part with the , 'k' that tells SAS to keep those characters instead of removing them.
* This removes all types of HEX and other characters, and was even smart enough to figure out the HEX end of line character, and to parse the records like it should!! VERY IMPRESSED!!
The first file imported just fine, and this function worked like a dream!
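For anyone who wants to try the keep-list approach on a small case first, here is a rough sketch. The sample value and the shortened keep list are assumptions for illustration, and '95'x just stands in for a stray non-ASCII byte.

```sas
/* Hypothetical test step: sample value and keep list are illustrative only */
data _null_;
    length search_term_temp search_term_lower $60;
    /* '95'x stands in for a stray non-ASCII byte (assumed sample data) */
    search_term_temp = lowcase("Men's Boots " || '95'x || " Size 10");
    /* Third argument 'k' keeps only the characters listed in the second argument */
    search_term_lower = compress(search_term_temp,
                                 "abcdefghijklmnopqrstuvwxyz0123456789 '-", 'k');
    put search_term_lower=;
run;
```

The value is lowercased before COMPRESS because the keep list contains only lowercase letters; uppercase letters would otherwise be stripped along with the special characters.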
Unfortunately I had an importing issue on the second file that I haven't figured out yet. There seems to be some special character that SAS can't import, and I'm getting this error in my log:
WARNING: A character that could not be transcoded has been replaced in record 160.
ERROR: Invalid string.
FATAL: Unrecoverable I/O error detected in the execution of the DATA step program. Aborted during the EXECUTION phase.
NOTE: 190 records were read from the infile "/sasem/gbmkuser/search/2Q_2018_Report.csv"
I've seen that warning in the first file that imported successfully, so I'm not worried about that part (it's replaced a character I wouldn't have wanted to keep, so no big deal on those).
It's the ERROR / FATAL stuff that I need to figure out.
Still trying to figure that one out!
Thanks everyone for your suggestions!!!
Barb
I had to monkey around (copy and paste each one) to get all of these special language characters for another part of the project.
I'll just paste them here, so that anyone who wants these (in addition to the regular alphabet letters) can grab them easily:
áéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕàèìòùÀÈÌÒÙäëïöüÿÄËÏÖÜŸåÅæÆœŒçÇðÐøØß@
Regex can easily be helpful here, as it has character class groupings.
[[:^cntrl:]]+ matches one or more characters that are not control characters, so you can use it to keep anything other than control characters.
[[:print:]]+ matches visible characters and spaces (anything except control characters).
Check the link below for more groups:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a003288497.htm
Another link, which explains this in a little more detail:
https://www.regular-expressions.info/posixbrackets.html
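As a rough sketch of the regex route, PRXCHANGE can delete everything outside a character set. The sample string and variable names are assumptions for illustration. An explicit hex range (space through tilde) is used here instead of [[:print:]], because how the POSIX classes treat bytes above 7F may depend on the session encoding, which I have not checked.

```sas
/* Hypothetical sample data: '09'x is a tab, '95'x a stray non-ASCII byte */
data _null_;
    length raw cleaned $60;
    raw = 'boots ' || '09'x || 'size 10' || '95'x;
    /* Remove every character outside printable ASCII (space through tilde); */
    /* -1 tells PRXCHANGE to replace all matches, not just the first         */
    cleaned = prxchange('s/[^\x20-\x7E]//', -1, raw);
    put cleaned=;
run;
```

The same pattern could be swapped for one of the bracket classes above once you have confirmed how it behaves on your data.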
@BRKS wrote:
I'm using search strings from a public website, and it has a huge variety of special characters (as in non-ascii HEX characters, like arrows) in the strings where I have hundreds of thousands of unique 'values'.
I think you'll need to resort to KCOMPRESS for these kinds of special characters. See Internationalization Compatibility for SAS String Functions.