I have a character variable that contains letters, numbers, some symbols, both ' and ", and some strange characters (ones that are not on any keyboard). I want to create a new variable with only the characters I want. Because I am using " before and after my list, I cannot put " inside the list to keep it in my observations for the variable var. How can I include "?
Thanks!
data want;
set have;
new_var=compress(var,"ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890$%.-':|&[]{}*!@#()!/+,");
run;
To include the character used to quote a string literal just double it up in the string content. Example:
word = 'Don''t';
So if you are using " on the outside, include it as one of the characters by adding "" to the list.
compress(var,"ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890$%.-""':|&[]{}*!@#()!/+,")
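A minimal sketch of the doubling rule for both quote styles may help here. The variable names a and b are made up for illustration. If the rule works as described above, the log should show a=say "hi" and b=Don't.

```sas
data _null_;
  /* Double-quoted literal: "" inside the string stands for one " character */
  a = "say ""hi""";
  /* Single-quoted literal: '' inside the string stands for one ' character */
  b = 'Don''t';
  /* Write both values to the log to check what was stored */
  put a= b=;
run;
```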
Hi @Emma_at_SAS
It might be easier to use the "k" modifier. It reverses the compress, so only the specified characters are kept instead of dropped.
Drop all printable and non-printable chars except "ABC":
compress(str,'ABC','k');
Drop all printable and non-printable chars except digits, English letters, and underscore. No characters are specified in the second argument; the 'n' modifier adds them to the list. More examples are in the online documentation:
compress(str,,'nk');
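To see the two calls above side by side, here is a small sketch on a made-up test string (the value 'a1_B2-c3!' and the variable names are assumptions for illustration only). Note that the character list is case-sensitive unless the 'i' modifier is added.

```sas
data _null_;
  str = 'a1_B2-c3!';
  /* k = keep: only the listed characters A, B, C survive (case-sensitive) */
  keep_abc = compress(str, 'ABC', 'k');
  /* n adds digits, underscore, and English letters; k keeps them */
  keep_nk = compress(str, , 'nk');
  /* Write both results to the log for comparison */
  put keep_abc= keep_nk=;
run;
```

With this input, keep_abc should contain only B, and keep_nk should contain a1_B2c3.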
Thank you, @Tom and @ErikLund_Jensen. A combination of your suggestions solved my problem
I used the doubled "" and the "k" modifier, and that gives me exactly what I want.
Thanks!
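The exact final code was not posted, so the following is only a sketch of how the two suggestions might be combined. The sample row in have and the added 'i' modifier (ignore case, so lowercase letters are kept as well) are assumptions, not part of the original solution.

```sas
/* Hypothetical sample data with both quote types and two unwanted characters */
data have;
  infile datalines truncover;
  input var $char40.;
  datalines;
Say "hi" to O'Brien~^
;
run;

data want;
  set have;
  /* "" inside the list stands for one " character                    */
  /* k = keep only the listed characters                               */
  /* i = ignore case (assumption: lowercase letters should be kept)    */
  new_var = compress(var, "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890$%.-""':|&[]{}*!@#()/+, ", 'ki');
run;

proc print data=want; run;
```

In this sketch the ~ and ^ characters are not in the list, so they should be dropped while both quote characters are kept. Drop the 'i' modifier if only uppercase letters are wanted.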
May someone please help me to mark this conversation as solved? I am not sure because it was solved based on two suggestions.
Thanks
How about just-
compress(var,,'p');
Hi @Emma_at_SAS,
From the documentation of the compress() function, when the modifier argument is set to "p", as in @novinosrin's post, punctuation is added to the list of characters that are to be removed from the text:
p or P adds punctuation marks to the list of characters.
Kind regards,
Amir.
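As a quick check of the "p" modifier, a sketch like the following could be run. The test value is made up, and the apostrophe is doubled because the literal is single-quoted.

```sas
data _null_;
  /* Value is dsd_'"sf; the '' inside the literal stands for one ' */
  var = 'dsd_''"sf';
  /* No characters listed; p adds punctuation marks to the removal list */
  no_punct = compress(var, , 'p');
  /* Write the original and the compressed value to the log */
  put var= no_punct=;
run;
```

The log should show no_punct without the two quote characters. Because "p" removes every punctuation mark, it is only a fit when none of the symbols need to be kept.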
data test;
input var$ ;
datalines;
oÔÇÖ\[}s
dsd_'"sf
;
run;
Obs | var |
---|---|
1 | oÔÇÖ\[}s |
2 | dsd_'"sf |
data want;
set test;
new_var=compress(var,"ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890$%.-'"":|&[]{}*!@#()!/+, ",'kg');
run;
proc print data=want; run;
The SAS System |
Obs | var | new_var |
---|---|---|
1 | oÔÇÖ\[}s | oÔÇÖ\[}s |
2 | dsd_'"sf | dsd_'"sf |
Thank you @Ksharp for your suggestion. I already have a solution, but I thought it might be easier to just remove the "graphic characters", because I want to keep everything except the translation of emojis that appear in my SAS dataset. Could you please help me with an example? Thanks