I've used the format $hex50. to identify any special characters or invisible characters in the string. Value for the string is coming from .xls. Assume string value is "Urology". When I executed the program with $hex50. with the value "Urology", I got following message in the log. How to interpret this code to see if/what special characters in the string?
55726F6C6F67790D0A0D0A
Following code is removing the spaces between the strings. I want to remove special characters (spaces, CR, LF) before and after the string and NOT between the strings
compress(string,'090A0C0D20A0'x))
Look up the ASCII codes. Or just play around yourself with the using the $HEX format on single characters.
You have :
55
72
6F
6C
6F
67
79
0D
0A
0D
0A
The last four are two pairs of Carriage Return and Line Feed characters.
You probably do NOT want those in your actual text data. You could use COMPRESS() to remove them
string = compress(string,'0A0D'x);
Or perhaps use TRANSLATE() to replace them with spaces
string = translate(string,' ','0A0D'x);
@Tom Thank you for the information. I want to remove the special characters (spaces, CR, LF) before and after the string and NOT between the string. Currently following line is removing the spaces between the string which I don't want to do.
compress(string,'090A0C0D20A0'x))
E.g. If the string is 'Banking and Insurance' and I don't want to make it as 'BankingandInsurance' by removing the spaces between the string.
I want to remove the special characters (spaces, CR, LF) before and after the string and NOT between the string. Currently following line is removing the spaces between the string which I don't want to do.
compress(string,'090A0C0D20A0'x))
E.g. If the string is 'Banking and Insurance' and I don't want to make it as 'BankingandInsurance' by removing the spaces between the string.
@David_Billa wrote:
I want to remove the special characters (spaces, CR, LF) before and after the string and NOT between the string. Currently following line is removing the spaces between the string which I don't want to do.
compress(string,'090A0C0D20A0'x))E.g. If the string is 'Banking and Insurance' and I don't want to make it as 'BankingandInsurance' by removing the spaces between the string.
If you don't want to remove the spaces then take the 20 out of your COMPRESS() function all.
Usually I use translate to replace the goofy characters with spaces and use COMPBL() to reduce them to one per break. You can use LEFT() to remove the leading spaces.
left(compbl(translate(string,' ','090A0C0DA0'x)))
No need to use TRIM() , or STRIP(), to remove the trialing spaces because they will just get added back when you store the result into a variable.
@David_Billa wrote:
I want to remove all the special/invisible characters including spaces
before and after the string and NOT between the string.
If you want to use COMPRESS() to remove characters you have to tell it exactly what characters to remove. So if you don't want it to remove spaces then don't tell it to remove them.
If it is easier you can use the K modifier and instead tell it which characters to keep. For example if you only want to keep the normal ASCII characters just use something like:
new_string = compress(old_string,collate(32,126),'k');
32 is 20 in hex which is a space. 126 is 7E in hex which is a ~. All of the chracters from space to ~ are in printable standard 7-bit ASCII characters. Everything else is either a control character like CR or LF or some strange non-ASCII character like accented letters or strange symbols.
If you are having trouble figuring out what characters you have that is what the $HEX format can help with.
The STRIP() function will remove leading and trailing SPACES. It does not remove any other characters. As I explained before there is no need to remove the trailing spaces since they just come back when you assign the value to a variable. SAS uses FIXED LENGTH character variables. So they are always padded with spaces to fill the full storage length. So LEFT() is enough to remove the leading spaces.
Not sure what you mean by "between strings". To you mean between the words in the current character variable? That is why I suggested using the COMPBL() function. That will collapse multiple adjacent spaces into one space, but does not remove the space between the words.
Or are you planning to combine a series of character variables together? If so then explain how you are going to combine them to help on preventing extra spaces being inserted.
Do not use "20"x in the COMPRESS(); remove leading and trailing blanks with STRIP().
SAS automatically addes 20 (spaces to fill out a length of the variable) so that's fine, but otherwise, COMPRESS() seems to work.
data have;
input value $hex50.;
format value $hex50.;
cards;
55726F6C6F67790D0A0D0A
;
run;
proc print data=have;
run;
proc print data=have;
format value $50.;
run;
data want;
set have;
value2=strip(value);
value3=trim(value);
value4 = compress(value, '0D0A'x);
format value2 value3 value4 $hex50.;
run;
proc print data=want;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.