Hi Everyone,
I try both Compress and strip and still fail to remove a blank at the end of a string.
That's weird since compress should remove them all.
The 2nd Timothy as a space at the end.
Can anyone have a look at my problem?
Thank you.
HC
data have;
input name $20.;
datalines;
Od, Timothy
Od, Timothy
;run;
data want; set have;
new=compress(name);
new1=strip(name);
run;
@hhchenfx wrote:
The problem will be when I merge 1 file have the name of "od, timothy " and other has "od, timothy".
I try to remove all space but it doesn't work and therefore the merge failed.
Look like in this above file, the space is a "character", not something SAS auto filled blank.
If you go to SAS file and highlight that cell, you will see the "blue" cover 01 space after "timothy", which show the space.
In other row, if you highlight the celll, the "blue" cover only character.
Thank you
HC
Did you actually try to merge those two values? SAS knows how to deal with spaces appended to the end of the variables. So in SAS if you compare 'TIM' to 'TIM ' they will match. If your values are NOT matching then you do NOT have a space after the 'y' in one of the variables. Instead you have some other character code that does not display any "ink" on the page.
Look at the hex codes for the character(s) that you are having trouble with. You can use the $HEX format to display the codes as hexadecimal values. Some common codes that you could have might be a tab '09'x, carriage return '0d'x, what Microsoft calls a non-breaking space 'A0'x, a null character '00'x.
Once you find out what character it is you can use the COMPRESS() function to remove it. Or use the TRANSLATE() or TRANWRD() functions to replace it with something else.
name = compress(name,'00090A0DA0'x);
The functions really work. However, they require defining a new variable, with a length for the new variable. When COMPRESS ceates a new variable, it assigns the variable a length of $200 (going by memory here). When STRIP defines a new variable, it assigns the variable the same length as the old variable (going by what makes sense here). So by storing the value returned by the function(s), the software has to add characters back on.
Each variable has only one length. You could code something like this:
data want;
length name $ 19;
set have;
run;
That reduces the length of NAME from $20 to $19. But that affects all observations, and may truncate a nonblank from the end of NAME.
I have long list of name with variable length so I can't just "cut" the length.
Thus the blank at the end cause me trouble
If I understand what you meant, this code below still not work.
data have;
input name $40.;
datalines;
Od, Timothy
Od, Timothy
;run;
data want; set have;
length new $ 200;
new=compress(name);
run;
Every variable is defined with one, and only one, length. If you run a PROC CONTENTS, you can see that length. You might be able to change the length, but there is still only one length allowed for a variable. The length does not change from one observation to the next.
What do you want to do with the variable, where the trailing blank gets in the way?
What I meant is the list of name could be
steve, Isber
Kelly, Chen
Jobby, Timothy
So the length of the string is not the same from row to row.
Therefore, the length of name variable is set at $200.
So if the space appear on "steve, Isber ", changing length down to $199 will not work.
This space at the end is so random, I don't know where are they so basically, cutting the length might cut the character.
If the length is $200, that is the number of characters used on EVERY observation. You do not have a single blank at the end of "steve, Isber". Instead, you have 188 blanks. It is not random, it is that EVERY observation is using 200 characters.
If you spell out some details about what you are trying to accomplish, there may be a way for the program to use 12 characters instead of 13 or 200 characters. You you will need to show what you are trying to accomplish. For my part, I won't be able to add more for about another 8 hours.
So I make the length of exact 19 as the longest string and I have no way to remove the space after the Timothy in this below data.
in the cut1 with format cut1 $18., stevenson, Isberger loss the last "r"
I try it all.
data have;
input name $19.;
datalines;
Od, Timothy
stevenson, Isberger
Swan, Ng
;run;
proc contents data=have; run;
data want; set have;
new="@"||compress(name," " )||"@";
new1="@"||strip(name)||"@";
new2="@"||trim(name)||"@";
format cut1 $18.;
cut1=name;
run;
What issue is this causing?
SAS creates variables with a fixed length and pads the remaining with spaces. There is no way to change that.
So you need to work around it, and how you do that depends on the issue you're facing.
The problem will be when I merge 1 file have the name of "od, timothy " and other has "od, timothy".
I try to remove all space but it doesn't work and therefore the merge failed.
Look like in this above file, the space is a "character", not something SAS auto filled blank.
If you go to SAS file and highlight that cell, you will see the "blue" cover 01 space after "timothy", which show the space.
In other row, if you highlight the celll, the "blue" cover only character.
Thank you
HC
@hhchenfx wrote:
The problem will be when I merge 1 file have the name of "od, timothy " and other has "od, timothy".
I try to remove all space but it doesn't work and therefore the merge failed.
Look like in this above file, the space is a "character", not something SAS auto filled blank.
If you go to SAS file and highlight that cell, you will see the "blue" cover 01 space after "timothy", which show the space.
In other row, if you highlight the celll, the "blue" cover only character.
Thank you
HC
Did you actually try to merge those two values? SAS knows how to deal with spaces appended to the end of the variables. So in SAS if you compare 'TIM' to 'TIM ' they will match. If your values are NOT matching then you do NOT have a space after the 'y' in one of the variables. Instead you have some other character code that does not display any "ink" on the page.
Look at the hex codes for the character(s) that you are having trouble with. You can use the $HEX format to display the codes as hexadecimal values. Some common codes that you could have might be a tab '09'x, carriage return '0d'x, what Microsoft calls a non-breaking space 'A0'x, a null character '00'x.
Once you find out what character it is you can use the COMPRESS() function to remove it. Or use the TRANSLATE() or TRANWRD() functions to replace it with something else.
name = compress(name,'00090A0DA0'x);
If you have a character variable with questionable content, do this:
- define a new variable with double the length
- use newvar = put(oldvar,$hexN.); where N = length of new variable.
Now you can inspect the hex codes for "funny" characters which you then can filter using @Tom's example.
Maybe you have some special blank at the end of variable.
want=prxchange('s/\s+$//',-1,name);
Thank you for your help.
I try to upload the SAS datafile here so you can see but there is error with the upload.
I will comeback to this issue later but I really appreciate your help.
HC
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.