Hello,
I want to remove certain charecters from the variable.
From the below example I would like to remove the space and the County which comes after that.
Example:
Irving County
Collin County
Denton County
Want:
Irving
Collin
Denton
Hello @robertrao,
How about this?
data have;
input cty $30.;
cards;
Irving County
Collin County
Denton County
Countyard County
Lake of the Woods County
;
data want;
set have;
if propcase(scan(cty, -1, ' '))='County' then cty=substr(cty, 1, length(cty)-6);
run;
(There is a Lake of the Woods County in MN, but I made up Countyard County.)
As Astounding mentioned already, if you're going to apply this to a large dataset, you should check for data issues and special cases such as 'Denver, City and County of' etc.
Also, in general you won't be able "to remove the space" (within the character variable), unless all names have the same length.
SCAN()
SCAN works well for most cases:
county = scan(county, 1);
But it takes the first word only, which could be a problem for multiple-word counties:
Van Buren County
(Yes, I made that up. But there must be multiple-word counties somewhere.)
To check whether you need a more complex approach, consider:
1. Do you have any multiple-word county names?
2. If so, is the last word always spelled correctly ("County")?
Hello @robertrao,
How about this?
data have;
input cty $30.;
cards;
Irving County
Collin County
Denton County
Countyard County
Lake of the Woods County
;
data want;
set have;
if propcase(scan(cty, -1, ' '))='County' then cty=substr(cty, 1, length(cty)-6);
run;
(There is a Lake of the Woods County in MN, but I made up Countyard County.)
As Astounding mentioned already, if you're going to apply this to a large dataset, you should check for data issues and special cases such as 'Denver, City and County of' etc.
Also, in general you won't be able "to remove the space" (within the character variable), unless all names have the same length.
Thank you Everyone,
It works fine for all conditions.
Thanks again
I have a question o how this is working:
if propcase(scan(cty, -1, ' '))='County' then cty=substr(cty, 1, length(cty)-6);
cant i just use: if propcase(scan(cty, -1, ' '))='County' then cty=substr(cty, 1,-6); why dont this work?????
Thanks
Well, the syntax of the SUBSTR function specifies that the third argument "is the length of the substring to extract." You want to extract the substring beginning at position 1 and ending either in the blank before "County" or in the character immediately before that blank (the result is effectively the same). The length of this substring is length(cty)-6 or length(cty)-7, respectively. It is definitely not -6, because, obviously, the length of a (sub)string, i.e. a number of characters, cannot be negative.
Thanks again but I am a bit confused here.
When we are scanning from left to right we just say substr(county,1,6)
here we want it to scan from the right side
As you explained the length cannot be negative but we are still saying length(County)-6 which has -6 as negative
Also i thought the 1 is the one which specifies to go from left to right or
right to left depending on the positive sign or negative sign preceding it
if propcase(scan(County, -1, ' '))='County' then County=upcase(substr(County, 1, length(County)-6));
@robertrao wrote:
(...)
here we want it to scan from the right side
We may want it to scan from the right side, but as it stands (SAS 9.4) the syntax of the SUBSTR function does not offer this feature (please correct me if I'm wrong).
@robertrao wrote:
As you explained the length cannot be negative but we are still saying length(County)-6 which has -6 as negative
Not -6 is the third argument, but length(cty)-6, where length(cty) evaluates to a number >=6 because of the IF condition: A string containing the word "County" must be at least as long as "County" itself, i.e. 6 characters. Hence the third argument is >=0, not negative.
@robertrao wrote:
Also i thought the 1 is the one which specifies to go from left to right or
right to left depending on the positive sign or negative sign preceding it
The second argument of the SUBSTR function (1 in our case) specifies "the beginning character position."
The second argument of the SCAN function (-1 in our case), however, specifies "the number of the word in the character string" and if it "is negative, SCAN counts words from right to left in the character string."
So, bottom line is: Syntax details have to be accepted as they are. There is little room for interpretation or wishful thinking.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.