DATA Step, Macro, Functions and more

String Functions

Accepted Solution Solved
Reply
Super Contributor
Posts: 1,041
Accepted Solution

String Functions

Hello,

 

I want to remove certain charecters from the variable.

From the below example I would like to remove the space and the County which comes after that.

 

Example:

 

Irving County

Collin County

Denton County

 

Want:

Irving

Collin

Denton


Accepted Solutions
Solution
‎03-02-2016 02:21 PM
Trusted Advisor
Posts: 1,118

Re: String Functions

Posted in reply to robertrao

Hello @robertrao,

 

How about this?

data have;
input cty $30.;
cards;
Irving County
Collin County
Denton County
Countyard County
Lake of the Woods County
;

data want;
set have;
if propcase(scan(cty, -1, ' '))='County' then cty=substr(cty, 1, length(cty)-6);
run;

(There is a Lake of the Woods County in MN, but I made up Countyard County.)

 

As Astounding mentioned already, if you're going to apply this to a large dataset, you should check for data issues and special cases such as 'Denver, City and County of' etc.

 

Also, in general you won't be able "to remove the space" (within the character variable), unless all names have the same length.

 

View solution in original post


All Replies
Super User
Posts: 19,869

Re: String Functions

Posted in reply to robertrao

SCAN()

Super User
Posts: 5,516

Re: String Functions

SCAN works well for most cases:

 

county = scan(county, 1);

 

But it takes the first word only, which could be a problem for multiple-word counties:

 

Van Buren County

 

(Yes, I made that up.  But there must be multiple-word counties somewhere.)

 

To check whether you need a more complex approach, consider:

 

1. Do you have any multiple-word county names?

2. If so, is the last word always spelled correctly ("County")?

Solution
‎03-02-2016 02:21 PM
Trusted Advisor
Posts: 1,118

Re: String Functions

Posted in reply to robertrao

Hello @robertrao,

 

How about this?

data have;
input cty $30.;
cards;
Irving County
Collin County
Denton County
Countyard County
Lake of the Woods County
;

data want;
set have;
if propcase(scan(cty, -1, ' '))='County' then cty=substr(cty, 1, length(cty)-6);
run;

(There is a Lake of the Woods County in MN, but I made up Countyard County.)

 

As Astounding mentioned already, if you're going to apply this to a large dataset, you should check for data issues and special cases such as 'Denver, City and County of' etc.

 

Also, in general you won't be able "to remove the space" (within the character variable), unless all names have the same length.

 

Super Contributor
Posts: 1,041

Re: String Functions

Posted in reply to FreelanceReinhard

Thank you Everyone,

 

It works fine for all conditions.

 

 

Thanks again

Super Contributor
Posts: 1,041

Re: String Functions

Posted in reply to FreelanceReinhard

I have a question o how this is working:

 

if propcase(scan(cty, -1, ' '))='County' then cty=substr(cty, 1, length(cty)-6);

 

cant i just use: if propcase(scan(cty, -1, ' '))='County' then cty=substr(cty, 1,-6);  why dont this work?????

 

Thanks

Trusted Advisor
Posts: 1,118

Re: String Functions

Posted in reply to robertrao

Well, the syntax of the SUBSTR function specifies that the third argument "is the length of the substring to extract." You want to extract the substring beginning at position 1 and ending either in the blank before "County" or in the character immediately before that blank (the result is effectively the same). The length of this substring is length(cty)-6 or length(cty)-7, respectively. It is definitely not -6, because, obviously, the length of a (sub)string, i.e. a number of characters, cannot be negative.

Super Contributor
Posts: 1,041

Re: String Functions

Posted in reply to FreelanceReinhard

Thanks again but I am a bit confused here.

 

When we are scanning from left to right we just say substr(county,1,6)

here we want it to scan from the right side

As you explained the length cannot be negative but we are still saying length(County)-6 which has -6 as negative

Also i thought the 1 is the one which specifies to go from left to right or

right to left depending on the positive sign or negative sign preceding it

 

 

 

if propcase(scan(County, -1, ' '))='County' then County=upcase(substr(County, 1, length(County)-6));

Trusted Advisor
Posts: 1,118

Re: String Functions

Posted in reply to robertrao

robertrao wrote:

(...)

here we want it to scan from the right side


We may want it to scan from the right side, but as it stands (SAS 9.4) the syntax of the SUBSTR function does not offer this feature (please correct me if I'm wrong).

 


robertrao wrote:

As you explained the length cannot be negative but we are still saying length(County)-6 which has -6 as negative

 

Not -6 is the third argument, but length(cty)-6, where length(cty) evaluates to a number >=6 because of the IF condition: A string containing the word "County" must be at least as long as "County" itself, i.e. 6 characters. Hence the third argument is >=0, not negative.

 


robertrao wrote:

 

Also i thought the 1 is the one which specifies to go from left to right or

right to left depending on the positive sign or negative sign preceding it

 


The second argument of the SUBSTR function (1 in our case) specifies "the beginning character position."

The second argument of the SCAN function (-1 in our case), however, specifies "the number of the word in the character string" and if it "is negative, SCAN counts words from right to left in the character string."

 

So, bottom line is: Syntax details have to be accepted as they are. There is little room for interpretation or wishful thinking.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 449 views
  • 2 likes
  • 4 in conversation