BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
robertrao
Quartz | Level 8

Hello,

 

I want to remove certain charecters from the variable.

From the below example I would like to remove the space and the County which comes after that.

 

Example:

 

Irving County

Collin County

Denton County

 

Want:

Irving

Collin

Denton

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @robertrao,

 

How about this?

data have;
input cty $30.;
cards;
Irving County
Collin County
Denton County
Countyard County
Lake of the Woods County
;

data want;
set have;
if propcase(scan(cty, -1, ' '))='County' then cty=substr(cty, 1, length(cty)-6);
run;

(There is a Lake of the Woods County in MN, but I made up Countyard County.)

 

As Astounding mentioned already, if you're going to apply this to a large dataset, you should check for data issues and special cases such as 'Denver, City and County of' etc.

 

Also, in general you won't be able "to remove the space" (within the character variable), unless all names have the same length.

 

View solution in original post

8 REPLIES 8
Reeza
Super User

SCAN()

Astounding
PROC Star

SCAN works well for most cases:

 

county = scan(county, 1);

 

But it takes the first word only, which could be a problem for multiple-word counties:

 

Van Buren County

 

(Yes, I made that up.  But there must be multiple-word counties somewhere.)

 

To check whether you need a more complex approach, consider:

 

1. Do you have any multiple-word county names?

2. If so, is the last word always spelled correctly ("County")?

FreelanceReinh
Jade | Level 19

Hello @robertrao,

 

How about this?

data have;
input cty $30.;
cards;
Irving County
Collin County
Denton County
Countyard County
Lake of the Woods County
;

data want;
set have;
if propcase(scan(cty, -1, ' '))='County' then cty=substr(cty, 1, length(cty)-6);
run;

(There is a Lake of the Woods County in MN, but I made up Countyard County.)

 

As Astounding mentioned already, if you're going to apply this to a large dataset, you should check for data issues and special cases such as 'Denver, City and County of' etc.

 

Also, in general you won't be able "to remove the space" (within the character variable), unless all names have the same length.

 

robertrao
Quartz | Level 8

Thank you Everyone,

 

It works fine for all conditions.

 

 

Thanks again

robertrao
Quartz | Level 8

I have a question o how this is working:

 

if propcase(scan(cty, -1, ' '))='County' then cty=substr(cty, 1, length(cty)-6);

 

cant i just use: if propcase(scan(cty, -1, ' '))='County' then cty=substr(cty, 1,-6);  why dont this work?????

 

Thanks

FreelanceReinh
Jade | Level 19

Well, the syntax of the SUBSTR function specifies that the third argument "is the length of the substring to extract." You want to extract the substring beginning at position 1 and ending either in the blank before "County" or in the character immediately before that blank (the result is effectively the same). The length of this substring is length(cty)-6 or length(cty)-7, respectively. It is definitely not -6, because, obviously, the length of a (sub)string, i.e. a number of characters, cannot be negative.

robertrao
Quartz | Level 8

Thanks again but I am a bit confused here.

 

When we are scanning from left to right we just say substr(county,1,6)

here we want it to scan from the right side

As you explained the length cannot be negative but we are still saying length(County)-6 which has -6 as negative

Also i thought the 1 is the one which specifies to go from left to right or

right to left depending on the positive sign or negative sign preceding it

 

 

 

if propcase(scan(County, -1, ' '))='County' then County=upcase(substr(County, 1, length(County)-6));

FreelanceReinh
Jade | Level 19

@robertrao wrote:

(...)

here we want it to scan from the right side


We may want it to scan from the right side, but as it stands (SAS 9.4) the syntax of the SUBSTR function does not offer this feature (please correct me if I'm wrong).

 


@robertrao wrote:

As you explained the length cannot be negative but we are still saying length(County)-6 which has -6 as negative

 

Not -6 is the third argument, but length(cty)-6, where length(cty) evaluates to a number >=6 because of the IF condition: A string containing the word "County" must be at least as long as "County" itself, i.e. 6 characters. Hence the third argument is >=0, not negative.

 


@robertrao wrote:

 

Also i thought the 1 is the one which specifies to go from left to right or

right to left depending on the positive sign or negative sign preceding it

 


The second argument of the SUBSTR function (1 in our case) specifies "the beginning character position."

The second argument of the SCAN function (-1 in our case), however, specifies "the number of the word in the character string" and if it "is negative, SCAN counts words from right to left in the character string."

 

So, bottom line is: Syntax details have to be accepted as they are. There is little room for interpretation or wishful thinking.

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1439 views
  • 2 likes
  • 4 in conversation