BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
changxuosu1
Obsidian | Level 7

Hi everyone, thank you for reading my question!

I have a "cityname" variable whose data look like this:
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"

 

Does anyone know how to code efficiently to transform this variable into the following:

 

"Autaugaville"
"Billingsley"
"Marbury "
"Millbrook"
"Pine Level"
"Prattville"

 

I just want to delete the ending word whenever it's tow/city/CDP/borough/village...

 

Here is my way, just wondering any other more efficient ways...

 

/* city, borough, town, cdp, village, */
data xu1.citycounty;
set xu1.citycounty;
cityname = upcase(cityname);
cityname1 = scan(cityname,1,'');
cityname2 = scan(cityname,2,'');
cityname3 = scan(cityname,3,'');
cityname4 = scan(cityname,4,'');
cityname5 = scan(cityname,5,'');
cityname6 = scan(cityname,6,'');
if cityname2 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname2= '';
if cityname3 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname3= '';
if cityname4 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname4= '';
if cityname5 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname5= '';
if cityname6 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname6= '';

run;

data xu1.citycounty;
set xu1.citycounty;
citynamenew= trim(cityname1)||''||trimn(cityname2)||''||trimn(cityname3)||''||trimn(cityname4)||''||trimn(cityname5)||''||trimn(cityname6);
run;

 

 

Thank you all!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data have;
input var $40. ;
var=dequote(var);
datalines;
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"
;

data want;
set have;
call scan(var, countw(var), position, length);
substr(var,position,length)=' ';
keep var;
run;

View solution in original post

10 REPLIES 10
PaigeMiller
Diamond | Level 26

Use the SCAN FUNCTION.

 

Of course, such a solution may run into problems if the city name has two words like St. Paul

--
Paige Miller
changxuosu1
Obsidian | Level 7

thank you Paige... I tried scan function it worked!

novinosrin
Tourmaline | Level 20
data have;
input var $40. ;
var=dequote(var);
datalines;
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"
;

data want;
set have;
call scan(var, countw(var), position, length);
substr(var,position,length)=' ';
keep var;
run;
s_lassen
Meteorite | Level 14
Instead of using scan(var, countw(var), position, length), you can use scan(var, -1, position, length). Does the same but much faster and simpler (scans from the end instead of scanning twice from the start).

changxuosu1
Obsidian | Level 7

Your code works perfectly! Thank you so much! 

Astounding
PROC Star

I would try:

 

if upcase(scan(cityname, -1)) in ('CITY', 'TOWN', 'BOROUGH', 'VILLAGE', 'CDP') 

they cityname = substr(cityname, 1, length(cityname) - length(scan(cityname, -1)));

changxuosu1
Obsidian | Level 7

thank you so much. This one takes into the account the ending name may not be in ('CITY', 'TOWN', 'BOROUGH', 'VILLAGE', 'CDP').  Really appreciate your input. 

PGStats
Opal | Level 21

Don't forget the case where there is a single name...  For the greatest flexibility, use regular expressions:

 

data have;
input str $60.;
datalines;
Autaugaville town
Billingsley town
Marbury CDP 
Millbrook city
Pine Level CDP
Prattville city
Somecity
;

data want;
set have;
city = prxChange("s/((\w+\s+)+?)(\w+\s+)$/$1/o",1,str);
run;

proc print; run;
PG
s_lassen
Meteorite | Level 14

I would use PRX, like this:

data want;
  set have;
  var=prxchange('s/\b(town|cdp|borough|city)\s*$//i',1,var);
run;

A short explanation: \b is a word boundary, (town|cdp|borough|city) is any of the nouns in question, \s* is an arbitrary amount of whitespace, and $ means end-of-string. All of which is replaced by nothing. The "i" at the end makes the search disregard letter cases.The "1" parameter means that the replacement is done only once.

Ksharp
Super User
data have;
input var $quote40. ;
want=prxchange('s/\w+$//',1,strip(var));
datalines;
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"
;
proc print;run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1551 views
  • 9 likes
  • 7 in conversation