Hi everyone, thank you for reading my question!
I have a "cityname" variable whose data look like this:
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"
Does anyone know how to code efficiently to transform this variable into the following:
"Autaugaville"
"Billingsley"
"Marbury "
"Millbrook"
"Pine Level"
"Prattville"
I just want to delete the ending word whenever it's tow/city/CDP/borough/village...
Here is my way, just wondering any other more efficient ways...
/* city, borough, town, cdp, village, */
data xu1.citycounty;
set xu1.citycounty;
cityname = upcase(cityname);
cityname1 = scan(cityname,1,'');
cityname2 = scan(cityname,2,'');
cityname3 = scan(cityname,3,'');
cityname4 = scan(cityname,4,'');
cityname5 = scan(cityname,5,'');
cityname6 = scan(cityname,6,'');
if cityname2 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname2= '';
if cityname3 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname3= '';
if cityname4 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname4= '';
if cityname5 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname5= '';
if cityname6 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname6= '';
run;
data xu1.citycounty;
set xu1.citycounty;
citynamenew= trim(cityname1)||''||trimn(cityname2)||''||trimn(cityname3)||''||trimn(cityname4)||''||trimn(cityname5)||''||trimn(cityname6);
run;
Thank you all!
data have;
input var $40. ;
var=dequote(var);
datalines;
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"
;
data want;
set have;
call scan(var, countw(var), position, length);
substr(var,position,length)=' ';
keep var;
run;
Use the SCAN FUNCTION.
Of course, such a solution may run into problems if the city name has two words like St. Paul
thank you Paige... I tried scan function it worked!
data have;
input var $40. ;
var=dequote(var);
datalines;
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"
;
data want;
set have;
call scan(var, countw(var), position, length);
substr(var,position,length)=' ';
keep var;
run;
Your code works perfectly! Thank you so much!
I would try:
if upcase(scan(cityname, -1)) in ('CITY', 'TOWN', 'BOROUGH', 'VILLAGE', 'CDP')
they cityname = substr(cityname, 1, length(cityname) - length(scan(cityname, -1)));
thank you so much. This one takes into the account the ending name may not be in ('CITY', 'TOWN', 'BOROUGH', 'VILLAGE', 'CDP'). Really appreciate your input.
Don't forget the case where there is a single name... For the greatest flexibility, use regular expressions:
data have;
input str $60.;
datalines;
Autaugaville town
Billingsley town
Marbury CDP
Millbrook city
Pine Level CDP
Prattville city
Somecity
;
data want;
set have;
city = prxChange("s/((\w+\s+)+?)(\w+\s+)$/$1/o",1,str);
run;
proc print; run;
I would use PRX, like this:
data want;
set have;
var=prxchange('s/\b(town|cdp|borough|city)\s*$//i',1,var);
run;
A short explanation: \b is a word boundary, (town|cdp|borough|city) is any of the nouns in question, \s* is an arbitrary amount of whitespace, and $ means end-of-string. All of which is replaced by nothing. The "i" at the end makes the search disregard letter cases.The "1" parameter means that the replacement is done only once.
data have;
input var $quote40. ;
want=prxchange('s/\w+$//',1,strip(var));
datalines;
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"
;
proc print;run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.