Hi everyone, thank you for reading my question!
I have a "cityname" variable whose data look like this:
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"
Does anyone know how to code efficiently to transform this variable into the following:
"Autaugaville"
"Billingsley"
"Marbury "
"Millbrook"
"Pine Level"
"Prattville"
I just want to delete the ending word whenever it's tow/city/CDP/borough/village...
Here is my way, just wondering any other more efficient ways...
/* city, borough, town, cdp, village, */
data xu1.citycounty;
set xu1.citycounty;
cityname = upcase(cityname);
cityname1 = scan(cityname,1,'');
cityname2 = scan(cityname,2,'');
cityname3 = scan(cityname,3,'');
cityname4 = scan(cityname,4,'');
cityname5 = scan(cityname,5,'');
cityname6 = scan(cityname,6,'');
if cityname2 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname2= '';
if cityname3 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname3= '';
if cityname4 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname4= '';
if cityname5 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname5= '';
if cityname6 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname6= '';
run;
data xu1.citycounty;
set xu1.citycounty;
citynamenew= trim(cityname1)||''||trimn(cityname2)||''||trimn(cityname3)||''||trimn(cityname4)||''||trimn(cityname5)||''||trimn(cityname6);
run;
Thank you all!
data have;
input var $40. ;
var=dequote(var);
datalines;
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"
;
data want;
set have;
call scan(var, countw(var), position, length);
substr(var,position,length)=' ';
keep var;
run;Use the SCAN FUNCTION.
Of course, such a solution may run into problems if the city name has two words like St. Paul
thank you Paige... I tried scan function it worked!
data have;
input var $40. ;
var=dequote(var);
datalines;
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"
;
data want;
set have;
call scan(var, countw(var), position, length);
substr(var,position,length)=' ';
keep var;
run;Your code works perfectly! Thank you so much!
I would try:
if upcase(scan(cityname, -1)) in ('CITY', 'TOWN', 'BOROUGH', 'VILLAGE', 'CDP')
they cityname = substr(cityname, 1, length(cityname) - length(scan(cityname, -1)));
thank you so much. This one takes into the account the ending name may not be in ('CITY', 'TOWN', 'BOROUGH', 'VILLAGE', 'CDP'). Really appreciate your input.
Don't forget the case where there is a single name... For the greatest flexibility, use regular expressions:
data have;
input str $60.;
datalines;
Autaugaville town
Billingsley town
Marbury CDP 
Millbrook city
Pine Level CDP
Prattville city
Somecity
;
data want;
set have;
city = prxChange("s/((\w+\s+)+?)(\w+\s+)$/$1/o",1,str);
run;
proc print; run;
I would use PRX, like this:
data want;
  set have;
  var=prxchange('s/\b(town|cdp|borough|city)\s*$//i',1,var);
run;A short explanation: \b is a word boundary, (town|cdp|borough|city) is any of the nouns in question, \s* is an arbitrary amount of whitespace, and $ means end-of-string. All of which is replaced by nothing. The "i" at the end makes the search disregard letter cases.The "1" parameter means that the replacement is done only once.
data have;
input var $quote40. ;
want=prxchange('s/\w+$//',1,strip(var));
datalines;
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"
;
proc print;run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
