DATA Step, Macro, Functions and more

string variables with several parts, I want to delete certain parts

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

string variables with several parts, I want to delete certain parts

[ Edited ]

Hi everyone, thank you for reading my question!

I have a "cityname" variable whose data look like this:
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"

 

Does anyone know how to code efficiently to transform this variable into the following:

 

"Autaugaville"
"Billingsley"
"Marbury "
"Millbrook"
"Pine Level"
"Prattville"

 

I just want to delete the ending word whenever it's tow/city/CDP/borough/village...

 

Here is my way, just wondering any other more efficient ways...

 

/* city, borough, town, cdp, village, */
data xu1.citycounty;
set xu1.citycounty;
cityname = upcase(cityname);
cityname1 = scan(cityname,1,'');
cityname2 = scan(cityname,2,'');
cityname3 = scan(cityname,3,'');
cityname4 = scan(cityname,4,'');
cityname5 = scan(cityname,5,'');
cityname6 = scan(cityname,6,'');
if cityname2 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname2= '';
if cityname3 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname3= '';
if cityname4 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname4= '';
if cityname5 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname5= '';
if cityname6 in( 'CITY','TOWN','BOROUGH', 'VILLAGE','CDP') then cityname6= '';

run;

data xu1.citycounty;
set xu1.citycounty;
citynamenew= trim(cityname1)||''||trimn(cityname2)||''||trimn(cityname3)||''||trimn(cityname4)||''||trimn(cityname5)||''||trimn(cityname6);
run;

 

 

Thank you all!


Accepted Solutions
Solution
‎04-14-2018 02:42 PM
Super User
Posts: 2,061

Re: string variables with several parts, I want to delete certain parts

Posted in reply to changxuosu1
data have;
input var $40. ;
var=dequote(var);
datalines;
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"
;

data want;
set have;
call scan(var, countw(var), position, length);
substr(var,position,length)=' ';
keep var;
run;

View solution in original post


All Replies
Respected Advisor
Posts: 3,277

Re: string variables with several parts, I want to delete certain parts

Posted in reply to changxuosu1

Use the SCAN FUNCTION.

 

Of course, such a solution may run into problems if the city name has two words like St. Paul

--
Paige Miller
Occasional Contributor
Posts: 12

Re: string variables with several parts, I want to delete certain parts

Posted in reply to PaigeMiller

thank you Paige... I tried scan function it worked!

Solution
‎04-14-2018 02:42 PM
Super User
Posts: 2,061

Re: string variables with several parts, I want to delete certain parts

Posted in reply to changxuosu1
data have;
input var $40. ;
var=dequote(var);
datalines;
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"
;

data want;
set have;
call scan(var, countw(var), position, length);
substr(var,position,length)=' ';
keep var;
run;
PROC Star
Posts: 275

Re: string variables with several parts, I want to delete certain parts

Posted in reply to novinosrin
Instead of using scan(var, countw(var), position, length), you can use scan(var, -1, position, length). Does the same but much faster and simpler (scans from the end instead of scanning twice from the start).

Occasional Contributor
Posts: 12

Re: string variables with several parts, I want to delete certain parts

Posted in reply to novinosrin

Your code works perfectly! Thank you so much! 

Super User
Posts: 6,934

Re: string variables with several parts, I want to delete certain parts

Posted in reply to changxuosu1

I would try:

 

if upcase(scan(cityname, -1)) in ('CITY', 'TOWN', 'BOROUGH', 'VILLAGE', 'CDP') 

they cityname = substr(cityname, 1, length(cityname) - length(scan(cityname, -1)));

Occasional Contributor
Posts: 12

Re: string variables with several parts, I want to delete certain parts

Posted in reply to Astounding

thank you so much. This one takes into the account the ending name may not be in ('CITY', 'TOWN', 'BOROUGH', 'VILLAGE', 'CDP').  Really appreciate your input. 

Esteemed Advisor
Posts: 5,625

Re: string variables with several parts, I want to delete certain parts

Posted in reply to changxuosu1

Don't forget the case where there is a single name...  For the greatest flexibility, use regular expressions:

 

data have;
input str $60.;
datalines;
Autaugaville town
Billingsley town
Marbury CDP 
Millbrook city
Pine Level CDP
Prattville city
Somecity
;

data want;
set have;
city = prxChange("s/((\w+\s+)+?)(\w+\s+)$/$1/o",1,str);
run;

proc print; run;
PG
PROC Star
Posts: 275

Re: string variables with several parts, I want to delete certain parts

Posted in reply to changxuosu1

I would use PRX, like this:

data want;
  set have;
  var=prxchange('s/\b(town|cdp|borough|city)\s*$//i',1,var);
run;

A short explanation: \b is a word boundary, (town|cdp|borough|city) is any of the nouns in question, \s* is an arbitrary amount of whitespace, and $ means end-of-string. All of which is replaced by nothing. The "i" at the end makes the search disregard letter cases.The "1" parameter means that the replacement is done only once.

Super User
Posts: 10,850

Re: string variables with several parts, I want to delete certain parts

Posted in reply to changxuosu1
data have;
input var $quote40. ;
want=prxchange('s/\w+$//',1,strip(var));
datalines;
"Autaugaville town"
"Billingsley town"
"Marbury CDP "
"Millbrook city"
"Pine Level CDP"
"Prattville city"
;
proc print;run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 245 views
  • 9 likes
  • 7 in conversation