Help using Base SAS procedures

Identifying and removing part of a string of text

Contributor
Posts: 40

I wasn't sure which forum to post this in. I need code that will work in Information Map Studio and/or Enterprise Guide.

I have a list of area names followed by a code in brackets, e.g.

Leeds (212)
Wakefield (213)
Kingston-upon-Hull (215)
N Lincolnshire (216)

...etc

I need to use just the area names without the code but, with my limited SAS knowledge, I don't know of any syntax to do this. All the area names have the same format as the examples above and can contain spaces, "&" and " - ", so I'm looking for code that will search the string and strip everything from the opening bracket "(" to the end. I think I can then use the STRIP function to remove the trailing blank left at the end.

Thanks
Nicola

Accepted Solutions
Solution
‎09-29-2016 01:19 PM
Super Contributor
Posts: 264

Re: Identifying and removing part of a string of text

[ Edited ]

Editor's Note:  Thanks for the excellent solution.  I have added a full code version down below. 

 

Combining the INDEX and SUBSTR functions seems to be a good start:
name = substr(name, 1, index(name, '(') -1);

Both functions are well documented:
index: http://support.sas.com/documentation/cdl/en/lrdict/63026/HTML/default/a000212242.htm
substr: http://support.sas.com/documentation/cdl/en/lrdict/63026/HTML/default/a000212264.htm

 

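/* Sample data: area name followed by a code in brackets */
data one;
   input name $1-24;
   cards;
Leeds (212)
Wakefield (213)
Kingston-upon-Hull (215)
N Lincolnshire (216)
;

data two;
   set one;
   length new_name $24;
   /* INDEX gives the position of "("; SUBSTR keeps the characters before it */
   new_name = substr(name, 1, index(name, '(') - 1);
run;

proc print data=two;
run;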
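
If some values might lack the bracketed code, INDEX returns 0 and SUBSTR is handed a length of -1, which SAS reports in the log as an invalid argument. A minimal sketch of a guarded variant follows. The data set name areas, the variable pos and the extra row without a code are illustrative assumptions, not part of the original question.

[pre]
/* Hypothetical sample data, including one name with no code */
data areas;
   input name $1-24;
   datalines;
Leeds (212)
N Lincolnshire (216)
Barton-Upon-Humber
;

data areas_clean;
   set areas;
   length new_name $24;
   pos = index(name, '(');                 /* 0 when no "(" is present */
   if pos > 1 then new_name = strip(substr(name, 1, pos - 1));
   else if pos = 0 then new_name = strip(name);   /* no code: keep the whole name */
   /* pos = 1 means the value starts with "(", so new_name is left blank */
   drop pos;
run;

proc print data=areas_clean;
run;
[/pre]

Whether a name without a code should be kept or treated as an error depends on the data, so the ELSE branch is a design choice rather than a requirement.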



All Replies
Contributor
Posts: 40

Re: Identifying and removing part of a string of text

Thanks, I will have a look. I just didn't know what kind of functions I needed to read up on.
Valued Guide
Posts: 2,175

Re: Identifying and removing part of a string of text

As well as the solution offered by "andreas_id", NicolaD could use the SCAN function, because she wants what comes before the first "(", like this:
main_part = scan( whole_string, 1, '(' );
If there might be a "(" within the real main_part, then this approach won't do.
The FIND function has a "direction of search" feature which may be more helpful: a negative start position makes it search from right to left.
That blank which comes before the "(number)" provides an excellent marker.
main_part = substr( whole_string, 1, find( trim(whole_string), ' ', -9999) ) ;
Here the FIND function should return the position of the blank before the "(number)". As a demo, the code
data;
whole_string = 'rt(hjk) yui (567) ' ;
main_part = substr( whole_string, 1, find( trim(whole_string), ' ', -9999) ) ;
put (_all_)(/=);
run;
creates the SAS log
[pre]
43 run;

whole_string=rt(hjk) yui (567)
main_part=rt(hjk) yui
NOTE: The data set WORK.
[/pre]

Trailing blanks in a variable are normal/unavoidable when stored, and easily removed when the variable is used.
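
To illustrate the point about trailing blanks, here is a small sketch. The variable names and the length of 20 are assumptions chosen for the demo only. It compares plain concatenation with TRIM and CATS, writing the three results to the log.

[pre]
data _null_;
   length main_part $20;
   main_part = 'Leeds';                        /* stored padded with blanks to length 20 */
   padded  = '[' || main_part || ']';          /* padding is carried into the result */
   trimmed = '[' || trim(main_part) || ']';    /* TRIM drops trailing blanks at the point of use */
   joined  = cats('[', main_part, ']');        /* CATS strips each argument before joining */
   put padded= / trimmed= / joined=;
run;
[/pre]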

peterC
Super Contributor
Posts: 394

Re: Identifying and removing part of a string of text

I like regular expressions, so here's a solution that uses a regexp to solve the problem. The regexp is coded so that an area name that is not followed by a code fails to match. If this is not the desired behavior, use [pre]"/^([^\(]+)/"[/pre] instead.
[pre]
data _null_;
   infile datalines truncover;
   retain re;

   /* Compile the pattern once, on the first iteration */
   if _N_ = 1 then do;
      re = prxparse("/^([^\(]+)\(/");
   end;

   input areaname $25.;

   match = prxmatch(re, areaname);
   if match ^= 0 then do;
      /* Second argument is the capture buffer number, not the match position */
      call prxposn(re, 1, start, length);
      areaname = substr(areaname, start, length);
      put areaname=;
   end;
   else do;
      put 'No match for "' areaname '"';
   end;
datalines;
Leeds (212)
Wakefield (213)
Kingston-upon-Hull (215)
N Lincolnshire (216)
Barton-Upon-Humber
;;;;
[/pre]
Yields
[pre]
areaname=Leeds
areaname=Wakefield
areaname=Kingston-upon-Hull
areaname=N Lincolnshire
No match for "Barton-Upon-Humber "
[/pre]
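
A shorter regexp route may be PRXCHANGE, which substitutes in one call instead of locating a capture buffer. This is only a sketch: the variable cleaned and the pattern are my own assumptions, and unlike the code above it passes names without a code through unchanged rather than reporting them.

[pre]
data _null_;
   infile datalines truncover;
   length areaname cleaned $25;
   input areaname $25.;
   /* Delete one trailing "(...)" group plus the blanks around it; */
   /* values that do not match are returned as they are            */
   cleaned = prxchange('s/\s*\([^()]*\)\s*$//', 1, areaname);
   put cleaned=;
   datalines;
Leeds (212)
Kingston-upon-Hull (215)
Barton-Upon-Humber
;
[/pre]

Because the pattern is anchored at the end of the value, a "(" inside the area name itself should be left alone.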
Valued Guide
Posts: 632

Re: Identifying and removing part of a string of text

The SCAN function should do the trick as well:

[pre]
name = scan(wholename,1,'(');
[/pre]
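
Since the question mentions Enterprise Guide, the same SCAN expression can presumably be used in a query as well. Here is a minimal PROC SQL sketch. The table names work.one and work.areas_clean and the column area_name are assumptions, and I have not tried this in Information Map Studio.

[pre]
/* Hypothetical sample table */
data work.one;
   input name $1-24;
   datalines;
Leeds (212)
Wakefield (213)
N Lincolnshire (216)
;

proc sql;
   create table work.areas_clean as
   select name,
          /* first "word" before "(" with leading and trailing blanks removed */
          strip(scan(name, 1, '(')) as area_name length=24
   from work.one;
quit;
[/pre]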