Solved: Identifying and removing part of a string of text

NicolaD · Posted 08-11-2010 05:56 AM

I wasn't sure which forum to post this in. I need code that will work in Information Map Studio and/or Enterprise Guide.

I have a list of area names followed by a code in brackets e.g.

Leeds (212)
Wakefield (213)
Kingston-upon-Hull (215)
N Lincolnshire (216)

...etc

I need to use just the area names without the code, but with my limited SAS knowledge I don't know of any syntax to do this. All the area names have the same format as the examples above and can contain spaces, "&" and " - ", so I'm looking for code that will search the string and strip everything from the opening bracket "(" to the end. I think I can then use the STRIP function to remove the trailing blank left at the end.

Thanks
Nicola

andreas_lds · Posted 08-11-2010 07:33 AM

Editor's Note: Thanks for the excellent solution. I have added a full code version down below.

Combining the INDEX and SUBSTR functions seems to be a good start:
name = substr(name, 1, index(name, '(') -1);

Both functions are well documented:
index: http://support.sas.com/documentation/cdl/en/lrdict/63026/HTML/default/a000212242.htm
substr: http://support.sas.com/documentation/cdl/en/lrdict/63026/HTML/default/a000212264.htm

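/* Sample data: area name followed by a code in brackets */
data one;
   input name $1-24;
   cards;
Leeds (212)
Wakefield (213)
Kingston-upon-Hull (215)
N Lincolnshire (216)
;

/* Keep everything before the opening bracket */
data two;
   set one;
   length new_name $24;
   new_name = substr(name, 1, index(name, '(') - 1);
run;

proc print data=two;
run;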
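
One thing to watch with the INDEX/SUBSTR approach: INDEX returns 0 when the value has no "(", which makes the SUBSTR length argument -1. The sketch below is one possible guard, not part of the original solution. The extra row without a code and the variable name pos are assumptions added for illustration.

```sas
/* Sample data, plus one hypothetical row that has no code in brackets */
data one;
   input name $1-24;
   cards;
Leeds (212)
Wakefield (213)
Barton-Upon-Humber
;

data two;
   set one;
   length new_name $24;
   pos = index(name, '(');              /* 0 when there is no bracket */
   if pos > 1 then new_name = substr(name, 1, pos - 1);
   else if pos = 0 then new_name = name;   /* no code: keep the name as it is */
   /* pos = 1 means the value starts with "(", so new_name is left blank */
   drop pos;
run;

proc print data=two;
run;
```

Whether a name without a code should be kept, blanked or flagged depends on the data, so the ELSE branch is only one reasonable choice.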

NicolaD · Posted 08-11-2010 08:20 AM

Thanks, I will have a look, I just didn't know what kind of functions I needed to read up on

Peter_C · Posted 08-11-2010 08:35 AM

As well as the solution offered by andreas_lds, NicolaD could use the SCAN() function, because she wants what comes before the first "(", like
main_part = scan( whole_string, 1, '(' );
If there might be a "(" within the real main_part, then this approach won't do.
The FIND() function has a "direction of search" feature which may be more helpful.
That blank which comes before the "(number)" provides an excellent marker.
main_part = substr( whole_string, 1, find( trim(whole_string), ' ', -9999) ) ;
where that find function should return the position of the blank before the "(number)". As a demo: the code
data;
whole_string = 'rt(hjk) yui (567) ' ;
main_part = substr( whole_string, 1, find( trim(whole_string), ' ', -9999) ) ;
put (_all_)(/=);
run;
creates the SAS log
[pre]
43 run;

whole_string=rt(hjk) yui (567)
main_part=rt(hjk) yui
NOTE: The data set WORK.
[/pre]
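
To make the difference between the two approaches concrete, here is a small sketch that runs SCAN and a right-to-left FIND on the same demo value. It searches backwards for the last "(" rather than the last blank, which is a variation on the code above and not a replacement for it. The variable names main_scan, main_find and pos are made up for the illustration.

```sas
data _null_;
   length whole_string main_scan main_find $24;
   whole_string = 'rt(hjk) yui (567)';

   /* SCAN stops at the first "(", which here sits inside the name itself */
   main_scan = scan(whole_string, 1, '(');

   /* A negative start position makes FIND search from right to left,
      so it locates the last "(" in the value */
   pos = find(whole_string, '(', -length(whole_string));
   if pos > 1 then main_find = substr(whole_string, 1, pos - 1);

   put main_scan= / main_find=;
run;
```

On this value SCAN keeps only "rt", while the backwards search keeps "rt(hjk) yui", so the second form is the safer one if a name can itself contain a bracket.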

Trailing blanks in a variable are normal/unavoidable when stored, and easily removed when the variable is used.

peterC

Tim_SAS · Posted 08-11-2010 09:10 AM

I like regular expressions, so here's a solution that uses a regexp to solve the problem. The regexp is coded so that an area name that is not followed by a code fails to match. If this is not the desired behavior, use [pre]"/^([^\(]+)/"[/pre] instead.
[pre]
data _null_;
infile datalines truncover;
retain re;

if _N_ = 1 then do;
re = prxparse("/^([^\(]+)\(/");
end;

input areaname $25.;

match = prxmatch(re, areaname);
if match ^= 0 then do;
call prxposn(re, 1, start, length); /* 1 = first capture buffer */
areaname = substr(areaname, start, length);
put areaname=;
end;
else do;
put 'No match for "' areaname '"';
end;
datalines;
Leeds (212)
Wakefield (213)
Kingston-upon-Hull (215)
N Lincolnshire (216)
Barton-Upon-Humber
;;;;
[/pre]
Yields
[pre]
areaname=Leeds
areaname=Wakefield
areaname=Kingston-upon-Hull
areaname=N Lincolnshire
No match for "Barton-Upon-Humber "
[/pre]
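
A variation on the regexp idea, offered only as a sketch: PRXCHANGE can delete the bracketed code in a single call instead of capturing the name. The pattern assumes the code is made up of digits and sits at the end of the value, and cleaned is a hypothetical variable name. Unlike the code above, a value with no code passes through unchanged rather than being reported as a non-match.

```sas
data _null_;
   infile datalines truncover;
   input areaname $25.;
   length cleaned $25;

   /* Delete optional blanks, a bracketed run of digits and any trailing
      blanks at the end of the value; other values are left as they are */
   cleaned = prxchange('s/\s*\(\d+\)\s*$//', 1, areaname);

   put cleaned=;
   datalines;
Leeds (212)
Kingston-upon-Hull (215)
Barton-Upon-Humber
;
```

Because the pattern is a constant string, SAS should only need to compile it once, so no RETAIN or _N_ = 1 block is used here.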

ArtC · Posted 08-11-2010 02:49 PM

The SCAN function should do the trick as well:

[pre]
name = scan(wholename,1,'(');
[/pre]
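
As a rough illustration of the trailing-blank point raised earlier in the thread: SCAN leaves the blank that sat before the "(", and a fixed-length variable is padded with blanks anyway, so the blanks only matter when the value is used. The sketch below strips them at the point of use. The variable shown and the square brackets are just there to make the blanks visible.

```sas
data _null_;
   length wholename name $24 shown $40;
   wholename = 'N Lincolnshire (216)';

   /* SCAN returns everything before the first "(", including the blank */
   name = scan(wholename, 1, '(');

   /* STRIP removes leading and trailing blanks when the value is used */
   shown = '[' || strip(name) || ']';
   put shown=;
run;
```

This should write shown=[N Lincolnshire] to the log, with no blank before the closing bracket.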
