I am trying to extract a specific word from a character string, and it may be written differently. Everything I need to extract contains or starts with "MSP" I need not only the work MSP but also the numbers following (there's over 100 number combinations and I do not have a master list):
Examples of text:
Valve, SQUARE, CAST IRON, 3 IN, THD, BLACK, MSP 54-37, FITTING (need to return "MSP 54-37)
Pipe, SQUARE HEAD, 3 IN, THD, 175 PSIG, BLACK, MSP54-37, FITTING (need to return "MSP 54-37)
PLUG, SQUARE HEAD, CAST IRON, 3 IN, THD, 175 PSIG, BLACK, MSP 54-37.1, FITTING (need to return "MSP 54-37.1)
PLUG, SQUARE HEAD, CAST IRON, 3 IN, STEEL, THD, 175 PSIG, BLACK, MSP 54-37.1, MSP 11-1.1, FITTING (need to return "MSP 54-37.1 & MSP 11-1.1 in separate columns)
does anyone know how to do this? I am using Enterprise Guide but I know I can cut & paste code directly into the window as well.
Thank you!!!!
Here is another approach to achieve this:
data have;
infile datalines truncover;
input text $200.;
datalines;
Valve, SQUARE, CAST IRON, 3 IN, THD, BLACK, MSP 54-37, FITTING
Pipe, SQUARE HEAD, 3 IN, THD, 175 PSIG, BLACK, MSP54-37, FITTING
PLUG, SQUARE HEAD, CAST IRON, 3 IN, THD, 175 PSIG, BLACK, MSP 54-37.1, FITTING
PLUG, SQUARE HEAD, CAST IRON, 3 IN, STEEL, THD, 175 PSIG, BLACK, MSP 54-37.1, MSP 11-1.1, FITTING
;
proc sql noprint;
select max(countw(text,',')), max(count(text,"MSP")) into:nb1,:nb2 from have;
quit;
data want;
set have;
array _MSP (&nb1) $ 200;
array MSP (&nb2) $ 20;
do i=1 to countw(text,',');
if prxmatch('/MSP\s*\d+-*\d+\.*\d*/',scan(text,i,',')) then _MSP(i) = scan(text,i,',');
end;
MSP_extract = catx(',',of _MSP(*));
do j=1 to count(text,"MSP");
MSP(j) = scan(MSP_extract,j,',');
end;
drop i j MSP_extract _:;
run;
I need to understand your input data better. Show it in the form of a data step using CARDS.
Also, show what you WANT.
Test this:
data have;
infile datalines truncover;
input line $200.;
datalines;
Valve, SQUARE, CAST IRON, 3 IN, THD, BLACK, MSP 54-37, FITTING
Pipe, SQUARE HEAD, 3 IN, THD, 175 PSIG, BLACK, MSP54-37, FITTING
PLUG, SQUARE HEAD, CAST IRON, 3 IN, THD, 175 PSIG, BLACK, MSP 54-37.1, FITTING
PLUG, SQUARE HEAD, CAST IRON, 3 IN, STEEL, THD, 175 PSIG, BLACK, MSP 54-37.1, MSP 11-1.1, FITTING
;
data want;
if not prxID then prxID + prxparse("/\b(\w*MSP\w*)\s*([-.0-9 ]+)/i");
set have;
length found $20;
start = 1;
stop = length(line);
call prxnext(prxID, start, stop, line, position, length);
do while (position > 0);
found = catx(" ", prxposn(prxId, 1, line), prxposn(prxId, 2, line));
output;
call prxnext(prxID, start, stop, line, position, length);
end;
drop prxID start stop position length;
run;
proc print data=want noobs; run;
line found Valve, SQUARE, CAST IRON, 3 IN, THD, BLACK, MSP 54-37, FITTING MSP 54-37 Pipe, SQUARE HEAD, 3 IN, THD, 175 PSIG, BLACK, MSP54-37, FITTING MSP54 -37 PLUG, SQUARE HEAD, CAST IRON, 3 IN, THD, 175 PSIG, BLACK, MSP 54-37.1, FITTING MSP 54-37.1 PLUG, SQUARE HEAD, CAST IRON, 3 IN, STEEL, THD, 175 PSIG, BLACK, MSP 54-37.1, MSP 11-1.1, FITTING MSP 54-37.1 PLUG, SQUARE HEAD, CAST IRON, 3 IN, STEEL, THD, 175 PSIG, BLACK, MSP 54-37.1, MSP 11-1.1, FITTING MSP 11-1.1
Thank you so much!!!!
Here is another approach to achieve this:
data have;
infile datalines truncover;
input text $200.;
datalines;
Valve, SQUARE, CAST IRON, 3 IN, THD, BLACK, MSP 54-37, FITTING
Pipe, SQUARE HEAD, 3 IN, THD, 175 PSIG, BLACK, MSP54-37, FITTING
PLUG, SQUARE HEAD, CAST IRON, 3 IN, THD, 175 PSIG, BLACK, MSP 54-37.1, FITTING
PLUG, SQUARE HEAD, CAST IRON, 3 IN, STEEL, THD, 175 PSIG, BLACK, MSP 54-37.1, MSP 11-1.1, FITTING
;
proc sql noprint;
select max(countw(text,',')), max(count(text,"MSP")) into:nb1,:nb2 from have;
quit;
data want;
set have;
array _MSP (&nb1) $ 200;
array MSP (&nb2) $ 20;
do i=1 to countw(text,',');
if prxmatch('/MSP\s*\d+-*\d+\.*\d*/',scan(text,i,',')) then _MSP(i) = scan(text,i,',');
end;
MSP_extract = catx(',',of _MSP(*));
do j=1 to count(text,"MSP");
MSP(j) = scan(MSP_extract,j,',');
end;
drop i j MSP_extract _:;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.