BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
maliksmom2000
Obsidian | Level 7

I am trying to extract a specific word from a character string, and it may be written differently.  Everything I need to extract contains or starts with "MSP" I need not only the work MSP but also the numbers following (there's over 100 number combinations and I do not have a master list):

 

Examples of text:

Valve, SQUARE, CAST IRON, 3 IN, THD, BLACK, MSP 54-37, FITTING (need to return "MSP 54-37)

Pipe, SQUARE HEAD, 3 IN, THD, 175 PSIG, BLACK, MSP54-37, FITTING (need to return "MSP 54-37)

PLUG, SQUARE HEAD, CAST IRON, 3 IN, THD, 175 PSIG, BLACK, MSP 54-37.1, FITTING (need to return "MSP 54-37.1)

PLUG, SQUARE HEAD, CAST IRON, 3 IN, STEEL, THD, 175 PSIG, BLACK, MSP 54-37.1, MSP 11-1.1, FITTING (need to return "MSP 54-37.1 & MSP 11-1.1 in separate columns)

 

does anyone know how to do this?  I am using Enterprise Guide but I know I can cut & paste code directly into the window as well.

 

Thank you!!!!

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @maliksmom2000 

 

Here is another approach to achieve this:

data have;
infile datalines truncover;
input text $200.;
datalines;
Valve, SQUARE, CAST IRON, 3 IN, THD, BLACK, MSP 54-37, FITTING
Pipe, SQUARE HEAD, 3 IN, THD, 175 PSIG, BLACK, MSP54-37, FITTING
PLUG, SQUARE HEAD, CAST IRON, 3 IN, THD, 175 PSIG, BLACK, MSP 54-37.1, FITTING
PLUG, SQUARE HEAD, CAST IRON, 3 IN, STEEL, THD, 175 PSIG, BLACK, MSP 54-37.1, MSP 11-1.1, FITTING
;

proc sql noprint;
	select max(countw(text,',')), max(count(text,"MSP")) into:nb1,:nb2 from have;
quit;

data want;
	set have;
	array _MSP (&nb1) $ 200;
	array MSP (&nb2) $ 20;
	do i=1 to countw(text,',');
		if prxmatch('/MSP\s*\d+-*\d+\.*\d*/',scan(text,i,',')) then _MSP(i) = scan(text,i,',');
	end;
	MSP_extract = catx(',',of _MSP(*));
	do j=1 to count(text,"MSP");
		MSP(j) = scan(MSP_extract,j,',');
	end;
	drop i j MSP_extract _:;
run;

Capture d’écran 2020-02-25 à 12.01.59.png

 

View solution in original post

4 REPLIES 4
data_null__
Jade | Level 19

I need to understand your input data better.  Show it in the form of a data step using CARDS.

 

Also, show what you WANT.

PGStats
Opal | Level 21

Test this:

 

data have;
infile datalines truncover;
input line $200.;
datalines;
Valve, SQUARE, CAST IRON, 3 IN, THD, BLACK, MSP 54-37, FITTING
Pipe, SQUARE HEAD, 3 IN, THD, 175 PSIG, BLACK, MSP54-37, FITTING
PLUG, SQUARE HEAD, CAST IRON, 3 IN, THD, 175 PSIG, BLACK, MSP 54-37.1, FITTING
PLUG, SQUARE HEAD, CAST IRON, 3 IN, STEEL, THD, 175 PSIG, BLACK, MSP 54-37.1, MSP 11-1.1, FITTING
;

data want;
if not prxID then prxID + prxparse("/\b(\w*MSP\w*)\s*([-.0-9 ]+)/i");
set have;
length found $20;
start = 1;
stop = length(line);
call prxnext(prxID, start, stop, line, position, length);
do while (position > 0);
   found = catx(" ", prxposn(prxId, 1, line), prxposn(prxId, 2, line));
   output;
   call prxnext(prxID, start, stop, line, position, length);
   end;
drop prxID start stop position length;
run;

proc print data=want noobs; run;
line 	found
Valve, SQUARE, CAST IRON, 3 IN, THD, BLACK, MSP 54-37, FITTING 	MSP 54-37
Pipe, SQUARE HEAD, 3 IN, THD, 175 PSIG, BLACK, MSP54-37, FITTING 	MSP54 -37
PLUG, SQUARE HEAD, CAST IRON, 3 IN, THD, 175 PSIG, BLACK, MSP 54-37.1, FITTING 	MSP 54-37.1
PLUG, SQUARE HEAD, CAST IRON, 3 IN, STEEL, THD, 175 PSIG, BLACK, MSP 54-37.1, MSP 11-1.1, FITTING 	MSP 54-37.1
PLUG, SQUARE HEAD, CAST IRON, 3 IN, STEEL, THD, 175 PSIG, BLACK, MSP 54-37.1, MSP 11-1.1, FITTING 	MSP 11-1.1
PG
maliksmom2000
Obsidian | Level 7

Thank you so much!!!!

ed_sas_member
Meteorite | Level 14

Hi @maliksmom2000 

 

Here is another approach to achieve this:

data have;
infile datalines truncover;
input text $200.;
datalines;
Valve, SQUARE, CAST IRON, 3 IN, THD, BLACK, MSP 54-37, FITTING
Pipe, SQUARE HEAD, 3 IN, THD, 175 PSIG, BLACK, MSP54-37, FITTING
PLUG, SQUARE HEAD, CAST IRON, 3 IN, THD, 175 PSIG, BLACK, MSP 54-37.1, FITTING
PLUG, SQUARE HEAD, CAST IRON, 3 IN, STEEL, THD, 175 PSIG, BLACK, MSP 54-37.1, MSP 11-1.1, FITTING
;

proc sql noprint;
	select max(countw(text,',')), max(count(text,"MSP")) into:nb1,:nb2 from have;
quit;

data want;
	set have;
	array _MSP (&nb1) $ 200;
	array MSP (&nb2) $ 20;
	do i=1 to countw(text,',');
		if prxmatch('/MSP\s*\d+-*\d+\.*\d*/',scan(text,i,',')) then _MSP(i) = scan(text,i,',');
	end;
	MSP_extract = catx(',',of _MSP(*));
	do j=1 to count(text,"MSP");
		MSP(j) = scan(MSP_extract,j,',');
	end;
	drop i j MSP_extract _:;
run;

Capture d’écran 2020-02-25 à 12.01.59.png