- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I have a text string variable tha reads:
"PLEASE LOAD TABLE 1394, SAME RATE AS TABLE 276."
Like this variable, there are other instances of unstructured data in this field.
My working thought is to create keywords for the column to extract from the text.
In this example, TABLE is the keyword. I would like to extract "Table 1394" into one variable and "Table 276" into another.
I've worked with Index, Scan, and find with varied results, but I am finding myself getting further away from my desired result. 😞
if find(lowcase(txtstring),'table') >0 then table='table';
do i=1 to countw(txtstring," ");
if scan(lowcase(txtstring),i," ")=table then do;
after=strip(scan(txtstring,i+1," "));
end;
end;
output; end;
this produces only the first iteration of table:
table after
table 276
Any guidance is greatly appreciated.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
data HAVE;
TEXT="PLEASE LOAD TABLE 1394, SAME RATE AS TABLE 276. "; output;
TEXT="PLEASE LOAD TABLE 139, SAME table 2 AS TABLE 27."; output;
data WANT;
set HAVE;
if _N_=1 then REGEX+prxparse('/(TABLE \d*)/i');
START = 1;
STOP = length(TEXT);
call prxnext(REGEX, START, STOP, TEXT, POS, LEN);
do while (POS);
STR = substr(TEXT,POS,LEN);
output;
call prxnext(REGEX, START, STOP, TEXT, POS, LEN);
end;
keep STR;
run;
proc print data=WANT;
run;
Obs | STR |
---|---|
1 | TABLE 1394 |
2 | TABLE 276 |
3 | TABLE 139 |
4 | table 2 |
5 | TABLE 27 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
data HAVE;
TEXT="PLEASE LOAD TABLE 1394, SAME RATE AS TABLE 276. "; output;
TEXT="PLEASE LOAD TABLE 139, SAME table 2 AS TABLE 27."; output;
data WANT;
set HAVE;
if _N_=1 then REGEX+prxparse('/(TABLE \d*)/i');
START = 1;
STOP = length(TEXT);
call prxnext(REGEX, START, STOP, TEXT, POS, LEN);
do while (POS);
STR = substr(TEXT,POS,LEN);
output;
call prxnext(REGEX, START, STOP, TEXT, POS, LEN);
end;
keep STR;
run;
proc print data=WANT;
run;
Obs | STR |
---|---|
1 | TABLE 1394 |
2 | TABLE 276 |
3 | TABLE 139 |
4 | table 2 |
5 | TABLE 27 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am thinking about putting this in a do loop for as many keywords as I have (presently 6) this way if any of the words populate in each iteration it will create a counter column per keyword.
Thanks for starting me off.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You needn't do a DO loop.
If say you want to also capture CHAIR 99, you can use:
REGEX+prxparse('/(TABLE \d*|CHAIR \d*)/i');
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content