Dear all,
I am trying to find all strings enclosed in brackets or quotes such as (), [], and {} (for example <BR>, [FONT], {BODY}, 'A', "JUICE") and split them out into a new variable, replacing each one with a blank in the original value
(for example, for 'JUICE<BR>apple<footer>', I expect a blank between 'JUICE' and 'apple').
However, the value
HARDY(FRNS.)'A'
is not processed correctly by my code.
I expect to get
name | COMPANY_NAME_inB | COMPANY_NAME_noB |
---|---|---|
HARDY(FRNS.)'A' | FRNS. | HARDY |
HARDY(FRNS.)'A' | A | HARDY |
However, I only get
name | COMPANY_NAME_inB | COMPANY_NAME_noB |
---|---|---|
HARDY(FRNS.)'A' | A | HARDY(FRNS.) |
Could you please give me some suggestions?
data have ;
infile datalines truncover;
input name $100.;
datalines;
JUICE<BR>apple[footer]
juice <BR> apple
juice<BODY> 'apple'
<figure> "juice" LTD
HARDY(FRNS.)'A'
HAFSLUND 'B' (XSQ)
;
data want;
set have;
RegExID = prxparse('/<\w*>|\(\w*\)|\[\w*\]|\(\w*\)|"\w*"|''\w*''/');
start=1;
stop=length(name);
call prxnext(RegExID, start, stop, name, pos, length);
do while (pos > 0);
COMPANY_NAME_inB = substr(name, pos+1, length-2);
COMPANY_NAME_noB = prxchange('s/<\w*>|\(\w*\)|\[\w*\]|\(\w*\)|"\w*"|''\w*''/ /', -1, name);
output;
call prxnext(RegExID, start, stop, name, pos, length);
end;
drop RegExID pos length start stop;
run;
proc print data=want;
run;
Like this?
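data WANT;
  set HAVE;
  /* Match anything enclosed in <>, (), [], "" or '' by excluding only the closing delimiter */
  RegExID = prxparse('/<[^>]*>|\([^\)]*\)|\[[^\]]*\]|"[^"]*"|''[^'']*''/');
  START=1;
  STOP=length(NAME);
  call prxnext(RegExID, START, STOP, NAME, POS, LENGTH);
  /* Output one row per enclosed string found in NAME */
  do while (POS > 0);
    /* Drop the opening and closing delimiter from the matched text */
    COMPANY_NAME_inB = substr(NAME, POS+1, LENGTH-2);
    /* Replace every enclosed string in NAME with a blank */
    COMPANY_NAME_noB = prxchange('s/<[^>]*>|\([^\)]*\)|\[[^\]]*\]|"[^"]*"|''[^'']*''/ /', -1, NAME);
    output;
    call prxnext(RegExID, START, STOP, NAME, POS, LENGTH);
  end;
  drop RegExID POS LENGTH START STOP;
run;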
Obs | name | COMPANY_NAME_inB | COMPANY_NAME_noB |
---|---|---|---|
1 | JUICE<BR>apple[footer] | BR | JUICE apple |
2 | JUICE<BR>apple[footer] | footer | JUICE apple |
3 | juice <BR> apple | BR | juice apple |
4 | juice<BODY> 'apple' | BODY | juice |
5 | juice<BODY> 'apple' | apple | juice |
6 | <figure> "juice" LTD | figure | LTD |
7 | <figure> "juice" LTD | juice | LTD |
8 | HARDY(FRNS.)'A' | FRNS. | HARDY |
9 | HARDY(FRNS.)'A' | A | HARDY |
10 | HAFSLUND 'B' (XSQ) | B | HAFSLUND |
11 | HAFSLUND 'B' (XSQ) | XSQ | HAFSLUND |
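
The reason the original pattern skipped `(FRNS.)` appears to be that `\w` matches only letters, digits and the underscore, so `\(\w*\)` cannot get past the period. The negated classes such as `[^\)]*` accept any character up to the closing delimiter. A minimal sketch to check this on the failing value (the variable names `pos_word` and `pos_negated` are made up for illustration):

```sas
/* Compare the two parenthesis patterns on the value that failed */
data _null_;
  name = "HARDY(FRNS.)'A'";
  /* \w* stops at the period, so no closing parenthesis follows and the match fails */
  pos_word    = prxmatch('/\(\w*\)/', name);
  /* [^\)]* accepts any character except a closing parenthesis, including the period */
  pos_negated = prxmatch('/\([^\)]*\)/', name);
  /* Write both match positions to the log (0 means no match) */
  put pos_word= pos_negated=;
run;
```

The log should show `pos_word=0 pos_negated=6`, meaning only the negated-class pattern finds the parenthesized text, starting at the opening parenthesis in position 6.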