Dear all,
How can I split a value into words when the words are enclosed in ( ), [ ], { }, ' ', or " ", or separated by blanks, but not split on other characters (especially '.')?
Thanks to ChrisNZ's code, I can now split the value when the words are enclosed in ( ), [ ], { }, ' ', or " ", or separated by blanks. The code is:
data HAVE;
infile datalines missover;
input NO NAME &:$100.;
datalines;
1 juice<BR>a@pple[footer]
2 juice <BR> apple
3 juice<BODY> 'apple'
4 juice{BODY} apple
5 [BR]juice appl'e
6 <figure> "juice" LTD
run;
data WANT;
  set HAVE;
  length PAIR MATCH_PAIR MATCH_PAIRS $200 WORD $20;
  retain REGEX;
  array PAIRS [12] $1 _temporary_ ( '[' ']' '{' '}' '<' '>' '"' '"' "'" "'" '(' ')' ) ;
  if _N_=1 then do;
    do I=1 to 12 by 2;
      MATCH_PAIR = catt('\', PAIRS[I], '(.*)\', PAIRS[I+1]);
      MATCH_PAIRS = catx('|', MATCH_PAIRS, MATCH_PAIR);
    end;
    REGEX = prxparse(catt('s/', MATCH_PAIRS, '/ $1$2$3$4$5$6 /'));
  end;
  NAME1=prxchange(REGEX, -1, NAME);
  do I=1 to countw(NAME1,' ');
    WORD=scan(NAME1, I);
    output;
  end;
  keep NO WORD;
run;
However, I have run into a new problem during further processing: the value is also being split at periods.
For example, for the values
7 M & L PROPERTY & ASS.PLC.
8 MMM L.T.D.F.
9 JJJ LTD.H
I get
NO | NAME | WORD | NAME1 |
7 | M & L PROPERTY & ASS.PLC. | M | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | L | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | PROPERTY | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | ASS | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | PLC | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | M & L PROPERTY & ASS.PLC. | |
8 | MMM L.T.D.F. | MMM | MMM L.T.D.F. |
8 | MMM L.T.D.F. | L | MMM L.T.D.F. |
9 | JJJ LTD.H | JJJ | JJJ LTD.H |
9 | JJJ LTD.H | LTD | JJJ LTD.H |
However, I expect to get
NO | NAME | WORD | NAME1 |
7 | M & L PROPERTY & ASS.PLC. | M | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | L | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | PROPERTY | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | ASS.PLC. | M & L PROPERTY & ASS.PLC. |
8 | MMM L.T.D.F. | MMM | MMM L.T.D.F. |
8 | MMM L.T.D.F. | L.T.D.F | MMM L.T.D.F. |
9 | JJJ LTD.H | JJJ | JJJ LTD.H |
9 | JJJ LTD.H | LTD.H | JJJ LTD.H |
Could you please give me some suggestions about this?
Thanks in advance.
data HAVE;
infile datalines missover;
input NO NAME &:$100.;
datalines;
1 juice<BR>a@pple[footer]
2 juice <BR> apple
3 juice<BODY> 'apple'
4 juice{BODY} apple
5 [BR]juice appl'e
6 <figure> "juice" LTD
7 M & L PROPERTY & ASS.PLC.
8 MMM L.T.D.F.
9 JJJ LTD.H
run;
I replied and told you to use a space as the third argument of the SCAN function.
Did you try?
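A minimal sketch of that change, for illustration only. It assumes a small test data set (the name HAVE789 is made up here) holding just rows 7 to 9, and it splits NAME directly rather than NAME1, so the bracket and quote handling from the WANT step is left out. The only point is that SCAN receives the same blank delimiter that COUNTW already uses. When the third argument is omitted, SCAN falls back to its default delimiter list, which includes the period and the ampersand.

```sas
/* Hypothetical test data: rows 7-9 from the question */
data HAVE789;
  infile datalines missover;
  input NO NAME &:$100.;
  datalines;
7 M & L PROPERTY & ASS.PLC.
8 MMM L.T.D.F.
9 JJJ LTD.H
;

data WANT789;
  set HAVE789;
  length WORD $100;
  do I = 1 to countw(NAME, ' ');
    /* Third argument ' ' makes the blank the only delimiter, */
    /* so periods stay inside the word                        */
    WORD = scan(NAME, I, ' ');
    output;
  end;
  keep NO WORD;
run;
```

In the original WANT step the equivalent change would be WORD=scan(NAME1, I, ' '). Note that with a blank as the only delimiter, a standalone & is also returned as a word, so it would need to be filtered out separately if it is not wanted.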