Dear all,
How can I split a value into words when the words are enclosed in ( ), [ ], { }, < >, ' ', or " ", or separated by blanks, but not at other characters (especially '.')?
Thanks to ChrisNZ's code, I can now split values that are enclosed in ( ), [ ], { }, < >, ' ', or " ", or separated by blanks. The code is:
data HAVE;
infile datalines missover;
input NO NAME &:$100.;
datalines;
1 juice<BR>a@pple[footer]
2 juice <BR> apple
3 juice<BODY> 'apple'
4 juice{BODY} apple
5 [BR]juice appl'e
6 <figure> "juice" LTD
run;
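data WANT;
  set HAVE;
  length PAIR MATCH_PAIR MATCH_PAIRS $200 WORD $20;
  retain REGEX;
  /* opening and closing character of each pair */
  array PAIRS [12] $1 _temporary_ ( '[' ']' '{' '}' '<' '>' '"' '"' "'" "'" '(' ')' ) ;
  /* on the first iteration, build one regex that matches text inside any pair */
  if _N_=1 then do;
    do I=1 to 12 by 2;
      MATCH_PAIR = catt('\', PAIRS[I], '(.*)\', PAIRS[I+1]);
      MATCH_PAIRS = catx('|', MATCH_PAIRS, MATCH_PAIR);
    end;
    REGEX = prxparse(catt('s/', MATCH_PAIRS, '/ $1$2$3$4$5$6 /'));
  end;
  /* replace each bracketed or quoted run with its contents padded by blanks */
  NAME1=prxchange(REGEX, -1, NAME);
  /* COUNTW counts words using a blank as the only delimiter */
  do I=1 to countw(NAME1,' ');
    /* no third argument: SCAN uses its default delimiter list, which includes '.' */
    WORD=scan(NAME1, I);
    output;
  end;
  keep NO WORD;
run;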
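To see what the first iteration of the WANT step above builds, here is a small sketch (not part of ChrisNZ's code; the data set name PATTERN is hypothetical) that rebuilds the same alternation and writes it to the log. Each pair becomes \open(.*)\close and the pairs are joined with |, so PRXCHANGE replaces a bracketed or quoted run with its contents surrounded by blanks. Note that (.*) is greedy: in a value such as [a] x [b], a single match runs from the first [ to the last ], so the captured text is a] x [b.

```sas
/* Rebuild the alternation used in the WANT step and show it in the log */
data PATTERN (keep=MATCH_PAIRS);
  length MATCH_PAIRS $200;
  array PAIRS [12] $1 _temporary_ ('[' ']' '{' '}' '<' '>' '"' '"' "'" "'" '(' ')');
  /* each pair becomes \open(.*)\close; CATX joins the pairs with | */
  do I = 1 to 12 by 2;
    MATCH_PAIRS = catx('|', MATCH_PAIRS, catt('\', PAIRS[I], '(.*)\', PAIRS[I+1]));
  end;
  put MATCH_PAIRS=;
  output;
run;
```

The log line should read MATCH_PAIRS=\[(.*)\]|\{(.*)\}|\<(.*)\>|\"(.*)\"|\'(.*)\'|\((.*)\).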
However, I have run into a new problem during further processing: the values are also being split at periods.
For example, for the values
7 M & L PROPERTY & ASS.PLC.
8 MMM L.T.D.F.
9 JJJ LTD.H
I get
NO | NAME | WORD | NAME1 |
7 | M & L PROPERTY & ASS.PLC. | M | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | L | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | PROPERTY | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | ASS | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | PLC | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | M & L PROPERTY & ASS.PLC. | |
8 | MMM L.T.D.F. | MMM | MMM L.T.D.F. |
8 | MMM L.T.D.F. | L | MMM L.T.D.F. |
9 | JJJ LTD.H | JJJ | JJJ LTD.H |
9 | JJJ LTD.H | LTD | JJJ LTD.H |
However, I expect to get
NO | NAME | WORD | NAME1 |
7 | M & L PROPERTY & ASS.PLC. | M | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | L | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | PROPERTY | M & L PROPERTY & ASS.PLC. |
7 | M & L PROPERTY & ASS.PLC. | ASS.PLC. | M & L PROPERTY & ASS.PLC. |
8 | MMM L.T.D.F. | MMM | MMM L.T.D.F. |
8 | MMM L.T.D.F. | L.T.D.F | MMM L.T.D.F. |
9 | JJJ LTD.H | JJJ | JJJ LTD.H |
9 | JJJ LTD.H | LTD.H | JJJ LTD.H |
Could you please give me some suggestions about this?
Thanks in advance.
Here is the updated HAVE data set with rows 7 to 9 added:
data HAVE;
infile datalines missover;
input NO NAME &:$100.;
datalines;
1 juice<BR>a@pple[footer]
2 juice <BR> apple
3 juice<BODY> 'apple'
4 juice{BODY} apple
5 [BR]juice appl'e
6 <figure> "juice" LTD
7 M & L PROPERTY & ASS.PLC.
8 MMM L.T.D.F.
9 JJJ LTD.H
run;
I replied and told you to use a space as the third argument of the SCAN function.
Did you try it?
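A minimal sketch of that suggestion follows. Without a third argument, SCAN falls back to its default delimiter list, which in ASCII environments includes the period, the ampersand, and several other punctuation characters, so ASS.PLC. is split into ASS and PLC. Giving COUNTW and SCAN the same explicit delimiter list keeps the loop count in step with the words extracted. With a blank alone, the standalone & characters in row 7 would also come out as words; to match the expected output, the sketch assumes & should be a delimiter too and passes ' &'. It includes only a few rows of HAVE so that it runs on its own.

```sas
/* A few rows of HAVE so the example runs on its own */
data HAVE;
  infile datalines missover;
  input NO NAME &:$100.;
  datalines;
3 juice<BODY> 'apple'
7 M & L PROPERTY & ASS.PLC.
8 MMM L.T.D.F.
9 JJJ LTD.H
;
run;

data WANT;
  set HAVE;
  length MATCH_PAIRS NAME1 $200 WORD $20;
  retain REGEX;
  array PAIRS [12] $1 _temporary_ ('[' ']' '{' '}' '<' '>' '"' '"' "'" "'" '(' ')');
  /* build the bracket/quote regex once, as in the original step */
  if _N_ = 1 then do;
    do I = 1 to 12 by 2;
      MATCH_PAIRS = catx('|', MATCH_PAIRS, catt('\', PAIRS[I], '(.*)\', PAIRS[I+1]));
    end;
    REGEX = prxparse(catt('s/', MATCH_PAIRS, '/ $1$2$3$4$5$6 /'));
  end;
  NAME1 = prxchange(REGEX, -1, NAME);
  /* same explicit delimiters for COUNTW and SCAN: blank and '&' only, */
  /* so periods stay inside words such as ASS.PLC. and LTD.H           */
  do I = 1 to countw(NAME1, ' &');
    WORD = scan(NAME1, I, ' &');
    output;
  end;
  keep NO NAME NAME1 WORD;
run;
```

Trailing periods are kept, so row 8 gives L.T.D.F. (with the final period), consistent with ASS.PLC. in row 7. Replacing ' &' with ' ' in both calls gives exactly the change suggested in the reply; in that case row 7 also produces two rows with WORD = &.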