Hi All,
i am having a text embedded in square brackets with multiple delimiters in the text . Text in square brackets are considered as variable names. I want to extract these variable names and store in another variable. In some cases the square brackets itself act as delimiter.I have tried this using SCAN and COUNTW, but countw is counting each '[' separately but not combinely as single '[]' into one,hence the count is becoming many and loop is executing multiple times. Any suggestions ,may be i am missing the basic logic here.
data have;
length have $100. want $50.;
have='[ONE] "/" [TWO] "/(" [THREE] "-" [FOUR] ")\" [FIVE]';
want="ONE,TWO,THREE,FOUR,FIVE";OUTPUT;
have='[SIX] "(" [SEVEN] ")" [EIGHT] "-" [NINE] "--" ';
want="SIX,SEVEN,EIGHT,NINE";OUTPUT;
have='[ONE] "#" [TWO] "have" [THREE] "$" [NINE] "with" ';
want="ONE,TWO,THREE,NINE";output;
have=' "combine" [TWO] "with" [THREE] "[" [NINE] "]" ';
want="TWO,THREE,NINE";OUTPUT;
run ;
Another way:
WANT=compress(transtrn(prxchange('s/( *|"[^"]*")//' ,-1,HAVE),'][',','),'[]');
put WANT=;
WANT=ONE,TWO,THREE,FOUR,FIVE
WANT=SIX,SEVEN,EIGHT,NINE
WANT=ONE,TWO,THREE,NINE
WANT=TWO,THREE,NINE
That's a bit tricky, but it's just text, and it can be parsed. I added a little sub-routine to your DATA step. It puts out two lines for each "have" data that you start with. The first of the two is "Want" -- this is what you want to end up with. The second is "Got" which is what the sub-routine is actually producing. Want and Got should be identical if the sub-routine is working properly.
Here's the code:
data have;
DROP have;
LENGTH Status $6;
length have $100. want $50.;
have='[ONE] "/" [TWO] "/(" [THREE] "-" [FOUR] ")\" [FIVE]';
want="ONE,TWO,THREE,FOUR,FIVE"; Status='Want'; OUTPUT;
LINK Transform_Data;
have='[SIX] "(" [SEVEN] ")" [EIGHT] "-" [NINE] "--" ';
want="SIX,SEVEN,EIGHT,NINE";Status='Want'; OUTPUT;
LINK Transform_Data;
have='[ONE] "#" [TWO] "have" [THREE] "$" [NINE] "with" ';
want="ONE,TWO,THREE,NINE";Status='Want'; output;
LINK Transform_Data;
have=' "combine" [TWO] "with" [THREE] "[" [NINE] "]" ';
want="TWO,THREE,NINE";Status='Want'; OUTPUT;
LINK Transform_Data;
RETURN;
Transform_Data:
Want=COMPRESS(TRANWRD(TRANWRD(TRANWRD(TRANSLATE(TRANWRD(COMPRESS(TRANWRD(TRANWRD(TRANWRD(have, '/(', '/'),')\','\'),'--',''),']" '),'[[',','),',,,,,,,','/\-()#$'),'with',','),'have',','),'combine',''),'[ ');
IF SUBSTR(Want, LENGTH(Want), 1) = ',' THEN
WANT = SUBSTR(Want, 1, (LENGTH(Want) - 1));
Status='Got';
OUTPUT;
RETURN;
run;
And below are the results. The "Want" and the "Got" are in fact identical, so the sub-routine is confirmed to be working correctly.
Obs Status want 1 Want ONE,TWO,THREE,FOUR,FIVE 2 Got ONE,TWO,THREE,FOUR,FIVE 3 Want SIX,SEVEN,EIGHT,NINE 4 Got SIX,SEVEN,EIGHT,NINE 5 Want ONE,TWO,THREE,NINE 6 Got ONE,TWO,THREE,NINE 7 Want TWO,THREE,NINE 8 Got TWO,THREE,NINE
Hope that helps,
Jim
Another way:
WANT=compress(transtrn(prxchange('s/( *|"[^"]*")//' ,-1,HAVE),'][',','),'[]');
put WANT=;
WANT=ONE,TWO,THREE,FOUR,FIVE
WANT=SIX,SEVEN,EIGHT,NINE
WANT=ONE,TWO,THREE,NINE
WANT=TWO,THREE,NINE
A slightly better one-liner:
WANT=compress(transtrn(prxchange('s/.*?(\[\w*\])[^\[]*/\1/' ,-1,HAVE),'][',','),'[]');
ONE,TWO,THREE,FOUR,FIVE
SIX,SEVEN,EIGHT,NINE
ONE,TWO,THREE,NINE
TWO,THREE,NINE
data have;
length have $100.;
have='[ONE] "/" [TWO] "/(" [THREE] "-" [FOUR] ")\" [FIVE]'; output;
have='[SIX] "(" [SEVEN] ")" [EIGHT] "-" [NINE] "--" '; output;
have='[ONE] "#" [TWO] "have" [THREE] "$" [NINE] "with" '; output;
have=' "combine" [TWO] "with" [THREE] "[" [NINE] "]" '; output;
run ;
data want;
set have;
length want $ 100;
pid=prxparse('/\[\w+?\]/');
s=1;e=length(have);
call prxnext(pid,s,e,have,p,l);
do while(p>0);
want=catx(',',want,compress(substr(have,p,l),'[]'));
call prxnext(pid,s,e,have,p,l);
end;
drop pid s e p l;
run;
Here a variation to what @Ksharp proposed.
data have;
length have $100.;
have='[ONE] "/" [TWO] "/(" [THREE] "-" [FOUR] ")\" [FIVE]';
output;
have='[SIX] "(" [SEVEN] ")" [EIGHT] "-" [NINE] "--" ';
output;
have='[ONE] "#" [TWO] "have" [THREE] "$" [NINE] "with" ';
output;
have=' "combine" [TWO] "with" [THREE] "[" [NINE] "]" ';
output;
have='"[" [] "]" "combine" [TWO] "with" [THREE] "[" [x] "]" "[" [NINE] "]" ';
output;
run;
data want(drop=_:);
set have;
length want $ 100;
_pid=prxparse('/\[([^\[]*?)\]/');
_start=1;
_stop=length(have);
do until(_pos<=0);
call prxnext(_pid,_start,_stop,have,_pos,_len);
want=catx(',',want,prxposn(_pid, 1, have));
end;
run;
proc print data=want;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.