Hi All,
i am having a text embedded in square brackets with multiple delimiters in the text . Text in square brackets are considered as variable names. I want to extract these variable names and store in another variable. In some cases the square brackets itself act as delimiter.I have tried this using SCAN and COUNTW, but countw is counting each '[' separately but not combinely as single '[]' into one,hence the count is becoming many and loop is executing multiple times. Any suggestions ,may be i am missing the basic logic here.
data have;
length have $100. want $50.;
have='[ONE] "/" [TWO] "/(" [THREE] "-" [FOUR] ")\" [FIVE]';
want="ONE,TWO,THREE,FOUR,FIVE";OUTPUT;
have='[SIX] "(" [SEVEN] ")" [EIGHT] "-" [NINE] "--" ';
want="SIX,SEVEN,EIGHT,NINE";OUTPUT;
have='[ONE] "#" [TWO] "have" [THREE] "$" [NINE] "with" ';
want="ONE,TWO,THREE,NINE";output;
have=' "combine" [TWO] "with" [THREE] "[" [NINE] "]" ';
want="TWO,THREE,NINE";OUTPUT;
run ;
Another way:
WANT=compress(transtrn(prxchange('s/( *|"[^"]*")//' ,-1,HAVE),'][',','),'[]');
put WANT=;
WANT=ONE,TWO,THREE,FOUR,FIVE
WANT=SIX,SEVEN,EIGHT,NINE
WANT=ONE,TWO,THREE,NINE
WANT=TWO,THREE,NINE
That's a bit tricky, but it's just text, and it can be parsed. I added a little sub-routine to your DATA step. It puts out two lines for each "have" data that you start with. The first of the two is "Want" -- this is what you want to end up with. The second is "Got" which is what the sub-routine is actually producing. Want and Got should be identical if the sub-routine is working properly.
Here's the code:
data have;
DROP have;
LENGTH Status $6;
length have $100. want $50.;
have='[ONE] "/" [TWO] "/(" [THREE] "-" [FOUR] ")\" [FIVE]';
want="ONE,TWO,THREE,FOUR,FIVE"; Status='Want'; OUTPUT;
LINK Transform_Data;
have='[SIX] "(" [SEVEN] ")" [EIGHT] "-" [NINE] "--" ';
want="SIX,SEVEN,EIGHT,NINE";Status='Want'; OUTPUT;
LINK Transform_Data;
have='[ONE] "#" [TWO] "have" [THREE] "$" [NINE] "with" ';
want="ONE,TWO,THREE,NINE";Status='Want'; output;
LINK Transform_Data;
have=' "combine" [TWO] "with" [THREE] "[" [NINE] "]" ';
want="TWO,THREE,NINE";Status='Want'; OUTPUT;
LINK Transform_Data;
RETURN;
Transform_Data:
Want=COMPRESS(TRANWRD(TRANWRD(TRANWRD(TRANSLATE(TRANWRD(COMPRESS(TRANWRD(TRANWRD(TRANWRD(have, '/(', '/'),')\','\'),'--',''),']" '),'[[',','),',,,,,,,','/\-()#$'),'with',','),'have',','),'combine',''),'[ ');
IF SUBSTR(Want, LENGTH(Want), 1) = ',' THEN
WANT = SUBSTR(Want, 1, (LENGTH(Want) - 1));
Status='Got';
OUTPUT;
RETURN;
run;
And below are the results. The "Want" and the "Got" are in fact identical, so the sub-routine is confirmed to be working correctly.
Obs Status want 1 Want ONE,TWO,THREE,FOUR,FIVE 2 Got ONE,TWO,THREE,FOUR,FIVE 3 Want SIX,SEVEN,EIGHT,NINE 4 Got SIX,SEVEN,EIGHT,NINE 5 Want ONE,TWO,THREE,NINE 6 Got ONE,TWO,THREE,NINE 7 Want TWO,THREE,NINE 8 Got TWO,THREE,NINE
Hope that helps,
Jim
Another way:
WANT=compress(transtrn(prxchange('s/( *|"[^"]*")//' ,-1,HAVE),'][',','),'[]');
put WANT=;
WANT=ONE,TWO,THREE,FOUR,FIVE
WANT=SIX,SEVEN,EIGHT,NINE
WANT=ONE,TWO,THREE,NINE
WANT=TWO,THREE,NINE
A slightly better one-liner:
WANT=compress(transtrn(prxchange('s/.*?(\[\w*\])[^\[]*/\1/' ,-1,HAVE),'][',','),'[]');
ONE,TWO,THREE,FOUR,FIVE
SIX,SEVEN,EIGHT,NINE
ONE,TWO,THREE,NINE
TWO,THREE,NINE
data have; length have $100.; have='[ONE] "/" [TWO] "/(" [THREE] "-" [FOUR] ")\" [FIVE]'; output; have='[SIX] "(" [SEVEN] ")" [EIGHT] "-" [NINE] "--" '; output; have='[ONE] "#" [TWO] "have" [THREE] "$" [NINE] "with" '; output; have=' "combine" [TWO] "with" [THREE] "[" [NINE] "]" '; output; run ; data want; set have; length want $ 100; pid=prxparse('/\[\w+?\]/'); s=1;e=length(have); call prxnext(pid,s,e,have,p,l); do while(p>0); want=catx(',',want,compress(substr(have,p,l),'[]')); call prxnext(pid,s,e,have,p,l); end; drop pid s e p l; run;
Here a variation to what @Ksharp proposed.
data have;
length have $100.;
have='[ONE] "/" [TWO] "/(" [THREE] "-" [FOUR] ")\" [FIVE]';
output;
have='[SIX] "(" [SEVEN] ")" [EIGHT] "-" [NINE] "--" ';
output;
have='[ONE] "#" [TWO] "have" [THREE] "$" [NINE] "with" ';
output;
have=' "combine" [TWO] "with" [THREE] "[" [NINE] "]" ';
output;
have='"[" [] "]" "combine" [TWO] "with" [THREE] "[" [x] "]" "[" [NINE] "]" ';
output;
run;
data want(drop=_:);
set have;
length want $ 100;
_pid=prxparse('/\[([^\[]*?)\]/');
_start=1;
_stop=length(have);
do until(_pos<=0);
call prxnext(_pid,_start,_stop,have,_pos,_len);
want=catx(',',want,prxposn(_pid, 1, have));
end;
run;
proc print data=want;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.