I have macro variables (see below) that contain the names of variables of a dataset.
Copy to reproduce example
%let _varlist = %str(_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4 _MERGE _BIN1 _BIN2);
%let _varlist_v2 = %str(_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4);
%let _varlist_v3 = %str(_SORT1 RISKFACT _TOGETH _T_1 _T_2 BIN1 _MERGE _T_3 _OTH1 _OTH2 _T_4);
data dummy_varlist;
length _SORT1 RISKFACT _TOGETH _T_1-_T_4 _MERGE _BIN1-_BIN2 $ 20;
array _chars _SORT1 RISKFACT _TOGETH _T_1-_T_4 _MERGE _BIN1-_BIN2;
do over _chars;
call missing (of _chars);
end;
run;
Task
I want to extract the substring
_T_1 _T_2 _T_3 _T_4
of the above macro variables (&_varlist, &_varlist_v2 and &_varlist_v3) using a wildcard/ pattern search.
Restrictions
Lets assume I/we only know the name of the variables we look for, but not anything about position, order or appearance of other variables. Hence we cannot use information of anything other than the provided search string.
Ideally it should also work for macro variable &_varlist_v3 where the required variables are mixed in-between others.
Methods/ Howto
I do think it should be possible to solve this using an elegant one-liner solution, but I fail to get it right.
I think one way is to use regular expressions and therefore applying a function of the prx family: prxchange() is my best guess. Probably I use the wrong search/replace regular expression.
One-line solution attempts
Here are some attempts that failed.
%let _invert = %bquote(%sysfunc(prxchange(%str(s/_t_\d+/$1/i), -1, %bquote(&_varlist))));
/* Gives:
_SORT1 RISKFACT _TOGETH _MERGE _BIN1 _BIN2
*/
%let _snippet = %bquote(%sysfunc(prxchange(%str(s/^(.*\w+\b) _t_\d+/$2/i), -1, %bquote(&_varlist))));
/* Gives:
_MERGE _BIN1 _BIN2
*/
Other solution
Working, but awkward/ over-complicated:
data _null_;
length varlist $ 200;
call missing(varlist);
do _n_ = 1 by 1 until (last);
set sashelp.vcolumn (where = (libname = %upcase("work") and memname = %upcase("dummy_varlist") and prxmatch("/_t_\d+/i", name))) end = last;
varlist = ifc(_n_ = 1, name, catx(" ", varlist, name));
end;
call symputx("_varlist_out", varlist); /* assigns the value "_t_1 _t_2 _t_3 _t_4". */
run;
%put &_varlist_out;
/* Gives:
_T_1 _T_2 _T_3 _T_4
*/
If you have any other suggestions (including proc sql, pipe [unix commands]) I am happy to learn them.
It is much easier to work with strings in data steps than in macro variables.
I think this is what you are trying to do.
data have;
infile cards truncover;
input varlist $100. ;
varlist=compbl(varlist);
cards;
_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4 _MERGE _BIN1 _BIN2
_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4
_SORT1 RISKFACT _TOGETH _T_1 _T_2 BIN1 _MERGE _T_3 _OTH1 _OTH2 _T_4
;
data want;
set have;
droplist = compbl(prxchange('s/_t_\d+//i', -1, varlist));
keeplist = compbl(prxchange(cats('s/',translate(droplist,'|',' '),'//i'), -1, varlist));
put (_all_) (=/) //;
run;
varlist=_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4 _MERGE _BIN1 _BIN2 droplist=_SORT1 RISKFACT _TOGETH _MERGE _BIN1 _BIN2 keeplist=_T_1 _T_2 _T_3 _T_4 varlist=_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4 droplist=_SORT1 RISKFACT _TOGETH keeplist=_T_1 _T_2 _T_3 _T_4 varlist=_SORT1 RISKFACT _TOGETH _T_1 _T_2 BIN1 _MERGE _T_3 _OTH1 _OTH2 _T_4 droplist=_SORT1 RISKFACT _TOGETH BIN1 _MERGE _OTH1 _OTH2 keeplist=_T_1 _T_2 _T_3 _T_4
It is much easier to work with strings in data steps than in macro variables.
I think this is what you are trying to do.
data have;
infile cards truncover;
input varlist $100. ;
varlist=compbl(varlist);
cards;
_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4 _MERGE _BIN1 _BIN2
_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4
_SORT1 RISKFACT _TOGETH _T_1 _T_2 BIN1 _MERGE _T_3 _OTH1 _OTH2 _T_4
;
data want;
set have;
droplist = compbl(prxchange('s/_t_\d+//i', -1, varlist));
keeplist = compbl(prxchange(cats('s/',translate(droplist,'|',' '),'//i'), -1, varlist));
put (_all_) (=/) //;
run;
varlist=_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4 _MERGE _BIN1 _BIN2 droplist=_SORT1 RISKFACT _TOGETH _MERGE _BIN1 _BIN2 keeplist=_T_1 _T_2 _T_3 _T_4 varlist=_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4 droplist=_SORT1 RISKFACT _TOGETH keeplist=_T_1 _T_2 _T_3 _T_4 varlist=_SORT1 RISKFACT _TOGETH _T_1 _T_2 BIN1 _MERGE _T_3 _OTH1 _OTH2 _T_4 droplist=_SORT1 RISKFACT _TOGETH BIN1 _MERGE _OTH1 _OTH2 keeplist=_T_1 _T_2 _T_3 _T_4
Hi @Tom,
thank you for your quick reply!
I think I have not mentioned why I insisted on regular expressions in this question.
In cases like the above I don't know in advance how many variables I have and hence cannot provide a fixed string in advance.
I do only know the pattern.
In the example above I could also have variable names like
_T_1 _T_2 _T_3 _T_4 _T_5 [...] _T_87 [...] _T_102
So I am looking for a dynamic solution regarding the regular expression pattern.
@left wrote:
Hi @Tom,
thank you for your quick reply!
I think I have not mentioned why I insisted on regular expressions in this question.
In cases like the above I don't know in advance how many variables I have and hence cannot provide a fixed string in advance.
I do only know the pattern.
In the example above I could also have variable names like
_T_1 _T_2 _T_3 _T_4 _T_5 [...] _T_87 [...] _T_102
So I am looking for a dynamic solution regarding the regular expression pattern.
Huh????
The code I posted is using regular expressions and as it demonstrates works for all of the examples you provided.
Works pretty well for the new list also:
varlist=_T_1 _T_2 _T_3 _T_4 _T_5 [...] _T_87 [...] _T_102 droplist=[...] [...] keeplist=_T_1 _T_2 _T_3 _T_4 _T_5 [] _T_87 [] _T_102
I suspect the periods confused RegEx.
Are you saying the pattern is not underscore followed by T followed by underscore followed by one or more digits?
@Tom wrote:
@left wrote:
Hi @Tom,
thank you for your quick reply!
I think I have not mentioned why I insisted on regular expressions in this question.
In cases like the above I don't know in advance how many variables I have and hence cannot provide a fixed string in advance.
I do only know the pattern.
In the example above I could also have variable names like
_T_1 _T_2 _T_3 _T_4 _T_5 [...] _T_87 [...] _T_102
So I am looking for a dynamic solution regarding the regular expression pattern.
Huh????
The code I posted is using regular expressions and as it demonstrates works for all of the examples you provided.
Works for the new list also:
varlist=_T_1 _T_2 _T_3 _T_4 _T_5 [...] _T_87 [...] _T_102 droplist=[...] [...] keeplist=_T_1 _T_2 _T_3 _T_4 _T_5 [] _T_87 [] _T_102
Are you saying the pattern is not underscore followed by T followed by underscore followed by one or more digits?
Hi @Tom ,
the pattern is correct as specified - your code is also doing the job just perfectly.
When reading your solution I overlooked the first part where you derived the solution and just focused on the 2nd box with the results thinking you wanted me to use a fixed string _T_1 _T_2 _T_3 _T_4 only. Excuse me for this.
However I wonder if I can get a result for a single macro variable on demand using only a single line of code, e.g. with this kind of pseudocode
%let droplist = %sysfunc(compbl(%sysfunc(prxchange(s/_t_\d+//i, -1, &_varlist));
%let keeplist = %sysfunc(compbl(%sysfunc(prxchange(cats(s/,translate(&droplist,|, ),//i), -1, &_varlist));
but in one line only (or max. 2).
Maybe using a pipe (applying some unix command like sed, awk) if it is difficult in SAS directly without using a datastep.
However, if this would just not make sense, then your solution will become the one to go for.
The same way you can convert these statements:
y = x + 5;
z = sqrt(y) ;
To one statement.
z = sqrt(x + 5);
But why would you want to generate code you could never hope to be able to maintain?
%let list=_SORT1 RISKFACT _TOGETH _T_1 _T_2 BIN1 _MERGE _T_3 _OTH1 _OTH2 _T_4;
%let keeplist=%sysfunc(compbl(%sysfunc(prxchange(s/%sysfunc(translate(%sysfunc(compbl(%sysfunc(prxchange(s/_t_\d+//i,-1,&list)))),|,%str( )))//i,-1,&list))));
%put &=keeplist;
392 %let list=_SORT1 RISKFACT _TOGETH _T_1 _T_2 BIN1 _MERGE _T_3 _OTH1 _OTH2 _T_4; 393 %let 393! keeplist=%sysfunc(compbl(%sysfunc(prxchange(s/%sysfunc(translate(%sysfunc(compbl(%sysfunc(prxchange(s/_t_\d+//i,-1,&list)))),|, 393! %str( )))//i,-1,&list)))); 394 %put &=keeplist; KEEPLIST=_T_1 _T_2 _T_3 _T_4
@Tom wrote:
But why would you want to generate code you could never hope to be able to maintain?
True, Tom.
Having code in that form in one line is hard to read, maintain and debug.
I was hoping this can be achieved by a single function call of prxchange() only. That would still be feasable if wrapped by %sysfunc() outside of a data step.
All in all - your first answer is the most appropriate solution then.
Thank you indeed that you have gone that extra mile to explain why a one-liner is not the best solution.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.