BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
left
Obsidian | Level 7

I have macro variables (see below) that contain the names of variables of a dataset.

 

Copy to reproduce example

%let _varlist = %str(_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4 _MERGE _BIN1 _BIN2);
%let _varlist_v2 = %str(_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4);
%let _varlist_v3 = %str(_SORT1 RISKFACT _TOGETH _T_1 _T_2 BIN1 _MERGE _T_3 _OTH1 _OTH2 _T_4);

data dummy_varlist;
    length _SORT1 RISKFACT _TOGETH _T_1-_T_4 _MERGE _BIN1-_BIN2 $ 20;
    array _chars _SORT1 RISKFACT _TOGETH _T_1-_T_4 _MERGE _BIN1-_BIN2;
    do over _chars;
        call missing (of _chars);
    end;
run;

Task

I want to extract the substring

_T_1 _T_2 _T_3 _T_4

of the above macro variables (&_varlist&_varlist_v2 and &_varlist_v3) using a wildcard/ pattern search. 

 

Restrictions

Lets assume I/we only know the name of the variables we look for, but not anything about position, order or appearance of other variables. Hence we cannot use information of anything other than the provided search string.

Ideally it should also work for macro variable &_varlist_v3 where the required variables are mixed in-between others. 

 

Methods/ Howto

I do think it should be possible to solve this using an elegant one-liner solution, but I fail to get it right.

I think one way is to use regular expressions and therefore applying a function of the prx family: prxchange() is my best guess. Probably I use the wrong search/replace regular expression.

 

One-line solution attempts

Here are some attempts that failed.

%let _invert = %bquote(%sysfunc(prxchange(%str(s/_t_\d+/$1/i), -1, %bquote(&_varlist))));
/* Gives:
 _SORT1 RISKFACT _TOGETH     _MERGE _BIN1 _BIN2
*/

%let _snippet = %bquote(%sysfunc(prxchange(%str(s/^(.*\w+\b) _t_\d+/$2/i), -1, %bquote(&_varlist))));
/* Gives:
 _MERGE _BIN1 _BIN2
*/

Other solution

Working, but awkward/ over-complicated:

data _null_;
    length varlist $ 200;
    call missing(varlist);
    do _n_ = 1 by 1 until (last);
        set sashelp.vcolumn (where = (libname = %upcase("work") and memname = %upcase("dummy_varlist") and prxmatch("/_t_\d+/i", name))) end = last;
        varlist = ifc(_n_ = 1, name, catx(" ", varlist, name));
    end;
    call symputx("_varlist_out", varlist);      /* assigns the value "_t_1 _t_2 _t_3 _t_4". */
run;
%put &_varlist_out;
/* Gives:
_T_1 _T_2 _T_3 _T_4
*/

If you have any other suggestions (including proc sql, pipe [unix commands]) I am happy to learn them.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

It is much easier to work with strings in data steps than in macro variables.

 

I think this is what you are trying to do.

data have;
  infile cards truncover;
  input varlist $100. ;
  varlist=compbl(varlist);
cards;
_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4 _MERGE _BIN1 _BIN2
_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4
_SORT1 RISKFACT _TOGETH _T_1 _T_2 BIN1 _MERGE _T_3 _OTH1 _OTH2 _T_4
;

data want;
  set have;
  droplist = compbl(prxchange('s/_t_\d+//i', -1, varlist));
  keeplist = compbl(prxchange(cats('s/',translate(droplist,'|',' '),'//i'), -1, varlist));
  put (_all_) (=/) //;
run;

varlist=_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4 _MERGE _BIN1 _BIN2
droplist=_SORT1 RISKFACT _TOGETH _MERGE _BIN1 _BIN2
keeplist=_T_1 _T_2 _T_3 _T_4



varlist=_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4
droplist=_SORT1 RISKFACT _TOGETH
keeplist=_T_1 _T_2 _T_3 _T_4



varlist=_SORT1 RISKFACT _TOGETH _T_1 _T_2 BIN1 _MERGE _T_3 _OTH1 _OTH2 _T_4
droplist=_SORT1 RISKFACT _TOGETH BIN1 _MERGE _OTH1 _OTH2
keeplist=_T_1 _T_2 _T_3 _T_4

View solution in original post

7 REPLIES 7
Tom
Super User Tom
Super User

It is much easier to work with strings in data steps than in macro variables.

 

I think this is what you are trying to do.

data have;
  infile cards truncover;
  input varlist $100. ;
  varlist=compbl(varlist);
cards;
_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4 _MERGE _BIN1 _BIN2
_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4
_SORT1 RISKFACT _TOGETH _T_1 _T_2 BIN1 _MERGE _T_3 _OTH1 _OTH2 _T_4
;

data want;
  set have;
  droplist = compbl(prxchange('s/_t_\d+//i', -1, varlist));
  keeplist = compbl(prxchange(cats('s/',translate(droplist,'|',' '),'//i'), -1, varlist));
  put (_all_) (=/) //;
run;

varlist=_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4 _MERGE _BIN1 _BIN2
droplist=_SORT1 RISKFACT _TOGETH _MERGE _BIN1 _BIN2
keeplist=_T_1 _T_2 _T_3 _T_4



varlist=_SORT1 RISKFACT _TOGETH _T_1 _T_2 _T_3 _T_4
droplist=_SORT1 RISKFACT _TOGETH
keeplist=_T_1 _T_2 _T_3 _T_4



varlist=_SORT1 RISKFACT _TOGETH _T_1 _T_2 BIN1 _MERGE _T_3 _OTH1 _OTH2 _T_4
droplist=_SORT1 RISKFACT _TOGETH BIN1 _MERGE _OTH1 _OTH2
keeplist=_T_1 _T_2 _T_3 _T_4
left
Obsidian | Level 7

Hi @Tom

 

thank you for your quick reply! 

 

I think I have not mentioned why I insisted on regular expressions in this question. 

In cases like the above I don't know in advance how many variables I have and hence cannot provide a fixed string in advance. 

I do only know the pattern

 

In the example above I could also have variable names like

_T_1 _T_2 _T_3 _T_4 _T_5 [...] _T_87 [...] _T_102

 

So I am looking for a dynamic solution regarding the regular expression pattern.  

Tom
Super User Tom
Super User

@left wrote:

Hi @Tom

 

thank you for your quick reply! 

 

I think I have not mentioned why I insisted on regular expressions in this question. 

In cases like the above I don't know in advance how many variables I have and hence cannot provide a fixed string in advance. 

I do only know the pattern

 

In the example above I could also have variable names like

_T_1 _T_2 _T_3 _T_4 _T_5 [...] _T_87 [...] _T_102

 

So I am looking for a dynamic solution regarding the regular expression pattern.  


Huh????

 

The code I posted is using regular expressions and as it demonstrates works for all of the examples you provided.

 

Works pretty well for the new list also:

varlist=_T_1 _T_2 _T_3 _T_4 _T_5 [...] _T_87 [...] _T_102
droplist=[...] [...]
keeplist=_T_1 _T_2 _T_3 _T_4 _T_5 [] _T_87 [] _T_102

I suspect the periods confused RegEx.

 

Are you saying the pattern is not underscore followed by T followed by underscore followed by one or more digits?

left
Obsidian | Level 7

@Tom wrote:

@left wrote:

Hi @Tom

 

thank you for your quick reply! 

 

I think I have not mentioned why I insisted on regular expressions in this question. 

In cases like the above I don't know in advance how many variables I have and hence cannot provide a fixed string in advance. 

I do only know the pattern

 

In the example above I could also have variable names like

_T_1 _T_2 _T_3 _T_4 _T_5 [...] _T_87 [...] _T_102

 

So I am looking for a dynamic solution regarding the regular expression pattern.  


Huh????

 

The code I posted is using regular expressions and as it demonstrates works for all of the examples you provided.

 

Works for the new list also:

varlist=_T_1 _T_2 _T_3 _T_4 _T_5 [...] _T_87 [...] _T_102
droplist=[...] [...]
keeplist=_T_1 _T_2 _T_3 _T_4 _T_5 [] _T_87 [] _T_102

 

Are you saying the pattern is not underscore followed by T followed by underscore followed by one or more digits?


Hi @Tom , 

 

the pattern is correct as specified - your code is also doing the job just perfectly. 

 

When reading your solution I overlooked the first part where you derived the solution and just focused on the 2nd box with the results thinking you wanted me to use a fixed string _T_1 _T_2 _T_3 _T_4 only. Excuse me for this.

 

However I wonder if I can get a result for a single macro variable on demand using only a single line of code, e.g. with this kind of pseudocode

%let droplist = %sysfunc(compbl(%sysfunc(prxchange(s/_t_\d+//i, -1, &_varlist));
%let keeplist = %sysfunc(compbl(%sysfunc(prxchange(cats(s/,translate(&droplist,|, ),//i), -1, &_varlist));

but in one line only (or max. 2). 

 

Maybe using a pipe (applying some unix command like sedawk) if it is difficult in SAS directly without using a datastep. 

 

However, if this would just not make sense, then your solution will become the one to go for.

Tom
Super User Tom
Super User

The same way you can convert these statements:

y = x + 5;
z = sqrt(y) ;

To one statement.

z = sqrt(x + 5);
Tom
Super User Tom
Super User

But why would you want to generate code you could never hope to be able to maintain?

%let list=_SORT1 RISKFACT _TOGETH _T_1 _T_2 BIN1 _MERGE _T_3 _OTH1 _OTH2 _T_4;
%let keeplist=%sysfunc(compbl(%sysfunc(prxchange(s/%sysfunc(translate(%sysfunc(compbl(%sysfunc(prxchange(s/_t_\d+//i,-1,&list)))),|,%str( )))//i,-1,&list))));
%put &=keeplist;
392  %let list=_SORT1 RISKFACT _TOGETH _T_1 _T_2 BIN1 _MERGE _T_3 _OTH1 _OTH2 _T_4;
393  %let
393! keeplist=%sysfunc(compbl(%sysfunc(prxchange(s/%sysfunc(translate(%sysfunc(compbl(%sysfunc(prxchange(s/_t_\d+//i,-1,&list)))),|,
393! %str( )))//i,-1,&list))));
394  %put &=keeplist;
KEEPLIST=_T_1 _T_2 _T_3 _T_4
left
Obsidian | Level 7

@Tom wrote:

But why would you want to generate code you could never hope to be able to maintain?

True, Tom. 

Having code in that form in one line is hard to read, maintain and debug. 

I was hoping this can be achieved by a single function call of prxchange() only. That would still be feasable if wrapped by %sysfunc() outside of a data step.

All in all - your first answer is the most appropriate solution then. 

Thank you indeed that you have gone that extra mile to explain why a one-liner is not the best solution.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1524 views
  • 1 like
  • 2 in conversation