Solved: Grabbing all columns with names that start with a string in a successi... - Page 2

hubertsng · Posted 07-22-2019 01:21 PM

I realize you can do that but similar to the "LIKE" problem, it will grab all of the columns that start with "SOMETHING" and not just the ones that start with "S"

Reeza · Posted 07-22-2019 01:16 PM

proc sql noprint;

select name into :var_list separated by " "
from sashelp.vcolumn 
where libname = "WORK"
and memname = "HAVE"
and name like 'S%';

quit;

%put &var_list;

Here's how you can create that variable list in a macro variable, if desired.

@hubertsng wrote:

I have no idea how to condense my question into one sentence but what I am trying to do is perform a macro on all columns that name starts with a successive pattern (S1-S100).

So let's say there are 500 columns in which that 100 of them are named S1_Something to S100_Something and the rest are named SOMETHING_Something to SOMETHING_Something. I want to perform a macro on all of the columns that start with S1-S100. So what I have so far is kind of an ugly way of doing it but it is
PROC SQL;
    CREATE TABLE ques AS
    SELECT * FROM dictionary.columns
    WHERE memname="HAVE";
QUIT;

PROC TRANSPOSE DATA=ques OUT=quest;
    VAR name;
RUN;

DATA quest;
    SET quest;
    ARRAY cols COL1-COL%EVAL(&sqlobs.-2);
    DO OVER cols;
        IF SCAN(cols, 1, "_") IN ("S1", "S2", "S3") THEN PUT cols;
    END;
RUN;
But instead of outputting the column into the log, I would then concatenate it into a macro variable and then shove that macro variable into a macro that will then loop over all of the strings inside the macro variable and perform the macro on that. So what I have should work however I was wondering if there was an easier way of doing it. Basically having an array of "S1"-"S100" and seeing if it is in that array but I don't believe that is possible. I wouldn't be able to do LIKE S% because it would also grab all of the columns that are like S% which is all of the columns and not just S1_Something - S100_Something.

Go ahead and ask any questions if you are confused on what I want exactly since I'm bad at explaining.

hubertsng · Posted 07-22-2019 01:22 PM

Like the example that I had given, that code will grab all of the columns that start with "S" which would grab the columns SOMETHING1_something-SOMETHING400_something and notj ust S1_something-S100_something.

Reeza · Posted 07-22-2019 01:26 PM

@hubertsng wrote:

Like the example that I had given, that code will grab all of the columns that start with "S" which would grab the columns SOMETHING1_something-SOMETHING400_something and notj ust S1_something-S100_something.

You can make like more complex though, even use REGEX if you want to make it super complicated. You're finding a lot of reasons why our solutions won't work for you. I really suggest explaining what you're doing in more detail at this point.

Shmuel · Posted 07-22-2019 03:05 PM

Suppose you have three groups of variables:

1) V_prefixA_1_suffixA - V_prefixA_100_suffixA

2) V_prefixA_1_suffixB - V_prefixA_100_suffixB

3) V_prefixA_1 - V_prefixA_100

you want to grab only one group.

- using V_prefixA_: will grab all the three groups !

- A list defined by: Array V_prefixA_1 - V_prefixA_100; will grab group 3 only,
but you can't use that method to grab group 1 or group 2.

In that case I suggest to do a massive rename of variables:

- rename group 1 to V_prefixA_suffixA_1 - V_prefixA_suffixA_100 and define list by V_prefixA_suffixA_:

- rename group 2 to V_prefixA_suffixB_1 - V_prefixA_suffixB_100 and define list by V_prefixA_suffixB_:

If your case is similar to above you can do the massive rename by:

proc sql;
      create table var_names as select
      name from dictionary.columns
      where memname='HAVE';
quit;

filename ren temp;
data _null_;
  set var_names;
        length a_line $80;
        chk1 = 'V_prefixA_';
        chk2 = '_suffixA';
        chk3 = '_suffixB';

        ix1 = index(name,trim(chk1));
        ix2 = index(name,trim(chk2));
        ix3 = index(name,trim(chk3));
        if ix1=1 then do;
           if ix2 > 1 then new_name = cats(ch1,chk2,scan(name,3,'_')); else
           if ix3 > 1 then new_name = cats(ch1,chk3,scan(name,3,'_')); 
           file ren;
           a_line = cats(name,' = ',new_name);
           put a_line;
        end;
run;

proc datasets lib=<library having data HAVE >;
     modify have;
     %include ren;
quit; run;

you should adapt the CHK# constants to check according to your specific variable names.

The SCAN function should grab the sequence number of the variable in the list.

Pay attention - using proc datasets - modify statement will modify your original HAVE dataset.

You can rename the variables creating new dataset by:

data new;
 set have;
       %include ren;
run;

hashman · Posted 07-22-2019 08:08 PM

@hubertsng:

My understanding of your rules (after your Nth explanation) is:

- From a given set of variable names in SAS data set HAVE, select the variable names:

starting with a given prefix (e.g., "S")

followed by up to 3 digits

followed by a non-digit

You don't need to go to the dictionary tables for this - the DATA step compiler is a good enough tool:

data have ;                                                                             
  call missing (S1_a, S10p4q, S77bb12, S85_xyz23, S100, SS, S_33, SAb99, Z01_w) ;       
run ;                                                                                   
                                                                                        
data want (keep = _vn) ;                                                                
  length _vn $ 32 ;                                                                     
  _p = "S" ; * prefix ;                                                                 
  _d =  3  ; * number of digits after prefix before non-digit ;                         
  do until (_vn = "") ;                                                                 
    call vnext (_vn) ;                                                                  
    if _vn =: _p and 1 < notdigit (substr (_vn, 1 + length (_p))) < _d + 2 then output ;
  end ;                                                                                 
  stop ;                                                                                
  set have ;                                                                            
run ;

Now that you have the data set WANT you can do with the _VN values in it whatever you want. In fact, if you just want to create a bunch of macro variables based on the _VN values, you don't even need to write WANT - it can be done in the same DO loop shown above.

Kind regards

Paul D.

Re: Grabbing all columns with names that start with a string in a successive pattern

Re: Grabbing all columns with names that start with a string in a successive pattern

Re: Grabbing all columns with names that start with a string in a successive pattern

Re: Grabbing all columns with names that start with a string in a successive pattern

Re: Grabbing all columns with names that start with a string in a successive pattern

Re: Grabbing all columns with names that start with a string in a successive pattern

Re: Grabbing all columns with names that start with a string in a successive pattern

Re: Grabbing all columns with names that start with a string in a successive pattern

Re: Grabbing all columns with names that start with a string in a successive pattern

Re: Grabbing all columns with names that start with a string in a successive pattern

Re: Grabbing all columns with names that start with a string in a successive pattern

Re: Grabbing all columns with names that start with a string in a successive pattern

Registration is open

SAS Training: Just a Click Away