Solved: Filter multiple columns with the same criteria

mklangley · Posted 01-24-2024 05:40 PM

Suppose I have data like the following:

data have;
    input col1 $ col2 $ col_with_diff_naming_convention $ col4 $ col5 $;
    datalines;
T  F  T  W  O
T  T  T  T  T
W  E  T  H  T
P  T  H  J  T
T  G  T  H  L
;
run;

I would like to omit all rows where the columns all equal "T". In the example above, I want all rows except row 2.

I could do this by hard-coding each column (as below) or by building the condition from DICTIONARY.COLUMNS.

data want;
    set have;
    /* Keep a row if at least one column is not 'T' */
    where col1 ne 'T'
        or col2 ne 'T'
        or col_with_diff_naming_convention ne 'T'
        or col4 ne 'T'
        or col5 ne 'T';
run;

But I am wondering if anybody knows a slick way to do this using arrays (or another similarly concise approach).

ballardw · Posted 01-24-2024 05:59 PM

This works for your example data and should extend to more realistic data, such as whole words instead of single-letter values:

data want;
   set have;
   if not ('T' =  col1 = col2 = col_with_diff_naming_convention = col4 = col5 );
run;

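SAS allows several comparisons such as =, <, and > to be chained in one expression; each adjacent pair is compared and the results are combined with an implied AND. That said, < and > and similar operators are generally not a good idea for character values until you really understand how character comparisons work.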

If the data really is single-character, you could instead concatenate all the variables into one string and use the COUNTC function to count how many times T appears. If the count matches the number of variables, delete the observation.
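
The COUNTC idea might look something like the sketch below. It assumes that every character variable in HAVE takes part in the test and that every value is exactly one character long. The array name chars is only illustrative.

```sas
data want;
    set have;
    /* chars is a hypothetical array name; _character_ picks up every
       character variable that exists in HAVE at this point */
    array chars {*} _character_;
    /* CATS strips blanks and concatenates the values; COUNTC counts the
       'T' characters. A count equal to the number of variables means
       every column is 'T', so the row is dropped. */
    if countc(cats(of chars{*}), 'T') = dim(chars) then delete;
run;
```

With multi-character values this count can give false matches (for example, 'TT' in one column and a blank in another), so it is only a fit for single-character data.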

SASKiwi · Posted 01-24-2024 07:36 PM

This works for your example, but something more flexible may be preferable:

data want;
  set have;
  if cats(of _character_) ne 'TTTTT';
run;
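
One possible way to make this more flexible is an array loop, which does not depend on the number of columns or on single-character values. This is a minimal sketch, not something posted in the thread. It assumes all character variables in HAVE should be tested, and the names chars, all_t, and i are made up for illustration.

```sas
data want;
    set have;
    /* Hypothetical names: chars (array), all_t (flag), i (loop index) */
    array chars {*} _character_;
    all_t = 1;                          /* assume every column is 'T' */
    do i = 1 to dim(chars);
        if chars{i} ne 'T' then do;
            all_t = 0;                  /* found a column that is not 'T' */
            leave;                      /* no need to check the rest */
        end;
    end;
    if not all_t;                       /* keep rows with a non-'T' column */
    drop i all_t;
run;
```

Because all_t and i are numeric and are created after the ARRAY statement, they are not picked up by _character_.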

mklangley · Posted 01-25-2024 10:27 AM

Thank you, @ballardw and @SASKiwi ! I appreciate your prompt responses. Both of those are good approaches--wish I could accept both as solutions. For my use case, 'T' = col1 = col2... will be easier to maintain, so I'll go with that.

ballardw · Posted 01-25-2024 10:32 AM

You may also need to consider case in such comparisons: "ABC" is not equal to "abc", "Abc", "aBc", and so on. If case should not matter, apply the UPCASE or LOWCASE function to every variable in the comparison.
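
As a rough illustration of the case point, here is the CATS comparison from earlier in the thread with UPCASE applied to the concatenated string. It assumes the same five single-character columns as the example data, so that 'TTTTT' is the right target.

```sas
data want;
    set have;
    /* UPCASE makes 't' and 'T' compare as equal; 'TTTTT' assumes exactly
       five single-character columns, as in the example data */
    if upcase(cats(of _character_)) ne 'TTTTT';
run;
```

With this version a row such as t T t T T would be dropped along with T T T T T.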
