Hi I have 10 variables in a sas dataset x1-X5 and Y1-Y5, how can write an array to automatically filter the following condition
if (X1<10 or X2<10 or X3<10 or X4<10 or X5<10) or (Y1>10 or Y2>10 or Y3>10 or Y4>10 or Y5>10
Depends on what you want to accomplish. I think that the SAS datastep compiler is reasonably optimized for this kind of processing, meaning that your original statement is probably as efficient as it gets in terms of performance. I think the program will stop checking the following parts as soon as one part of a multiple OR clause has been proven true.
So, if you want fast and dirty, use the construct
if min(of x1-x5)<10 or max(of y1-y5)>10 then...
By "fast and dirty" I mean that this is fast to write, but harder to read than the original statement, and it probably executes a teeny bit slower, as the program will first compare all the variables in each part to each other to find the minimum or maximum, and then compare the result to 10.
If you have a really large number of variables, and you really care about execution time, I would recommend using a macro construct.
One possibility, if you have two groups of variables, prefixed by "X" and "Y", is to use SQL to construct the clauses:
Proc sql noprint;
select cats(name,'<10') into :x_clause separated by ' or '
from dictionary.tables
where memname='<data set name in caps>'
and libname='<library name in caps>'
and name like 'X%';
select cats(name,'>10') into :y_clause separated by ' or '
from dictionary.tables
where memname='<data set name in caps>'
and libname='<library name in caps>'
and name like 'Y%';
quit;
You can then write you IF statement as
if &x_clause or &y_clause then...
But in most cases I would go with the original statement, or the MIN and MAX construct, as others have also suggested. The macro option certainly beats entering 100 variable names by hand, and if you have that many variables the performance improvement may be worthwhile.
Post test data in the form of a datastep, as such this is untested:
data want; set have; array x{5}; array y{5}; do i=1 to 5; if x{i} < 10 or y{i} < 10 then...; end; run;
Really depends on what you are doing.
This may be more practical, given that you need any of 10 conditions to be true:
data want;
set have;
array x {5};
array y {5};
condition_met = 0;
do k=1 to 5 until (condition_met=1);
if x{k} < 10 or y{k} > 10 then condition_met=1;
end;
*** Now use CONDITION_MET appropriately;
drop k;
run;
Yes, it really depends on what he is doing. For instance it may be that:
data want; set have; if min(of x:) < 10 or min(of y:) < 10 then ...; run;
Would be sufficient also.
This is one of those cases where you can use math to simplify this.
If any of X1-X5 < 10, is equivalent to checking if the minimum of x1-x15 is less than 10.
Similarly, if any of Y1-Y5> 10 is equivalent to checking if the maximum of Y1 to Y5 is greater than 10.
if min(of x1-x5) < 10 or max(of y1-y5) > 10 then do
@SASSLICK001 wrote:
Hi I have 10 variables in a sas dataset x1-X5 and Y1-Y5, how can write an array to automatically filter the following condition
if (X1<10 or X2<10 or X3<10 or X4<10 or X5<10) or (Y1>10 or Y2>10 or Y3>10 or Y4>10 or Y5>10
Depends on what you want to accomplish. I think that the SAS datastep compiler is reasonably optimized for this kind of processing, meaning that your original statement is probably as efficient as it gets in terms of performance. I think the program will stop checking the following parts as soon as one part of a multiple OR clause has been proven true.
So, if you want fast and dirty, use the construct
if min(of x1-x5)<10 or max(of y1-y5)>10 then...
By "fast and dirty" I mean that this is fast to write, but harder to read than the original statement, and it probably executes a teeny bit slower, as the program will first compare all the variables in each part to each other to find the minimum or maximum, and then compare the result to 10.
If you have a really large number of variables, and you really care about execution time, I would recommend using a macro construct.
One possibility, if you have two groups of variables, prefixed by "X" and "Y", is to use SQL to construct the clauses:
Proc sql noprint;
select cats(name,'<10') into :x_clause separated by ' or '
from dictionary.tables
where memname='<data set name in caps>'
and libname='<library name in caps>'
and name like 'X%';
select cats(name,'>10') into :y_clause separated by ' or '
from dictionary.tables
where memname='<data set name in caps>'
and libname='<library name in caps>'
and name like 'Y%';
quit;
You can then write you IF statement as
if &x_clause or &y_clause then...
But in most cases I would go with the original statement, or the MIN and MAX construct, as others have also suggested. The macro option certainly beats entering 100 variable names by hand, and if you have that many variables the performance improvement may be worthwhile.
The Code:
data have;
input x1 x2 x3 x4 x5 y1 y2 y3 y4 y5;
cards;
1 2 3 4 5 6 7 8 9 0
10 11 12 13 14 15 16 17 18 19 20
;
run;
%macro runit;
data _null_;
set have;
array x_array[*] x:;
array y_array[*] y:;
call symput('x_dim',dim(x_array));
call symput('y_dim',dim(y_array));
stop;
run;
data want;
set have;
if (
%do i=1 %to &x_dim.;
%if &i ne &x_dim. %then
x&i. %str(<10 or);
%else
x&i. %str(<10 );
%end;
) or
(
%do i=1 %to &y_dim.;
%if &i ne &y_dim. %then
y&i. %str(>10 or);
%else
y&i. %str(<10 );
%end;
);
run;
%mend runit;
options mprint;
%runit;
The Log:
MPRINT(RUNIT): data want; MPRINT(RUNIT): set have; MPRINT(RUNIT): if ( x1 <10 or x2 <10 or x3 <10 or x4 <10 or x5 <10 ) or ( y1 >10 or y2 >10 or y3 >10 or y4 >10 or y5 <10 ); MPRINT(RUNIT): run; NOTE: There were 2 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 2 observations and 10 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
Please let us know if it worked for you.
Thanks all for your amazing solutions
Do remember to mark one of the responses as the answer to the question.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.