BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SASSLICK001
Obsidian | Level 7

Hi I have 10 variables in a sas dataset x1-X5 and Y1-Y5, how can write an array to automatically filter the following condition

 

if (X1<10 or X2<10 or X3<10 or X4<10 or X5<10) or (Y1>10 or Y2>10 or Y3>10 or Y4>10 or Y5>10

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

Depends on what you want to accomplish. I think that the SAS datastep compiler is reasonably optimized for this kind of processing, meaning that your original statement is probably as efficient as it gets in terms of performance. I think the program will stop checking the following parts as soon as one part of a multiple OR clause has been proven true. 

 

So, if you want fast and dirty, use the construct

if min(of x1-x5)<10 or max(of y1-y5)>10 then...

By "fast and dirty" I mean that this is fast to write, but harder to read than the original statement, and it probably executes a teeny bit slower, as the program will first compare all the variables in each part to each other to find the minimum or maximum, and then compare the result to 10.

 

If you have a really large number of variables, and you really care about execution time, I would recommend using a macro construct.

One possibility, if you have two groups of variables, prefixed by "X" and "Y", is to use SQL to construct the clauses:

Proc sql noprint;
  select cats(name,'<10') into :x_clause separated by ' or '
  from dictionary.tables
  where memname='<data set name in caps>' 
    and libname='<library name in caps>'
    and name like 'X%';
  select cats(name,'>10') into :y_clause separated by ' or '
  from dictionary.tables
  where memname='<data set name in caps>' 
    and libname='<library name in caps>'
    and name like 'Y%';
quit;

You can then write you IF statement as

if &x_clause or &y_clause then...

But in most cases I would go with the original statement, or the MIN and MAX construct, as others have also suggested. The macro option certainly beats entering 100 variable names by hand, and if you have that many variables the performance improvement may be worthwhile. 

View solution in original post

9 REPLIES 9
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep, as such this is untested:

data want;
  set have;
  array x{5};
  array y{5};
  do i=1 to 5;
    if x{i} < 10 or y{i} < 10 then...;
  end;
run;

Really depends on what you are doing.

Astounding
PROC Star

This may be more practical, given that you need any  of 10 conditions to be true:

 

data want;
   set have;
   array x {5};
   array y {5};
   condition_met = 0;
   do k=1 to 5 until (condition_met=1);
      if x{k} < 10 or y{k} > 10 then condition_met=1;
   end;
   *** Now use CONDITION_MET appropriately;
   drop k;
run;

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, it really depends on what he is doing.  For instance it may be that:

data want;
  set have;
  if min(of x:) < 10 or min(of y:) < 10 then ...;
run;

Would be sufficient also.

Reeza
Super User

This is one of those cases where you can use math to simplify this. 

 

If any of X1-X5 < 10, is equivalent to checking if the minimum of x1-x15 is less than 10. 

Similarly, if any of Y1-Y5> 10 is equivalent to checking if the maximum of Y1 to Y5 is greater than 10. 

 

if min(of x1-x5)  < 10 or max(of y1-y5) > 10 then do

@SASSLICK001 wrote:

Hi I have 10 variables in a sas dataset x1-X5 and Y1-Y5, how can write an array to automatically filter the following condition

 

if (X1<10 or X2<10 or X3<10 or X4<10 or X5<10) or (Y1>10 or Y2>10 or Y3>10 or Y4>10 or Y5>10


 

s_lassen
Meteorite | Level 14

Depends on what you want to accomplish. I think that the SAS datastep compiler is reasonably optimized for this kind of processing, meaning that your original statement is probably as efficient as it gets in terms of performance. I think the program will stop checking the following parts as soon as one part of a multiple OR clause has been proven true. 

 

So, if you want fast and dirty, use the construct

if min(of x1-x5)<10 or max(of y1-y5)>10 then...

By "fast and dirty" I mean that this is fast to write, but harder to read than the original statement, and it probably executes a teeny bit slower, as the program will first compare all the variables in each part to each other to find the minimum or maximum, and then compare the result to 10.

 

If you have a really large number of variables, and you really care about execution time, I would recommend using a macro construct.

One possibility, if you have two groups of variables, prefixed by "X" and "Y", is to use SQL to construct the clauses:

Proc sql noprint;
  select cats(name,'<10') into :x_clause separated by ' or '
  from dictionary.tables
  where memname='<data set name in caps>' 
    and libname='<library name in caps>'
    and name like 'X%';
  select cats(name,'>10') into :y_clause separated by ' or '
  from dictionary.tables
  where memname='<data set name in caps>' 
    and libname='<library name in caps>'
    and name like 'Y%';
quit;

You can then write you IF statement as

if &x_clause or &y_clause then...

But in most cases I would go with the original statement, or the MIN and MAX construct, as others have also suggested. The macro option certainly beats entering 100 variable names by hand, and if you have that many variables the performance improvement may be worthwhile. 

SASSLICK001
Obsidian | Level 7
I believe it should be dictionary.columns as it couldn't find "name" column
Satish_Parida
Lapis Lazuli | Level 10

The Code:

 

data have;
input x1 x2 x3 x4 x5 y1 y2 y3 y4 y5;
cards;
1 2 3 4 5 6 7 8 9 0
10 11 12 13 14 15 16 17 18 19 20
;
run;

%macro runit;
	data _null_;
	set have;
	array x_array[*] x:;
	array y_array[*] y:;
	call symput('x_dim',dim(x_array));
	call symput('y_dim',dim(y_array));
	stop;
	run;
	
	data want;
	set have;
	if (
	%do i=1 %to &x_dim.;
		%if &i ne &x_dim. %then
		x&i. %str(<10 or);
		%else
		x&i. %str(<10 );
	%end;
	) or
	(
	%do i=1 %to &y_dim.;
		%if &i ne &y_dim. %then
		y&i. %str(>10 or);
		%else
		y&i. %str(<10 );
	%end;
	);

	run;
%mend runit;
options mprint;
%runit;

The Log:

 

 MPRINT(RUNIT):   data want;
 MPRINT(RUNIT):   set have;
 MPRINT(RUNIT):   if ( x1 <10 or x2 <10 or x3 <10 or x4 <10 or x5 <10 ) or ( y1 >10 or y2 >10 or y3 >10 or y4 >10 or y5 <10 );
 MPRINT(RUNIT):   run;
 
 NOTE: There were 2 observations read from the data set WORK.HAVE.
 NOTE: The data set WORK.WANT has 2 observations and 10 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       cpu time            0.00 seconds

Please let us know if it worked for you.

SASSLICK001
Obsidian | Level 7

Thanks all for your amazing solutions

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Do remember to mark one of the responses as the answer to the question.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 14243 views
  • 5 likes
  • 6 in conversation