Using Max Function inside a DO loop to find the max of each variable in an array

Reply
Occasional Contributor
Posts: 6

Using Max Function inside a DO loop to find the max of each variable in an array

[ Edited ]

I have a set of variables in a macro and extracted the data for those variables from the main dataset. 

 

I need to select each variable and check for max values in that variables and set it as null. 

 

I created an array and am trying to used do loop to check for max values for each variable in an array.

 

But i am not able to use the max function inside an array. It says undeclared array referenced:max

 

How do i do this?

 

data test;
set libname.maindataset (KEEP=ID &variables);

array var &variables;
do i=1 to dim(var);
if max(var[i]) = 99 then do;
max(var[i])=. 
end;
run;

Super User
Posts: 11,134

Re: Using Max Function inside a DO loop to find the max of each variable in an array

I think that you really need to provide a small example of data and what you are trying to get for output.

Since any variable only has one "maximum" per observation and data step basically processes a single observation at a time what you claim to be attempting is not quite obvious to mean.

 

MAX as a function requires at least 2 arguments. I suspect that you have errors in your log. When you use max(var[i])=. ; then you have created variable as the max function can not be used in that fashion and since that variable has () it is treated as an array reference.

 

If you want to find the maximum value of all of the variables with in an array you can use something like:

   TempMax = max(of var(*)) ;

where Var is the array name.

You can then find which of those variables with the WHICHN function

   PositionInArray = whichn(TempMax, of var(*));

If you know the value that you are searching for  such as 99 that could take the place of TempMax as the value to look for.

And set that value to missing with:

  Var[PositionInArray] = . ;

 

If you are replacing multiple values you will need some additional logic.

Occasional Contributor
Posts: 6

Re: Using Max Function inside a DO loop to find the max of each variable in an array

[ Edited ]

 

var1 var2 var3
23 12 54
94 32 99
99 45 133

 

this is how my data looks.

 

yes i am looking for only 99 in each variable.

 

I need to set 99 in var1 as .(NULL)

Super User
Super User
Posts: 7,720

Re: Using Max Function inside a DO loop to find the max of each variable in an array

Then:

data want;
  set have;
  array var{3};
  do over var;
    if var=99 then var=.;
  end;
run;
Super User
Posts: 11,134

Re: Using Max Function inside a DO loop to find the max of each variable in an array


mrajendranvasanthi wrote:

 

var1 var2 var3
23 12 54
94 32 99
99 45 133

 

this is how my data looks.

 

yes i am looking for only 99 in each variable.

 

I need to set 99 in var1 as .(NULL)


What does the entire output look like. Is the 133 supposed to be null for Var3? The 45 for Var2?

It could possibly be a better example if the maximum values of the 3 variable do not all appear on the same row and then show what the result is supposed to look like for that set.

 

For example if your existing data were to be:

data have;
  input var1 var 2 var3;
datalines;
23   12   133
94   45   99
99   32   54
;
run;

What is the desired result?

 

You could replace the values with . in the example code I just posted to show us.

Occasional Contributor
Posts: 6

Re: Using Max Function inside a DO loop to find the max of each variable in an array

[ Edited ]

Sorry

 

No. if the maximum value of each column is 99 then that 99 should be set to null.

 

So in this table, it should look like this.

 

var1 var2 var3
23 12 54
94 32 99
            . 45 133

 

Only the the cell in var 1 with value 99 should be set to null. But in var3 99 is not the max value. so it should be good.

 

Please let me know if you have any more questions.

 

 

Super User
Posts: 11,134

Re: Using Max Function inside a DO loop to find the max of each variable in an array

Show what the results for Var2 and Var3 should be.

 

If you only wanted to look at Var1 why bother with an array?

 

I suspect that the desired result (which you have still not provided for the entire data set) for the example I posted would be:

data Want ;
  input var1 var 2 var3;
datalines;
23   12   .
94   .   99
.   32   54
;
run;

Which is going to be a somewhat more complicated problem.

 

Occasional Contributor
Posts: 6

Re: Using Max Function inside a DO loop to find the max of each variable in an array

[ Edited ]
var1 var2 var3
23 12 54
94 32 99
           . 45 133

 

this will be the desired result. 

 

I need an array because i wanted the code to check for maximum value 99 in each variable even when new variables are added in the dataset.

 

if i wanted to check for just for one specific variable i understand it can be done with normal if statement.

 

But here the name of the variables might change and new variables can be added into the dataset

 

Im trying to make it dynamic

Super User
Posts: 19,167

Re: Using Max Function inside a DO loop to find the max of each variable in an array

 

 

Array vars(*) &variables;

 

if max(of vars(*)) > 99 and whichn(99, of vars(*))>0 then

    Do I=1 to dim(vars);

        If vars(I) = 99 then call missing(vars(I));

      End;

 

 

 

Super User
Posts: 11,134

Re: Using Max Function inside a DO loop to find the max of each variable in an array


mrajendranvasanthi wrote:
var1 var2 var3
23 12 54
94 32 99
           . 45 133

 

this will be the desired result. 

 

I need an array because i wanted the code to check for maximum value 99 in each variable even when new variables are added in the dataset.

 

 


So why is the value 99 in VAR3 not set to missing in your example?

 

Respected Advisor
Posts: 4,137

Re: Using Max Function inside a DO loop to find the max of each variable in an array

Here an approach which dynamically determines the variables for the array based on a variable naming pattern.

It then also determines the variables in the array where the max value over the whole data set is 99 and then only loops over these variables to set 99 to missing.

data have;
  input var1 var2 var3;
datalines;
23   12   133
94   45   99
99   32   54
;
run;

/* dynamically determine variables for array and number of selected variables */
proc sql noprint;
  select 
    name , count(*)
      into :var_list separated by ' ', :n_vars          
  from dictionary.columns
  where libname='WORK' and memname='HAVE' and upcase(name) like 'VAR%'
  ;
quit;
%let n_vars=%left(&n_vars);
%put &=var_list;
%put &=n_vars;
options symbolgen;

/* determine columns where max value is 99 */
%let elem_99=0;
data _null_;
  set have end=last;
  array in_vars {&n_vars}  &var_list;
  array max_vals {&n_vars} 8 _temporary_;

  do _i=1 to dim(in_vars);
    max_vals{_i}=max(max_vals{_i}, in_vars{_i});
  end;
  
  if last then
    do;
      /* determine columns where max val is 99 */
      length _elements $ 1000;
      do _i=1 to dim(max_vals);
        if max_vals{_i}=99 then _elements=catx(',',_elements,_i);
      end;
      if not missing(_elements) then call symputx('elem_99',_elements);
    end;
run;

%put &=elem_99;


data want(drop=_:);
  set have;
  array in_vars {&n_vars}  &var_list;
  /* only loop over variables where max val is 99 */
  if "&elem_99" ne "0" then
  do;
    do _i=&elem_99;
      if in_vars{_i}=99 then call missing(in_vars{_i});
    end;
  end;

run;

 

Super User
Super User
Posts: 7,720

Re: Using Max Function inside a DO loop to find the max of each variable in an array

Per @ballardw post some example test data (in the form of a datastep) and what the output should look like.

 

You can also break you logic down into steps:

1) find max values (proc means, proc sql etc. all can do this)

2) merge max values onto the data

3) if value=max value then set it to what you want.

 

Contributor
Posts: 20

Re: Using Max Function inside a DO loop to find the max of each variable in an array

Is this what you need??

 

data have;

input var1 var2 var3;

datalines;

23 12 133

94 45 99

99 32 54

;

run;

proc means data=have noprint;

var var1 var2 var3;

output out=stats(drop=_type_ _freq_) max=max1 max2 max3;

run;

 

data want;

if _n_ = 1 then set stats;

set have;

array m {3} max1-max3;

array v {3} var1-var3;

do i = 1 to 3;

if m{i} = 99 and v{i} = m{i} then v{i} = .;

end;

keep var1-var3;

run;

Ask a Question
Discussion stats
  • 12 replies
  • 1262 views
  • 0 likes
  • 6 in conversation