BookmarkSubscribeRSS Feed
mrajendranvasanthi
Fluorite | Level 6

I have a set of variables in a macro and extracted the data for those variables from the main dataset. 

 

I need to select each variable and check for max values in that variables and set it as null. 

 

I created an array and am trying to used do loop to check for max values for each variable in an array.

 

But i am not able to use the max function inside an array. It says undeclared array referenced:max

 

How do i do this?

 

data test;
set libname.maindataset (KEEP=ID &variables);

array var &variables;
do i=1 to dim(var);
if max(var[i]) = 99 then do;
max(var[i])=. 
end;
run;

12 REPLIES 12
ballardw
Super User

I think that you really need to provide a small example of data and what you are trying to get for output.

Since any variable only has one "maximum" per observation and data step basically processes a single observation at a time what you claim to be attempting is not quite obvious to mean.

 

MAX as a function requires at least 2 arguments. I suspect that you have errors in your log. When you use max(var[i])=. ; then you have created variable as the max function can not be used in that fashion and since that variable has () it is treated as an array reference.

 

If you want to find the maximum value of all of the variables with in an array you can use something like:

   TempMax = max(of var(*)) ;

where Var is the array name.

You can then find which of those variables with the WHICHN function

   PositionInArray = whichn(TempMax, of var(*));

If you know the value that you are searching for  such as 99 that could take the place of TempMax as the value to look for.

And set that value to missing with:

  Var[PositionInArray] = . ;

 

If you are replacing multiple values you will need some additional logic.

mrajendranvasanthi
Fluorite | Level 6

 

var1 var2 var3
23 12 54
94 32 99
99 45 133

 

this is how my data looks.

 

yes i am looking for only 99 in each variable.

 

I need to set 99 in var1 as .(NULL)

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Then:

data want;
  set have;
  array var{3};
  do over var;
    if var=99 then var=.;
  end;
run;
ballardw
Super User

@mrajendranvasanthi wrote:

 

var1 var2 var3
23 12 54
94 32 99
99 45 133

 

this is how my data looks.

 

yes i am looking for only 99 in each variable.

 

I need to set 99 in var1 as .(NULL)


What does the entire output look like. Is the 133 supposed to be null for Var3? The 45 for Var2?

It could possibly be a better example if the maximum values of the 3 variable do not all appear on the same row and then show what the result is supposed to look like for that set.

 

For example if your existing data were to be:

data have;
  input var1 var 2 var3;
datalines;
23   12   133
94   45   99
99   32   54
;
run;

What is the desired result?

 

You could replace the values with . in the example code I just posted to show us.

mrajendranvasanthi
Fluorite | Level 6

Sorry

 

No. if the maximum value of each column is 99 then that 99 should be set to null.

 

So in this table, it should look like this.

 

var1 var2 var3
23 12 54
94 32 99
            . 45 133

 

Only the the cell in var 1 with value 99 should be set to null. But in var3 99 is not the max value. so it should be good.

 

Please let me know if you have any more questions.

 

 

ballardw
Super User

Show what the results for Var2 and Var3 should be.

 

If you only wanted to look at Var1 why bother with an array?

 

I suspect that the desired result (which you have still not provided for the entire data set) for the example I posted would be:

data Want ;
  input var1 var 2 var3;
datalines;
23   12   .
94   .   99
.   32   54
;
run;

Which is going to be a somewhat more complicated problem.

 

mrajendranvasanthi
Fluorite | Level 6
var1 var2 var3
23 12 54
94 32 99
           . 45 133

 

this will be the desired result. 

 

I need an array because i wanted the code to check for maximum value 99 in each variable even when new variables are added in the dataset.

 

if i wanted to check for just for one specific variable i understand it can be done with normal if statement.

 

But here the name of the variables might change and new variables can be added into the dataset

 

Im trying to make it dynamic

Reeza
Super User

 

 

Array vars(*) &variables;

 

if max(of vars(*)) > 99 and whichn(99, of vars(*))>0 then

    Do I=1 to dim(vars);

        If vars(I) = 99 then call missing(vars(I));

      End;

 

 

 

ballardw
Super User

@mrajendranvasanthi wrote:
var1 var2 var3
23 12 54
94 32 99
           . 45 133

 

this will be the desired result. 

 

I need an array because i wanted the code to check for maximum value 99 in each variable even when new variables are added in the dataset.

 

 


So why is the value 99 in VAR3 not set to missing in your example?

 

Patrick
Opal | Level 21

Here an approach which dynamically determines the variables for the array based on a variable naming pattern.

It then also determines the variables in the array where the max value over the whole data set is 99 and then only loops over these variables to set 99 to missing.

data have;
  input var1 var2 var3;
datalines;
23   12   133
94   45   99
99   32   54
;
run;

/* dynamically determine variables for array and number of selected variables */
proc sql noprint;
  select 
    name , count(*)
      into :var_list separated by ' ', :n_vars          
  from dictionary.columns
  where libname='WORK' and memname='HAVE' and upcase(name) like 'VAR%'
  ;
quit;
%let n_vars=%left(&n_vars);
%put &=var_list;
%put &=n_vars;
options symbolgen;

/* determine columns where max value is 99 */
%let elem_99=0;
data _null_;
  set have end=last;
  array in_vars {&n_vars}  &var_list;
  array max_vals {&n_vars} 8 _temporary_;

  do _i=1 to dim(in_vars);
    max_vals{_i}=max(max_vals{_i}, in_vars{_i});
  end;
  
  if last then
    do;
      /* determine columns where max val is 99 */
      length _elements $ 1000;
      do _i=1 to dim(max_vals);
        if max_vals{_i}=99 then _elements=catx(',',_elements,_i);
      end;
      if not missing(_elements) then call symputx('elem_99',_elements);
    end;
run;

%put &=elem_99;


data want(drop=_:);
  set have;
  array in_vars {&n_vars}  &var_list;
  /* only loop over variables where max val is 99 */
  if "&elem_99" ne "0" then
  do;
    do _i=&elem_99;
      if in_vars{_i}=99 then call missing(in_vars{_i});
    end;
  end;

run;

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Per @ballardw post some example test data (in the form of a datastep) and what the output should look like.

 

You can also break you logic down into steps:

1) find max values (proc means, proc sql etc. all can do this)

2) merge max values onto the data

3) if value=max value then set it to what you want.

 

Tommywhosc
Obsidian | Level 7

Is this what you need??

 

data have;

input var1 var2 var3;

datalines;

23 12 133

94 45 99

99 32 54

;

run;

proc means data=have noprint;

var var1 var2 var3;

output out=stats(drop=_type_ _freq_) max=max1 max2 max3;

run;

 

data want;

if _n_ = 1 then set stats;

set have;

array m {3} max1-max3;

array v {3} var1-var3;

do i = 1 to 3;

if m{i} = 99 and v{i} = m{i} then v{i} = .;

end;

keep var1-var3;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 6802 views
  • 0 likes
  • 6 in conversation