I have a set of variables in a macro and extracted the data for those variables from the main dataset.
I need to select each variable and check for max values in that variables and set it as null.
I created an array and am trying to used do loop to check for max values for each variable in an array.
But i am not able to use the max function inside an array. It says undeclared array referenced:max
How do i do this?
data test;
set libname.maindataset (KEEP=ID &variables);
array var &variables;
do i=1 to dim(var);
if max(var[i]) = 99 then do;
max(var[i])=.
end;
run;
I think that you really need to provide a small example of data and what you are trying to get for output.
Since any variable only has one "maximum" per observation and data step basically processes a single observation at a time what you claim to be attempting is not quite obvious to mean.
MAX as a function requires at least 2 arguments. I suspect that you have errors in your log. When you use max(var[i])=. ; then you have created variable as the max function can not be used in that fashion and since that variable has () it is treated as an array reference.
If you want to find the maximum value of all of the variables with in an array you can use something like:
TempMax = max(of var(*)) ;
where Var is the array name.
You can then find which of those variables with the WHICHN function
PositionInArray = whichn(TempMax, of var(*));
If you know the value that you are searching for such as 99 that could take the place of TempMax as the value to look for.
And set that value to missing with:
Var[PositionInArray] = . ;
If you are replacing multiple values you will need some additional logic.
var1 | var2 | var3 |
23 | 12 | 54 |
94 | 32 | 99 |
99 | 45 | 133 |
this is how my data looks.
yes i am looking for only 99 in each variable.
I need to set 99 in var1 as .(NULL)
Then:
data want; set have; array var{3}; do over var; if var=99 then var=.; end; run;
@mrajendranvasanthi wrote:
var1 var2 var3 23 12 54 94 32 99 99 45 133
this is how my data looks.
yes i am looking for only 99 in each variable.
I need to set 99 in var1 as .(NULL)
What does the entire output look like. Is the 133 supposed to be null for Var3? The 45 for Var2?
It could possibly be a better example if the maximum values of the 3 variable do not all appear on the same row and then show what the result is supposed to look like for that set.
For example if your existing data were to be:
data have;
input var1 var 2 var3;
datalines;
23 12 133
94 45 99
99 32 54
;
run;
What is the desired result?
You could replace the values with . in the example code I just posted to show us.
Sorry
No. if the maximum value of each column is 99 then that 99 should be set to null.
So in this table, it should look like this.
var1 | var2 | var3 |
23 | 12 | 54 |
94 | 32 | 99 |
. | 45 | 133 |
Only the the cell in var 1 with value 99 should be set to null. But in var3 99 is not the max value. so it should be good.
Please let me know if you have any more questions.
Show what the results for Var2 and Var3 should be.
If you only wanted to look at Var1 why bother with an array?
I suspect that the desired result (which you have still not provided for the entire data set) for the example I posted would be:
data Want ;
input var1 var 2 var3;
datalines;
23 12 .
94 . 99
. 32 54
;
run;
Which is going to be a somewhat more complicated problem.
var1 | var2 | var3 |
23 | 12 | 54 |
94 | 32 | 99 |
. | 45 | 133 |
this will be the desired result.
I need an array because i wanted the code to check for maximum value 99 in each variable even when new variables are added in the dataset.
if i wanted to check for just for one specific variable i understand it can be done with normal if statement.
But here the name of the variables might change and new variables can be added into the dataset
Im trying to make it dynamic
Array vars(*) &variables;
if max(of vars(*)) > 99 and whichn(99, of vars(*))>0 then
Do I=1 to dim(vars);
If vars(I) = 99 then call missing(vars(I));
End;
@mrajendranvasanthi wrote:
var1 var2 var3 23 12 54 94 32 99 . 45 133
this will be the desired result.
I need an array because i wanted the code to check for maximum value 99 in each variable even when new variables are added in the dataset.
So why is the value 99 in VAR3 not set to missing in your example?
Here an approach which dynamically determines the variables for the array based on a variable naming pattern.
It then also determines the variables in the array where the max value over the whole data set is 99 and then only loops over these variables to set 99 to missing.
data have;
input var1 var2 var3;
datalines;
23 12 133
94 45 99
99 32 54
;
run;
/* dynamically determine variables for array and number of selected variables */
proc sql noprint;
select
name , count(*)
into :var_list separated by ' ', :n_vars
from dictionary.columns
where libname='WORK' and memname='HAVE' and upcase(name) like 'VAR%'
;
quit;
%let n_vars=%left(&n_vars);
%put &=var_list;
%put &=n_vars;
options symbolgen;
/* determine columns where max value is 99 */
%let elem_99=0;
data _null_;
set have end=last;
array in_vars {&n_vars} &var_list;
array max_vals {&n_vars} 8 _temporary_;
do _i=1 to dim(in_vars);
max_vals{_i}=max(max_vals{_i}, in_vars{_i});
end;
if last then
do;
/* determine columns where max val is 99 */
length _elements $ 1000;
do _i=1 to dim(max_vals);
if max_vals{_i}=99 then _elements=catx(',',_elements,_i);
end;
if not missing(_elements) then call symputx('elem_99',_elements);
end;
run;
%put &=elem_99;
data want(drop=_:);
set have;
array in_vars {&n_vars} &var_list;
/* only loop over variables where max val is 99 */
if "&elem_99" ne "0" then
do;
do _i=&elem_99;
if in_vars{_i}=99 then call missing(in_vars{_i});
end;
end;
run;
Per @ballardw post some example test data (in the form of a datastep) and what the output should look like.
You can also break you logic down into steps:
1) find max values (proc means, proc sql etc. all can do this)
2) merge max values onto the data
3) if value=max value then set it to what you want.
Is this what you need??
data have;
input var1 var2 var3;
datalines;
23 12 133
94 45 99
99 32 54
;
run;
proc means data=have noprint;
var var1 var2 var3;
output out=stats(drop=_type_ _freq_) max=max1 max2 max3;
run;
data want;
if _n_ = 1 then set stats;
set have;
array m {3} max1-max3;
array v {3} var1-var3;
do i = 1 to 3;
if m{i} = 99 and v{i} = m{i} then v{i} = .;
end;
keep var1-var3;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.