I have a dataset looking like this:
data test;
array var(10) var1-var10;
cards;
1 2 3 4 5 4 3 2 1 1
. . . . . . . . . . .
;
run;
I want to find the maximum value in the array (in this case 5) and average the 3 values around it (the max value, max-1 and max+1 - in the example 4,5 and 4)
Thanks!
data want;
set test;
array v var1-var10;
maximum_value=max(of var1-var10);
loc=whichn(maximum_value,of var1-var10);
avg_of_3=mean(v(loc-1),v(loc),v(loc+1));
run;
Note: this code fails if the maximum of var1-var10 is in position 1 or position 10. If the maximum appears at two or more locations, it will find only the first such occurrence of the maximum and work with that.
data want;
set test;
array v var1-var10;
maximum_value=max(of var1-var10);
loc=whichn(maximum_value,of var1-var10);
avg_of_3=mean(v(loc-1),v(loc),v(loc+1));
run;
Note: this code fails if the maximum of var1-var10 is in position 1 or position 10. If the maximum appears at two or more locations, it will find only the first such occurrence of the maximum and work with that.
Thank you - works perfectly! Sorry about the code. Here is the complete code including yours:
data test;
input var1-var10;
cards;
1 2 3 4 5 4 3 2 1 1
1 3 4 5 6 3 5 2 2 1
;
run;
data want;
set test;
array v var1-var10;
maximum_value=max(of var1-var10);
loc=whichn(maximum_value,of var1-var10);
avg_of_3=mean(v(loc-1),v(loc),v(loc+1));
run;
You can fix the boundary condition by adding two extra variables (with missing values) to the ARRAY. You can use just one variable and add it twice to the array.
The result be the average of the TWO values at the boundary since the third value will be missing.
data want;
set test;
array v var0 var1-var10 var0;
maximum_value=max(of v[*]);
loc=whichn(maximum_value,of v[*]);
avg_of_3=mean(v(loc-1),v(loc),v(loc+1));
drop var0 ;
run;
@Tom wrote:
You can fix the boundary condition by adding two extra variables (with missing values) to the ARRAY. You can use just one variable and add it twice to the array.
The result be the average of the TWO values at the boundary since the third value will be missing.
True if that's what the OP wanted. However, no such statement from the OP has been made, so I didn't try to code anything for boundary conditions.
I did something similar (see code below) but your solution is more elegant. Thanks! I know I didn't ask for this, but just because I didn't think of that problem in the beginning. I appreciate all the help.
data want;
set test;
array v var1-var10;
maximum_value=max(of var1-var10);
loc=whichn(maximum_value,of var1-var10);
if loc ge 2 and loc le 9 then avg_of_3=mean(v(loc-1),v(loc),v(loc+1));
if loc=1 then avg_of_3=mean(v(loc),v(loc+1));
if loc=10 then avg_of_3=mean(v(loc),v(loc-1));
run;
Looks good to me! Good for you to make the fix by yourself.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.