Hello,
I have a large number of variables identifying some data characteristics (frequency of environmental factors identified), and a total cost variable. I'd like to get average costs for each of the characteristics where that characteristic is not missing.
I'm having trouble writing a code that loops through each and creates an average. For the output, I would like a table that lists the average cost (total cost divided by lines where characteristic is not missing) for each characteristic, i.e. average cost where a1 is present, average cost where a2 is present, etc.
Thanks in advance for any help!
data have;
infile datalines;
input a1 1. a2 1. a3 1. cost 10.;
datalines;
1 . 3 100
4 6 1 200
. 2 1 300
2 . . 400
2 2 2 500;
run;
*test out sql step;
proc sql;
create table averages as
select avg(cost) as avgcost1
from have
where a1 ne .
;quit;
* Result is 1200/4=300;
*convert to macro;
%macro avg(i);
proc sql;
create table averages as
select avg(cost) as avgcost&i
from have
where a&i ne .
;quit;
%mend;
%avg(1);
*array step using macro;
data temp;
set have;
array a_s {3} a:;
do i = 1 to 3;
%avg%i;
end;
run;
I think a better idea is to use an array to create a new set of COST_A variables where the value of COST_A is missing if A is missing and has the value of COST if A is not missing. Then means can be computed by PROC SUMMARY or PROC MEANS.
data have;
infile datalines;
input a1 a2 a3 cost;
datalines;
1 . 3 100
4 6 1 200
. 2 1 300
2 . . 400
2 2 2 500
;
run;
data have1;
set have;
array a a1-a3;
array cost_a cost_a1-cost_a3;
do i=1 to dim(a);
if not missing(a(i)) then cost_a(i)=cost;
end;
drop i;
run;
proc summary data=have1;
var cost_a:;
output out=stats mean=/autoname;
run;
Adding: what you are trying to do in your code is insert a PROC (in this case PROC SQL) inside a DATA step. This can't be done. This is invalid syntax. It doesn't matter if you try to do it with a macro or without a macro, you can't put a PROC inside a DATA step.
I think a better idea is to use an array to create a new set of COST_A variables where the value of COST_A is missing if A is missing and has the value of COST if A is not missing. Then means can be computed by PROC SUMMARY or PROC MEANS.
data have;
infile datalines;
input a1 a2 a3 cost;
datalines;
1 . 3 100
4 6 1 200
. 2 1 300
2 . . 400
2 2 2 500
;
run;
data have1;
set have;
array a a1-a3;
array cost_a cost_a1-cost_a3;
do i=1 to dim(a);
if not missing(a(i)) then cost_a(i)=cost;
end;
drop i;
run;
proc summary data=have1;
var cost_a:;
output out=stats mean=/autoname;
run;
Adding: what you are trying to do in your code is insert a PROC (in this case PROC SQL) inside a DATA step. This can't be done. This is invalid syntax. It doesn't matter if you try to do it with a macro or without a macro, you can't put a PROC inside a DATA step.
@PaigeMiller , that works great, thank you very much!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.