Fluorite | Level 6

## Conditional perform sum by group

Hi,

I have question that how to conditional perform vertical sum as following:

Data have:

 ID VAR1 VAR2 COE1 A 2 4 0.2 A NA/. NA/. 0.3 A 1 3 0.5 B 4 6 0.1 B 7 9 0.2 B 1 3 0.7 C 5 6 0.5 C 2 3 0.5

I want to do: if there is NA or dot  in var1 or var2, then sum the rest vertically by ID, otherwise calculate the sum product between Var1, Var2 and COE 1.

Data want:

 ID VAR1 VAR2 A 3 7 B 2.5 4.5 C 3.5 4.5

where in data want,

Var1(A) = 2+1, Var2(A) =4+3;

Var1(B) =4*0.1+7*0.2+1*0.7=2.5 Var2(B) = 6*0.1+9*0.2+3*0.7=4.5;

Var1(C) = 5*0.5+2*0.5 = 3.5, Var2(C) = 6*0.5+3*0.5 = 4.5

Thanks!

7 REPLIES 7
Ammonite | Level 13

## Re: Conditional perform sum by group

here is a data step that gives your requested results

``````data have;
input ID \$ var1 var2 coe1;
cards;

A 2 4 0.2
A NA/. NA/. 0.3
A 1 3 0.5
B 4 6 0.1
B 7 9 0.2
B 1 3 0.7
C 5 6 0.5
C 2 3 0.5
;
proc sort data=have out=have2;
by ID;
run;

data want(drop = var1_total var2_total coe1);
set have2;
by id;
if first.ID then
do;
var1_total = 0;
var2_total = 0;
end;
var1_total + var1;
var2_total + var2;
if last.ID then
do;
var1 = var1_total;
var2 = var2_total;
output;
end;
run;

``````
Fluorite | Level 6

## Re: Conditional perform sum by group

Thanks for your suggestion. But it gives the sum for B C, instead of the sum product. By sum product I mean:

Var1(B)*COE(B) + VAR2(B)*COE(B).

Thanks!

## Re: Conditional perform sum by group

Lets solve the issue with the input data first: the string "NA/." can't be stored in a numeric variable. var1 and var2 must be numeric to be used in calculations. So please post the example data as data-step using the data-types you have in your original dataset.

Except for the special treatment when var1 or var2 are missing in a group, this looks like weighted sums.

Fluorite | Level 6

## Re: Conditional perform sum by group

Sorry for the confusion here. What I mean by "NA/." is actually missing values and there are almost half of my var1 and var2 are missing. So basically I want to perform:

For each ID, if there is missing value in Var1 or Var2, then sum those non-missing value up, if there are no missing value, then calculate the weighted sum with corresponding COE1.

## Re: Conditional perform sum by group

Not 100% that this works, but it seems to provide the requested results:

``````proc sql;
/* find all IDs having var1 and/or var2 missing */
create table work.WithMissing as
select distinct id
from work.have2
where missing(var1) or missing(var2)
;
/* set coe1 to 1 for the IDs */
update work.have2
set coe1 = 1
where id in (select id from work.WithMissing)
;
quit;

proc summary data=work.have2 nway;
class id;
var var1 var2;
weight coe1;
output out=work.want(drop= _type_ _freq_) sum=;
run;``````
Meteorite | Level 14

## Re: Conditional perform sum by group

It can be done using SQL:

``````proc sql;
create table want as select
ID,
case nmiss(var1)+nmiss(var2)
when 0 then sum(var1*coe1)
else sum(var1)
end as var1,
case nmiss(var1)+nmiss(var2)
when 0 then sum(var2*coe1)
else sum(var2)
end as var2
from have
group by id;
quit; ``````

Or you can do it with datasteps like this:

``````data miss;
set have(where=(var1 is null or var2 is null))
by id;
if first.id;
keep id;
run;

data want;
sum1=0;
sum2=0;
do until(last.id);
merge miss(in=miss) have;
by id;
if miss then do;
sum1+var1;
sum2+var2;
end;
else do;
sum1+var1*coe1;
sum2+var2*coe1;
end;
end;
drop sum1 sum2;
var1=sum1;
var2=sum2;
run;``````

PROC Star

## Re: Conditional perform sum by group

Assuming that your variables are actually numeric, and that your data is sorted ...

Why not just compute both and track which one you want to use.  For example:

data want;

var1_sum=0;

var1_weighted=0;

var2_sum=0;

var2_weighted=0;

do until (last.id);

set have;

by id;

var1_sum + var1;

var2_sum + var2;

if var1 > . then var1_weighted + var1*coe1;

else var1_missing=1;

if var2 > . then var2_weighted + var2*coe1;

else var2_missing=1;

end;

if var1_missing then var1=var1_sum;

else var1= var1_weighted;

if var2_missing then var2=var2_sum;

else var2=var2_weighted;

drop var1_: var2_: ;

run;

Discussion stats
• 7 replies
• 1817 views
• 4 likes
• 5 in conversation