## merge datasets and sum

Hi,

I have two tables, A and B, which look like the following.

table A

| group | c1 | c2 | c3 | c4 |
|-------|----|----|----|----|
| var   | 1  | 2  | .  | 1  |
| each  | .  | 3  | .  | 2  |
| value | 4  | 5  | 4  | .  |
| new   | .  | 1  | 6  | .  |

table B

| group | c1 | c2 | c3 | c4 |
|-------|----|----|----|----|
| var   | .  | 1  | 3  | 1  |
| each  | .  | 1  | 2  | 1  |
| home  | 2  | 4  | 1  | 1  |
| new   | .  | 2  | 1  | 1  |

What I want is a single table that combines these two tables by the variable group and sums the other variables, treating missing values (.) as absent. The result should look like table C:

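table C

| group | c1 | c2 | c3 | c4 |
|-------|----|----|----|----|
| var   | 1  | 3  | 3  | 2  |
| each  | .  | 4  | 2  | 3  |
| home  | 2  | 4  | 1  | 1  |
| new   | .  | 3  | 7  | 1  |
| value | 4  | 5  | 4  | .  |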

The real datasets are more complicated: they contain thousands of observations and 33 variables. Can anyone help me out?
Much appreciated.

‎02-22-2017 03:28 PM
## Re: merge datasets and sum

Thank you, kurtbremser. I think I have found a way to get my results.
## Re: merge datasets and sum

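PROC SUMMARY is multi-threaded, so this should be fast. I also combine the two tables using a data step view, so the stacked rows are not stored as a separate dataset.

``````
data tablea;
  input group $ c1 c2 c3 c4;
cards4;
var 1 2 . 1
each . 3 . 2
value 4 5 4 .
new . 1 6 .
;;;;
run;

data tableb;
  input group $ c1 c2 c3 c4;
cards4;
var . 1 3 1
each . 1 2 1
home 2 4 1 1
new . 2 1 1
;;;;
run;

/* Stack both tables in a view rather than a physical dataset */
data tableab / view=tableab;
  set tablea tableb;
run;

/* NWAY keeps only the per-group rows (no overall total row);   */
/* the SUM statistic ignores missing values within each group   */
proc summary data=tableab nway;
  class group;
  var _numeric_;
  output out=tablec (drop=_type_ _freq_) sum=;
run;
``````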
## Re: merge datasets and sum

Can't we use a data step MERGE statement to accomplish the same thing?

## Re: merge datasets and sum

Bhargav_Movva wrote:

Can't we use a data step MERGE statement to accomplish the same thing?

Of course you can. But then you have to rename the variables of one dataset, so that you can combine the originally named variables from dataset A with the renamed variables from B. Since the goal here is a sum, use the sum function, which ignores missing values; coalesce would only return the first non-missing value rather than add the two. The data step does not have the a. and b. notation for variables that SQL has, and without a rename the value from the dataset read last would simply overwrite the other.

Assume that the variables are named c1-c33 and that both datasets are sorted by id, with one observation per id in each:

``````
%macro merge_it;
data C;
merge
  a
  b (rename=(
    /* give every variable from B a temporary name: c1 -> _c1, ... */
    %do i = 1 %to 33;
      c&i.=_c&i.
    %end;
  ))
;
by id;
/* add A's and B's values; sum() ignores missing arguments */
%do i = 1 %to 33;
  c&i. = sum(c&i.,_c&i.);
  drop _c&i.;
%end;
run;
%mend;
%merge_it
``````
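For comparison, the same merge can be sketched without a macro by using a numbered range list in the RENAME= option and two arrays. This is only a sketch under the same assumptions as above (variables c1-c33, key variable id, both datasets sorted by id with one observation per id); the array names from_a and from_b and the index i are made up for illustration. It adds the values with the sum function, which treats a missing value as absent.

``````sas
data C;
  merge a
        b (rename=(c1-c33=_c1-_c33)); /* B's variables become _c1-_c33 */
  by id;
  array from_a{33} c1-c33;    /* values read from A (hypothetical name) */
  array from_b{33} _c1-_c33;  /* values read from B (hypothetical name) */
  do i = 1 to 33;
    /* sum() returns missing only when both arguments are missing */
    from_a{i} = sum(from_a{i}, from_b{i});
  end;
  drop i _c1-_c33;
run;
``````

The array version avoids generating 33 separate assignment statements, but it still relies on the numbered c1-c33 naming.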

A similar macro for PROC SQL would be simpler to write. Which way you go could be determined by performance on large datasets (SQL can be quite slow there).
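As a rough illustration of the SQL route, here is a sketch written out for the four example columns only, using the tablea and tableb names from the sample data earlier in the thread. It assumes one row per group in each table. The full join keeps groups that appear in only one table, and the two-argument sum is the row-wise SAS function, which ignores missing values. For 33 columns, the select list would have to be generated, for example with a macro loop.

``````sas
proc sql;
  create table tablec as
  select coalesce(a.group, b.group) as group, /* key from whichever side has it */
         sum(a.c1, b.c1) as c1,
         sum(a.c2, b.c2) as c2,
         sum(a.c3, b.c3) as c3,
         sum(a.c4, b.c4) as c4
  from tablea as a
       full join
       tableb as b
       on a.group = b.group
  order by 1;
quit;
``````

Unlike the data step merge, this does not require the inputs to be sorted beforehand.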

