merge datasets and sum

Occasional Contributor hx
Hi,

I have two tables, A and B, which look like the following.

 

table A
group  c1  c2  c3  c4
var     1   2   .   1
each    .   3   .   2
value   4   5   4   .
new     .   1   6   .

 

table B
group  c1  c2  c3  c4
var     .   1   3   1
each    .   1   2   1
home    2   4   1   1
new     .   2   1   1

 

The result I want is a table that merges these two tables by the variable group and sums the other variables. It should look like table C:

table C
group  c1  c2  c3  c4
var     1   3   3   2
each    .   4   2   5
home    2   4   1   1
new     .   3   7   1
value   4   5   4   .

The real datasets are more complicated: they contain thousands of observations and 33 variables. Can anyone help me out?
Appreciated


Accepted Solutions
Solution
‎02-22-2017 03:28 PM
Super User
Posts: 7,843

Re: merge datasets and sum

Try

proc sql;
create table c as
select
  coalesce(a.group,b.group) as group,
  sum(a.c1,b.c1) as c1,
  sum(a.c2,b.c2) as c2,
  sum(a.c3,b.c3) as c3,
  sum(a.c4,b.c4) as c4
from a full join b
  on a.group = b.group
;
quit;

If your variables are indexed like in your example (end with 1 .. 33), dynamically expanding the select list within a macro is rather simple.
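
For the indexed case, a minimal sketch of such a macro could look like the following. The macro name sum_cols and its parameter n are made up for illustration, and the sketch assumes the tables are called a and b, share the key group, and have numeric variables c1 to c&n. It uses FULL JOIN so that groups found in only one table are kept.

```sas
%macro sum_cols(n=33);
  %local i;
  proc sql;
    create table c as
    select
      coalesce(a.group, b.group) as group
      /* one sum(...) expression per indexed variable c1 .. c&n */
      %do i = 1 %to &n.;
        , sum(a.c&i., b.c&i.) as c&i.
      %end;
    from a full join b
      on a.group = b.group
    ;
  quit;
%mend sum_cols;

/* hypothetical call for the 33-variable case */
%sum_cols(n=33)
```

The two-argument sum() here is the row-wise SAS function, which ignores missing values, so a group that has a value in only one table keeps that value.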

If not, you may have to read the variable names from sashelp.vcolumn so you can use call execute to create the SQL code dynamically.
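
For the non-indexed case, one possible sketch is shown below. Note that it swaps in a different technique from the one named above: instead of call execute, it builds the select list in a macro variable with SELECT ... INTO, reading the names from dictionary.columns (the PROC SQL counterpart of sashelp.vcolumn). It assumes both tables live in WORK, are named A and B, and share the same numeric variables. The macro variable name sum_list is hypothetical.

```sas
proc sql noprint;
  /* build "sum(a.x, b.x) as x" for every numeric column of WORK.A */
  select 'sum(a.' || strip(name) || ', b.' || strip(name) || ') as ' || strip(name)
    into :sum_list separated by ', '
  from dictionary.columns
  where libname = 'WORK'   /* library and member names are stored in uppercase */
    and memname = 'A'
    and type = 'num';      /* skips the character key variable group */
quit;

proc sql;
  create table c as
  select
    coalesce(a.group, b.group) as group,
    &sum_list.
  from a full join b
    on a.group = b.group
  ;
quit;
```

The generated list is plain text, so writing it to the log with %put &=sum_list; before running the second step is an easy way to check it.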

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

Occasional Contributor hx
Occasional Contributor
Posts: 17

Re: merge datasets and sum

Posted in reply to KurtBremser
Thank you, KurtBremser. I think I have found a way to get my results.
Valued Guide
Posts: 505

Re: merge datasets and sum

 

PROC SUMMARY is multi-threaded, so this should be fast. I also combine the data using a view:

data tablea;
  input group $ c1 c2 c3 c4;
cards4;
var 1 2 . 1
each . 3 . 2
value 4 5 4 .
new . 1 6 .
;;;;
run;

data tableb;
  input group $ c1 c2 c3 c4;
cards4;
var . 1 3 1
each . 1 2 1
home 2 4 1 1
new . 2 1 1
;;;;
run;

/* stack both tables in a view, so no intermediate copy is written */
data tableab / view=tableab;
  set tablea tableb;
run;

/* NWAY keeps only the per-group rows and leaves out the overall total row */
proc summary data=tableab nway;
  class group;
  var _numeric_;
  output out=tablec(drop=_type_ _freq_) sum=;
run;
 
Contributor
Posts: 20

Re: merge datasets and sum

Posted in reply to rogerjdeangelis

Can't we use a DATA step MERGE statement to accomplish the same?

Super User
Posts: 7,843

Re: merge datasets and sum

Posted in reply to Bhargav_Movva

Bhargav_Movva wrote:

Can't we use a DATA step MERGE statement to accomplish the same?


Of course you can. But then you have to rename the variables of one dataset, so that you can apply the sum function to the originally named variables from dataset A and the renamed variables from B. The data step has no equivalent of the a. and b. notation that SQL has, and without the rename the values from B would simply overwrite those from A instead of being added to them.

Assume that the c1-c33 naming is in effect and both datasets are sorted by group:

%macro merge_it;
%local i;
data C;
merge
  a
  b (rename=(
%do i = 1 %to 33;
  c&i.=_c&i.
%end;
  ))
;
by group;
%do i = 1 %to 33;
c&i. = sum(c&i.,_c&i.);
drop _c&i.;
%end;
run;
%mend;
%merge_it

A similar macro for PROC SQL would be simpler to write. Which way you go could depend on performance with large datasets (SQL can be quite slow there).
