Hi,
I have two tables, A and B, which look like the following.
Table A
group | c1 | c2 | c3 | c4 |
var | 1 | 2 | . | 1 |
each | . | 3 | . | 2 |
value | 4 | 5 | 4 | . |
new | . | 1 | 6 | . |
Table B
group | c1 | c2 | c3 | c4 |
var | . | 1 | 3 | 1 |
each | . | 1 | 2 | 1 |
home | 2 | 4 | 1 | 1 |
new | . | 2 | 1 | 1 |
The result I want is a single table that merges these two tables by the variable group and sums the other variables. The result should look like table C:
Table C
group | c1 | c2 | c3 | c4 |
var | 1 | 3 | 3 | 2 |
each | . | 4 | 2 | 3 |
home | 2 | 4 | 1 | 1 |
new | . | 3 | 7 | 1 |
value | 4 | 5 | 4 | . |
The real datasets are more complicated: they contain thousands of observations and 33 variables. Can anyone help me out?
Appreciated
Try
proc sql;
create table c as
select
coalesce(a.group,b.group) as group,
sum(a.c1,b.c1) as c1,
sum(a.c2,b.c2) as c2,
sum(a.c3,b.c3) as c3,
sum(a.c4,b.c4) as c4
from a full join b
on a.group = b.group
;
quit;
If your variables are indexed like in your example (end with 1 .. 33), dynamically expanding the select list within a macro is rather simple.
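For the indexed case, the macro could look roughly like the sketch below. The macro name sum_join and its n= parameter are made up for illustration, and it assumes that c1 through c&n exist as numeric variables in both A and B.

```sas
/* Sketch only: sum_join and n= are hypothetical names.            */
/* Assumes numeric variables c1-c&n exist in both WORK.A and WORK.B */
%macro sum_join(n=33);
  %local i;
  proc sql;
    create table c as
    select
      coalesce(a.group, b.group) as group
      /* one sum(...) column is generated per index */
      %do i = 1 %to &n;
        , sum(a.c&i, b.c&i) as c&i
      %end;
    from a full join b
      on a.group = b.group
    ;
  quit;
%mend sum_join;

%sum_join(n=33)
```

With two arguments, sum() is the row-wise SAS function, which ignores missing values, so a value that exists in only one table is carried over unchanged.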
If not, you may have to read the variable names from sashelp.vcolumn so you can use call execute to create the SQL code dynamically.
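As one possible way to do that, here is a sketch that reads the names from dictionary.columns (the table behind sashelp.vcolumn) and builds the select list with SELECT ... INTO rather than call execute. It assumes both tables are in WORK and that every numeric variable in A also exists in B; the macro variable name sumlist is made up for illustration.

```sas
/* Sketch only: assumes WORK.A and WORK.B share the same numeric variables. */
/* sumlist is a hypothetical macro variable name.                           */
proc sql noprint;
  /* builds "sum(a.x,b.x) as x" for every numeric column of A */
  select catx(' ', cats('sum(a.', name, ',b.', name, ')'), 'as', name)
    into :sumlist separated by ', '
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'A'
      and type = 'num';
quit;

proc sql;
  create table c as
  select
    coalesce(a.group, b.group) as group,
    &sumlist
  from a full join b
    on a.group = b.group
  ;
quit;
```

Note that libname and memname are stored in uppercase in dictionary.columns, so the literals in the where clause have to be uppercase as well.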
PROC SUMMARY is multi-threaded, so this should be fast. I also combine the data using a view.
data tablea;
input group $ c1 c2 c3 c4;
cards4;
var 1 2 . 1
each . 3 . 2
value 4 5 4 .
new . 1 6 .
;;;;
run;quit;
data tableb;
input group $ c1 c2 c3 c4;
cards4;
var . 1 3 1
each . 1 2 1
home 2 4 1 1
new . 2 1 1
;;;;
run;quit;
data tableab/view=tableab;
set tablea tableb;
run;quit;
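/* nway keeps only the per-group rows (no grand-total row) */
proc summary data=tableab sum nway;
class group;
var _numeric_;
/* drop the automatic _type_ and _freq_ variables from the result */
output out=tablec(drop=_type_ _freq_) sum=;
run;quit;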
Can't we use datastep merge statement to accomplish the same..??
@Bhargav_Movva wrote:
Can't we use datastep merge statement to accomplish the same..??
Of course you can. But then you have to rename the variables of one dataset, so that you can apply the sum function to the originally named variables from dataset A and the renamed variables from B. The data step does not have the a. and b. notation for variables that SQL has, and without a rename the values from one dataset would simply overwrite those from the other instead of being added.
Assume that the c1-c33 notation is in effect and both datasets are sorted by group:
%macro merge_it;
%local i;
data C;
merge
a
b (rename=(
%do i = 1 %to 33;
c&i.=_c&i.
%end;
))
;
by group;
%do i = 1 %to 33;
/* sum() ignores missing values, so a value present in only one table is kept */
c&i. = sum(c&i.,_c&i.);
drop _c&i.;
%end;
run;
%mend;
%merge_it
A similar macro for the PROC SQL approach would be simpler to write. Which way you go could be determined by performance with large datasets (SQL can be quite slow there).