I've two data sets I need to use SQL to stack up or join after doing the count (# of records). I then need to sum up the count from data set 1 and the count from data set 2. Since the data sets are really huge, I'm trying to do in fewer steps if possible. Here is the code: I used UNION to join but it would only read in the count variable from data 1, not the count from data 2. If I use OUTER UNION Corr, it will treat those from data 1 and data 2 as two records, naturally. I was hoping there is a way SAS SQL can smartly consolidate the two cases. I know I can output the count-processed data for data 1, and then for data 2, and then create data 3 through FULL OUTER JOIN but I was hoping not to output the intermediate steps to cut my data running time. Is there a way? Thanks in advance.
data d1; input id1 id2;
datalines;
1 11
2 12
2 12
3 11
3 14
3 14
;
data d2; input id1 id2;
datalines;
1 11
4 15
4 15
;
proc sql; create table d3 as
select distinct id1, id2, count(*) as C1 from d1 group by id1, id2
outer union corr
select distinct id1, id2, count(*) as C2 from d2 group by id1, id2
order by id1, id2;
quit;
proc print data=d1; run;
proc print data=d2; run;
proc print data=d3; run;
Here is what I got if I use UNION or OUTER UNION CORR.
If using UNION | If using OUTER UNION CORR | ||||||||
Obs | id1 | id2 | C1 | Obs | id1 | id2 | C1 | C2 | |
1 | 1 | 11 | 1 | 1 | 1 | 11 | . | 1 | |
2 | 2 | 12 | 2 | 2 | 1 | 11 | 1 | . | |
3 | 3 | 11 | 1 | 3 | 2 | 12 | 2 | . | |
4 | 3 | 14 | 2 | 4 | 3 | 11 | 1 | . | |
5 | 4 | 15 | 2 | 5 | 3 | 14 | 2 | . | |
6 | 4 | 15 | . | 2 |
Here is the data I want.
Obs | id1 | id2 | C1 | C2 |
1 | 1 | 11 | 1 | 1 |
3 | 2 | 12 | 2 | . |
4 | 3 | 11 | 1 | . |
5 | 3 | 14 | 2 | . |
6 | 4 | 15 | . | 2 |
For a SQL guru, this should be easy. Until one replies, you can just follow a slightly different path:
Actually create summaries of d1 and d2, such as:
proc sql;
create table d1counts as select id1, id2, count(*) as c1 from d1 group by id1, id2;
create table d2counts as select id1, id2, count(*) as c2 from d2 group by id1, id2;
quit;
It should be possible to create d1counts and d2counts as views rather than data sets.
Then put them together:
data d3;
merge d1counts d2counts;
by id1 id2;
run;
You want a full join:
proc sql;
create table d3 as
select
coalesce(a.id1, b.id1) as id1,
coalesce(a.id2, b.id2) as id2,
C1, C2
from
(select id1, id2, count(*) as c1 from d1 group by id1, id2) as a
full join
(select id1, id2, count(*) as c2 from d2 group by id1, id2) as b
on a.id1=b.id1 and a.id2=b.id2
order by id1, id2;
quit;
data d1; input id1 id2;
datalines;
1 11
2 12
2 12
3 11
3 14
3 14
;
data d2; input id1 id2;
datalines;
1 11
4 15
4 15
;
proc sql;
select coalesce(a.id1,b.id1) as id1,coalesce(a.id2,b.id2) as id2,c1,c2
from (select id1,id2,count(*) as c1 from d1 group by id1,id2) as a
full join
(select id1,id2,count(*) as c2 from d2 group by id1,id2) as b
on a.id1=b.id1 and a.id2=b.id2;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.