I've two data sets I need to use SQL to stack up or join after doing the count (# of records). I then need to sum up the count from data set 1 and the count from data set 2. Since the data sets are really huge, I'm trying to do in fewer steps if possible. Here is the code: I used UNION to join but it would only read in the count variable from data 1, not the count from data 2. If I use OUTER UNION Corr, it will treat those from data 1 and data 2 as two records, naturally. I was hoping there is a way SAS SQL can smartly consolidate the two cases. I know I can output the count-processed data for data 1, and then for data 2, and then create data 3 through FULL OUTER JOIN but I was hoping not to output the intermediate steps to cut my data running time. Is there a way? Thanks in advance.
data d1; input id1 id2;
datalines;
1 11
2 12
2 12
3 11
3 14
3 14
;
data d2; input id1 id2;
datalines;
1 11
4 15
4 15
;
proc sql; create table d3 as
select distinct id1, id2, count(*) as C1 from d1 group by id1, id2
outer union corr
select distinct id1, id2, count(*) as C2 from d2 group by id1, id2
order by id1, id2;
quit;
proc print data=d1; run;
proc print data=d2; run;
proc print data=d3; run;
Here is what I got if I use UNION or OUTER UNION CORR.
If using UNION | If using OUTER UNION CORR | ||||||||
Obs | id1 | id2 | C1 | Obs | id1 | id2 | C1 | C2 | |
1 | 1 | 11 | 1 | 1 | 1 | 11 | . | 1 | |
2 | 2 | 12 | 2 | 2 | 1 | 11 | 1 | . | |
3 | 3 | 11 | 1 | 3 | 2 | 12 | 2 | . | |
4 | 3 | 14 | 2 | 4 | 3 | 11 | 1 | . | |
5 | 4 | 15 | 2 | 5 | 3 | 14 | 2 | . | |
6 | 4 | 15 | . | 2 |
Here is the data I want.
Obs | id1 | id2 | C1 | C2 |
1 | 1 | 11 | 1 | 1 |
3 | 2 | 12 | 2 | . |
4 | 3 | 11 | 1 | . |
5 | 3 | 14 | 2 | . |
6 | 4 | 15 | . | 2 |
For a SQL guru, this should be easy. Until one replies, you can just follow a slightly different path:
Actually create summaries of d1 and d2, such as:
proc sql;
create table d1counts as select id1, id2, count(*) as c1 from d1 group by id1, id2;
create table d2counts as select id1, id2, count(*) as c2 from d2 group by id1, id2;
quit;
It should be possible to create d1counts and d2counts as views rather than data sets.
Then put them together:
data d3;
merge d1counts d2counts;
by id1 id2;
run;
You want a full join:
proc sql;
create table d3 as
select
coalesce(a.id1, b.id1) as id1,
coalesce(a.id2, b.id2) as id2,
C1, C2
from
(select id1, id2, count(*) as c1 from d1 group by id1, id2) as a
full join
(select id1, id2, count(*) as c2 from d2 group by id1, id2) as b
on a.id1=b.id1 and a.id2=b.id2
order by id1, id2;
quit;
data d1; input id1 id2;
datalines;
1 11
2 12
2 12
3 11
3 14
3 14
;
data d2; input id1 id2;
datalines;
1 11
4 15
4 15
;
proc sql;
select coalesce(a.id1,b.id1) as id1,coalesce(a.id2,b.id2) as id2,c1,c2
from (select id1,id2,count(*) as c1 from d1 group by id1,id2) as a
full join
(select id1,id2,count(*) as c2 from d2 group by id1,id2) as b
on a.id1=b.id1 and a.id2=b.id2;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.