BookmarkSubscribeRSS Feed
Solph
Pyrite | Level 9

I've two data sets I need to use SQL to stack up or join after doing the count (# of records). I then need to sum up the count from data set 1 and the count from data set 2. Since the data sets are really huge, I'm trying to do in fewer steps if possible. Here is the code: I used UNION to join  but it would only read in the count variable from data 1, not the count from data 2. If I use OUTER UNION Corr, it will treat those from data 1 and data 2 as two records, naturally. I was hoping there is a way SAS SQL can smartly consolidate the two cases. I know I can output the count-processed data for data 1, and then for data 2, and then create data 3 through FULL OUTER JOIN but I was hoping not to output the intermediate steps to cut my data running time. Is there a way?  Thanks in advance.

 

 

data d1; input id1 id2;
datalines;
1 11
2 12
2 12
3 11
3 14
3 14
;
data d2; input id1 id2;
datalines;
1 11
4 15
4 15
;
proc sql; create table d3 as 
	select distinct id1, id2, count(*) as C1 from d1 group by id1, id2
	outer union corr 
	select distinct id1, id2, count(*) as C2 from d2 group by id1, id2
	order by id1, id2;
quit;
proc print data=d1; run;
proc print data=d2; run;
proc print data=d3; run;

 

Here is what I got if I use UNION or OUTER UNION CORR.

If using UNION       If using OUTER UNION CORR    
Obs id1 id2 C1   Obs id1 id2 C1 C2
1 1 11 1   1 1 11 . 1
2 2 12 2   2 1 11 1 .
3 3 11 1   3 2 12 2 .
4 3 14 2   4 3 11 1 .
5 4 15 2   5 3 14 2 .
          6 4 15 . 2

 

 

Here is the data I want.

Obs id1 id2 C1 C2
1 1 11 1 1
3 2 12 2 .
4 3 11 1 .
5 3 14 2 .
6 4 15 . 2
3 REPLIES 3
Astounding
PROC Star

For a SQL guru, this should be easy.  Until one replies, you can just follow a slightly different path:

 

Actually create summaries of d1 and d2, such as:

 

proc sql;

create table d1counts as select id1, id2, count(*) as c1 from d1 group by id1, id2;

create table d2counts as select id1, id2, count(*) as c2 from d2 group by id1, id2;

quit;

 

It should be possible to create d1counts and d2counts as views rather than data sets.

 

Then put them together:

 

data d3;

merge d1counts d2counts;

by id1 id2;

run;

PGStats
Opal | Level 21

You want a full join:

 

proc sql; 
create table d3 as
select 
    coalesce(a.id1, b.id1) as id1, 
    coalesce(a.id2, b.id2) as id2, 
    C1, C2 
from 
    (select id1, id2, count(*) as c1 from d1 group by id1, id2) as a
	full join
	(select id1, id2, count(*) as c2 from d2 group by id1, id2) as b
        on a.id1=b.id1 and a.id2=b.id2
	order by id1, id2;
quit;
PG
Ksharp
Super User
data d1; input id1 id2;
datalines;
1 11
2 12
2 12
3 11
3 14
3 14
;
data d2; input id1 id2;
datalines;
1 11
4 15
4 15
;

proc sql;
select coalesce(a.id1,b.id1) as id1,coalesce(a.id2,b.id2) as id2,c1,c2
 from (select id1,id2,count(*) as c1 from d1 group by id1,id2) as a 
      full join
      (select id1,id2,count(*) as c2 from d2 group by id1,id2) as b 
      on a.id1=b.id1 and a.id2=b.id2;
quit;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1369 views
  • 1 like
  • 4 in conversation