BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
N8
Obsidian | Level 7 N8
Obsidian | Level 7

Hi all,

I have a many-to-many join issue arising using sql trying to join two tables that have a different number of records and both have repeating subjects. My need: combine all three tables without multiplying them.

 

Scenario:

Table 1 is demographics table and this is a unique list without repeating subjects. Table 2 is an event table that is allowed to have repeating events and they are organized by calendar date. Table 3 is another event table and it is also organized by calendar date. In trying to do left inner joins what happens is that if I have 10 events in Table 2 and 20 events in Table 3, my sql script is generating 20 total events. I believe this is an issue with not using the correct placement of 'select distinct', but I haven't been able my script into the right format. Below is some sample tables that I constructed as well as the desired output - any help would be appreciated. thank you and stay healthy. 

 

Sample script:

 

data Have1;
input ID $ age1 $ ;
datalines ;
a 20
b 28
c 23
d 31
;

data Have2;
input ID $ Date1 mmddyy10. Type1 $ ;
format date1 mmddyy8.;
datalines;
A 08/01/2009 OP
A 08/09/2009 OP
A 08/27/2009 OP
A 08/31/2009 OP
B 09/15/2009 OP
B 09/16/2009 OP
C 10/02/2009 OP
D 10/15/2009 OP
D 10/24/2009 OP
D 10/31/2009 OP
;

data Have3 ;
input ID $ Date2 mmddyy10. type2 $;
format date2 mmddyy8.;
datalines ;
A 08/05/2009 DH
B 09/03/2009 DH
B 09/16/2009 DH
B 09/24/2009 DH
C 10/24/2009 DH
C 10/31/2009 DH
D 11/02/2009 DH
;

 

Want (sample below): 

A Table that has 16 total events (there were 10 type 1 events and 7 type 2 events, but two events occurred on same day so would be listed together). 

 

IDAgeDate1Type1Date2Type2
A208/1/09OP..
A20..8/5/09DH
A208/9/09OP..
A208/27/09OP..
A208/31/09OP..
B28..9/3/09.
B289/15/09OP..
B289/16/09OP9/16/09DH
B28..9/24/09DH
C2310/2/09OP..
C23..10/24/09DH
C23..10/31/09DH
D3110/15/09OP..
D3110/24/09OP..
D3110/31/09OP..
D31..11/2/09DH
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Your example data has some issues.

First the case of your ID variable is different in the first set, so it will not match the others unless you make the case match.

 

The second is since you want to match on date then don't create two different date variables. A date plus event tells you which occurred. So just use one date and then you can get two (or more) event types on the same days easily.

 

data Have1;
input ID $ age1 $ ;
datalines ;
A 20
B 28
C 23
D 31
;

data Have2;
input ID $ Date mmddyy10. Type1 $ ;
format date mmddyy8.;
datalines;
A 08/01/2009 OP
A 08/09/2009 OP
A 08/27/2009 OP
A 08/31/2009 OP
B 09/15/2009 OP
B 09/16/2009 OP
C 10/02/2009 OP
D 10/15/2009 OP
D 10/24/2009 OP
D 10/31/2009 OP
;

data Have3 ;
input ID $ Date mmddyy10. type2 $;
format date mmddyy8.;
datalines ;
A 08/05/2009 DH
B 09/03/2009 DH
B 09/16/2009 DH
B 09/24/2009 DH
C 10/24/2009 DH
C 10/31/2009 DH
D 11/02/2009 DH
;

proc sort data=have1;
    by id ;
run;
proc sort data=have2;
   by id date;
run;
proc sort data=have3;
	by id date;
run;
data temp;
   merge have2 have3;
	by id date;
run;

data want;
   merge have1 temp;
	by id;
run;

I used a merge because for the problem example , with two different matching criteria Id plus Id and Date, that made more sense. I'm sure someone can force proc sql to do this but I'm too lazy to write nested from clauses if I don't see any advantage.

 

You really should show the code you attempted. Some times you might be close and only need a set of parentheses or a different ON (needed in this case for sure) condition.

View solution in original post

3 REPLIES 3
ballardw
Super User

Your example data has some issues.

First the case of your ID variable is different in the first set, so it will not match the others unless you make the case match.

 

The second is since you want to match on date then don't create two different date variables. A date plus event tells you which occurred. So just use one date and then you can get two (or more) event types on the same days easily.

 

data Have1;
input ID $ age1 $ ;
datalines ;
A 20
B 28
C 23
D 31
;

data Have2;
input ID $ Date mmddyy10. Type1 $ ;
format date mmddyy8.;
datalines;
A 08/01/2009 OP
A 08/09/2009 OP
A 08/27/2009 OP
A 08/31/2009 OP
B 09/15/2009 OP
B 09/16/2009 OP
C 10/02/2009 OP
D 10/15/2009 OP
D 10/24/2009 OP
D 10/31/2009 OP
;

data Have3 ;
input ID $ Date mmddyy10. type2 $;
format date mmddyy8.;
datalines ;
A 08/05/2009 DH
B 09/03/2009 DH
B 09/16/2009 DH
B 09/24/2009 DH
C 10/24/2009 DH
C 10/31/2009 DH
D 11/02/2009 DH
;

proc sort data=have1;
    by id ;
run;
proc sort data=have2;
   by id date;
run;
proc sort data=have3;
	by id date;
run;
data temp;
   merge have2 have3;
	by id date;
run;

data want;
   merge have1 temp;
	by id;
run;

I used a merge because for the problem example , with two different matching criteria Id plus Id and Date, that made more sense. I'm sure someone can force proc sql to do this but I'm too lazy to write nested from clauses if I don't see any advantage.

 

You really should show the code you attempted. Some times you might be close and only need a set of parentheses or a different ON (needed in this case for sure) condition.

JerryV
SAS Employee

Is this what was intended?  Does somebody want to simplify this?  I still like @ballardw 's better.

proc sql noprint;
create table want1 as 
select coalesce(a.id, b.id) as id
	     , a.age1
	     , b.date1, b.type1 
	from have1 as a
	full join 
	     have2 as b
	on upcase(a.id) = upcase(b.id)
;
create table want2 as 
select coalesce(a.id, c.id) as id
     , a.age1
     , c.date2, c.type2
from have1 as a
full join 
     have3 as c
on upcase(a.id) = upcase(c.id)
; 
create table WANT as 
select coalesce(w1.id, w2.id) as id
     , coalesce(w1.age1, w2.age1) as age1
     , w1.date1, w1.type1
	 , w2.date2, w2.type2
from want1 as w1
full join 
     want2 as w2
on upcase(w1.id) = upcase(w2.id) 
   and w1.age1 = w2.age1
   and w1.date1 = w2.date2
; quit;
PGStats
Opal | Level 21

With SQL I would do:

 

proc sql;
create table want as
select 
	coalesce(a.id, b.id) as id,
	c.age1,
	a.date1,
	a.type1,
	b.date2,
	b.type2
from 
	have2 as a full join
	have3 as b on a.id=b.id and a.date1=b.date2 left join
	have1 as c on coalesce(a.id, b.id) = upcase(c.id);
quit;
PG

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1045 views
  • 3 likes
  • 4 in conversation