BookmarkSubscribeRSS Feed
pacman94
Calcite | Level 5

Hello . I need help with combining two datasets with all time points for each ID even though they are not in dataset #2:

 

dataset #1: 'event' table

event
Month 1
Month 2
Month 3

dataset#2 'id_all'

IDevent valuevalue2
5656Month 113324
6565Month 231465
4663Month 114697
4242   
1234Month 314656
5421Month 112647

 

want:

IDevent valuevalue2
5656Month 113324
5656Month 2  
5656Month 3  
6565Month 1  
6565Month 231465
6565Month 3  
4663Month 114697
4663Month 2  
4663Month 3  
4242   
4242Month 1  
4242Month 2  
4242Month 3  
5421Month 112647
5421Month 2  
5421Month 3  
1234Month 1  
1234Month 2  
1234Month 314656

 

here's my attempt so far:

proc sql;

select  a.*, b.event

from id_all a right join event b on a.event = b.event;

quit;

 

doesn't work:(

6 REPLIES 6
Reeza
Super User
Change your code from a right join, which only keeps records in the right table, to a full join, where you keep records from both tables.
maguiremq
SAS Super FREQ

In your 'want' table, do you need that null value for event for id 4242?

pacman94
Calcite | Level 5
Yes
PGStats
Opal | Level 21

If every event is represented at least once in dataset2, you don't even need dataset1:

 

proc sql;
select  	
	a.ID, 
	b.event,
	c.value,
	c.value2
from	
	(select distinct ID from dataset2) as a cross join
	(select distinct event from dataset2) as b left join
	dataset2 as c on a.ID=c.ID and b.event=c.event;
quit;

Edit:  Or 

proc sql;
select      
    a.ID, 
    b.event,
    c.value,
    c.value2
from    
    (select distinct ID from dataset2) as a cross join
    (select distinct event from dataset2 where event is not missing) as b left join
    dataset2 as c on a.ID=c.ID and b.event=c.event
union select * from dataset2 where event is missing;
quit;

to accomodate missing events.

PG
maguiremq
SAS Super FREQ

When I ran @PGStats code (pardon me if I mistyped something), I encountered the same issue that I had before I asked you about the null value for event. It gets replicated across each ID in the output.

 

id event value value2 
1234   . . 
1234 Month 1 . . 
1234 Month 2 . . 
1234 Month 3 14 656 
4242   . . 
4242 Month 1 . . 
4242 Month 2 . . 
4242 Month 3 . . 
4663   . . 
4663 Month 1 14 697 
4663 Month 2 . . 
4663 Month 3 . . 
5421   . . 
5421 Month 1 12 647 
5421 Month 2 . . 
5421 Month 3 . . 
5656   . . 
5656 Month 1 13 324 
5656 Month 2 . . 
5656 Month 3 . . 
6565   . . 
6565 Month 1 . . 
6565 Month 2 31 465 
6565 Month 3 . . 

 

It's a bit odd that you have to retain that null event, but if that is what you need, here is one solution. If you need more info about the SPARSE option in PROC FREQ, look here.

 

I'm not the best with PROC SQL, so @PGStats (thanks for showing me how to do something like this with PROC SQL) may have a better solution after accounting for that weird requirement with your data.

 

data have;
infile datalines delimiter = "," missover;
input id $ event :$ value  value2;
datalines;
5656,Month 1,13,324
6565,Month 2,31,465
4663,Month 1,14,697
4242,,.,.
1234,Month 3,14,656
5421,Month 1,12,647
;
run;

proc freq data = have;
	tables event*id / 	out = have_2 (where = (event ^= "" or (event = "" and count = 1)))	  
						sparse;
run;

proc sql;
	select
				a.id,
				a.event,
				b.value,
				b.value2
	from
				have_2 as a
					left join
				have  as b
						on	a.id 	= b.id 		and
							a.event = b.event;
quit;
id event value value2 
1234 Month 1 . . 
1234 Month 2 . . 
1234 Month 3 14 656 
4242   . . 
4242 Month 1 . . 
4242 Month 2 . . 
4242 Month 3 . . 
4663 Month 1 14 697 
4663 Month 2 . . 
4663 Month 3 . . 
5421 Month 1 12 647 
5421 Month 2 . . 
5421 Month 3 . . 
5656 Month 1 13 324 
5656 Month 2 . . 
5656 Month 3 . . 
6565 Month 1 . . 
6565 Month 2 31 465 
6565 Month 3 . . 
Rydhm
Obsidian | Level 7

You can get the final table by following proc sql few steps. 

1. Get the cartesian Product of just Id and event.

2. Merge the step1 result with the original id_all table and get the common rows.

3. Get the subset from step1 which is excluded from step2 and add it back

data event;
input event $10.;
datalines;
Month1
Month2
Month3
run;

data id_all;
infile datalines missover;
input ID event :$10. value value2;
datalines;
5656 Month1 13 324
6565 Month2 31 465
4663 Month1 14 697
4242
1234 Month3 14 656
5421 Month1 12 647
run;

proc sql;
create table subset
as
select  a.id, b.event
from id_all a,event b;
quit;

proc sql;
create table match
as
select a.id, b.event, a.value, a.value2
from id_all a
left join subset b
on a.id=b.id
and a.event=b.event
;
quit;

proc sql;
create table all
as
select * from match
union
(select id, event, . as value, . as value2 from subset
 except
 select id, event, . as value, . as value2 from match);
quit;

proc sort data=all; by id, event; run;

proc print data=all; run; 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 563 views
  • 0 likes
  • 5 in conversation