Hello . I need help with combining two datasets with all time points for each ID even though they are not in dataset #2:
dataset #1: 'event' table
event |
Month 1 |
Month 2 |
Month 3 |
dataset#2 'id_all'
ID | event | value | value2 |
5656 | Month 1 | 13 | 324 |
6565 | Month 2 | 31 | 465 |
4663 | Month 1 | 14 | 697 |
4242 | |||
1234 | Month 3 | 14 | 656 |
5421 | Month 1 | 12 | 647 |
want:
ID | event | value | value2 |
5656 | Month 1 | 13 | 324 |
5656 | Month 2 | ||
5656 | Month 3 | ||
6565 | Month 1 | ||
6565 | Month 2 | 31 | 465 |
6565 | Month 3 | ||
4663 | Month 1 | 14 | 697 |
4663 | Month 2 | ||
4663 | Month 3 | ||
4242 | |||
4242 | Month 1 | ||
4242 | Month 2 | ||
4242 | Month 3 | ||
5421 | Month 1 | 12 | 647 |
5421 | Month 2 | ||
5421 | Month 3 | ||
1234 | Month 1 | ||
1234 | Month 2 | ||
1234 | Month 3 | 14 | 656 |
here's my attempt so far:
proc sql;
select a.*, b.event
from id_all a right join event b on a.event = b.event;
quit;
doesn't work:(
In your 'want' table, do you need that null value for event for id 4242?
If every event is represented at least once in dataset2, you don't even need dataset1:
proc sql;
select
a.ID,
b.event,
c.value,
c.value2
from
(select distinct ID from dataset2) as a cross join
(select distinct event from dataset2) as b left join
dataset2 as c on a.ID=c.ID and b.event=c.event;
quit;
Edit: Or
proc sql;
select
a.ID,
b.event,
c.value,
c.value2
from
(select distinct ID from dataset2) as a cross join
(select distinct event from dataset2 where event is not missing) as b left join
dataset2 as c on a.ID=c.ID and b.event=c.event
union select * from dataset2 where event is missing;
quit;
to accomodate missing events.
When I ran @PGStats code (pardon me if I mistyped something), I encountered the same issue that I had before I asked you about the null value for event. It gets replicated across each ID in the output.
id event value value2 1234 . . 1234 Month 1 . . 1234 Month 2 . . 1234 Month 3 14 656 4242 . . 4242 Month 1 . . 4242 Month 2 . . 4242 Month 3 . . 4663 . . 4663 Month 1 14 697 4663 Month 2 . . 4663 Month 3 . . 5421 . . 5421 Month 1 12 647 5421 Month 2 . . 5421 Month 3 . . 5656 . . 5656 Month 1 13 324 5656 Month 2 . . 5656 Month 3 . . 6565 . . 6565 Month 1 . . 6565 Month 2 31 465 6565 Month 3 . .
It's a bit odd that you have to retain that null event, but if that is what you need, here is one solution. If you need more info about the SPARSE option in PROC FREQ, look here.
I'm not the best with PROC SQL, so @PGStats (thanks for showing me how to do something like this with PROC SQL) may have a better solution after accounting for that weird requirement with your data.
data have;
infile datalines delimiter = "," missover;
input id $ event :$ value value2;
datalines;
5656,Month 1,13,324
6565,Month 2,31,465
4663,Month 1,14,697
4242,,.,.
1234,Month 3,14,656
5421,Month 1,12,647
;
run;
proc freq data = have;
tables event*id / out = have_2 (where = (event ^= "" or (event = "" and count = 1)))
sparse;
run;
proc sql;
select
a.id,
a.event,
b.value,
b.value2
from
have_2 as a
left join
have as b
on a.id = b.id and
a.event = b.event;
quit;
id event value value2 1234 Month 1 . . 1234 Month 2 . . 1234 Month 3 14 656 4242 . . 4242 Month 1 . . 4242 Month 2 . . 4242 Month 3 . . 4663 Month 1 14 697 4663 Month 2 . . 4663 Month 3 . . 5421 Month 1 12 647 5421 Month 2 . . 5421 Month 3 . . 5656 Month 1 13 324 5656 Month 2 . . 5656 Month 3 . . 6565 Month 1 . . 6565 Month 2 31 465 6565 Month 3 . .
You can get the final table by following proc sql few steps.
1. Get the cartesian Product of just Id and event.
2. Merge the step1 result with the original id_all table and get the common rows.
3. Get the subset from step1 which is excluded from step2 and add it back
data event;
input event $10.;
datalines;
Month1
Month2
Month3
run;
data id_all;
infile datalines missover;
input ID event :$10. value value2;
datalines;
5656 Month1 13 324
6565 Month2 31 465
4663 Month1 14 697
4242
1234 Month3 14 656
5421 Month1 12 647
run;
proc sql;
create table subset
as
select a.id, b.event
from id_all a,event b;
quit;
proc sql;
create table match
as
select a.id, b.event, a.value, a.value2
from id_all a
left join subset b
on a.id=b.id
and a.event=b.event
;
quit;
proc sql;
create table all
as
select * from match
union
(select id, event, . as value, . as value2 from subset
except
select id, event, . as value, . as value2 from match);
quit;
proc sort data=all; by id, event; run;
proc print data=all; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.