I am trying to merge these two data sets a and b. I want this to be like this (missing id1 replaced by nonmissing id2 and keep their corresponding values as one observation and rest matching observations)
want
1 0 1
2 0 1
3 0 1
4 0 1
5 0 1
data a; infile cards ; input id1 x ; cards; 1 0 2 0 3 0 . 0 5 0 ; run; data b; infile cards; input id2 y; cards; 1 1 2 1 3 1 4 1 5 1 ; run; proc sql _method; create table want as select coalesce(id1,id2) as id, a.x, b.y from a as a left join b as b on a.id1=b.id2; run;
The SQL solution proposed above can be notoriously slow for large problems. This will be faster:
proc sql;
create table d as
select *
from a inner join b on a.id1=b.id2
where a.id1 is not missing
union all
select *
from a inner join b on a.next_id1=b.next_id2
where a.id1 is missing;
quit;
How do you know the missing ID1 is matched up with ID2=4?
It looks like it's based solely on position so maybe a dataset merge?
data want2;
merge a b;
id=coalesce(id1, id2);
keep id x y;
run;
@Reeza Thanks. May be I did not put my question adequately. I want to use another matching id,let's say next_id1 if first id (id1) is missing for some observations to match only those observations. It's like conditional match merge. This is because I have some missing values for id1 and those observations can be matched using second id (next_id1). The ideas is to match every observations from first data set to second data set. I wonder if it is doable logic in one data step?
data a; input id1 next_id1 x; cards; 1 2 0 2 3 0 . 1 0 ; data a; input id2 next_id2 y; cards; 1 2 1 2 3 1 3 1 1 ;
I'm not sure your problem is clearly defined yet, but here's another stab.
proc sql;
create table want as
select id1, id2, next_id1, next_id2, x, y,
coalesce(id1, id2) as want_id
from a
join b
on a.id1=b.id2 OR a.next_id1=b.next_id2;
quit;
The SQL solution is straitforward
proc sql;
create table c as
select *
from a inner join b
on a.id1=b.id2 or a.id1 is missing and a.next_id1=b.next_id2;
quit;
The data step equivalent is left as an exercise
The SQL solution proposed above can be notoriously slow for large problems. This will be faster:
proc sql;
create table d as
select *
from a inner join b on a.id1=b.id2
where a.id1 is not missing
union all
select *
from a inner join b on a.next_id1=b.next_id2
where a.id1 is missing;
quit;
Don't waste too much time on the data step counterpart. This is a very chalenging problem (for me anyway).
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.