I am trying to merge these two data sets a and b. I want this to be like this (missing id1 replaced by nonmissing id2 and keep their corresponding values as one observation and rest matching observations)
want
1 0 1
2 0 1
3 0 1
4 0 1
5 0 1
data a; infile cards ; input id1 x ; cards; 1 0 2 0 3 0 . 0 5 0 ; run; data b; infile cards; input id2 y; cards; 1 1 2 1 3 1 4 1 5 1 ; run; proc sql _method; create table want as select coalesce(id1,id2) as id, a.x, b.y from a as a left join b as b on a.id1=b.id2; run;
The SQL solution proposed above can be notoriously slow for large problems. This will be faster:
proc sql;
create table d as
select *
from a inner join b on a.id1=b.id2
where a.id1 is not missing
union all
select *
from a inner join b on a.next_id1=b.next_id2
where a.id1 is missing;
quit;
How do you know the missing ID1 is matched up with ID2=4?
It looks like it's based solely on position so maybe a dataset merge?
data want2;
merge a b;
id=coalesce(id1, id2);
keep id x y;
run;
@Reeza Thanks. May be I did not put my question adequately. I want to use another matching id,let's say next_id1 if first id (id1) is missing for some observations to match only those observations. It's like conditional match merge. This is because I have some missing values for id1 and those observations can be matched using second id (next_id1). The ideas is to match every observations from first data set to second data set. I wonder if it is doable logic in one data step?
data a; input id1 next_id1 x; cards; 1 2 0 2 3 0 . 1 0 ; data a; input id2 next_id2 y; cards; 1 2 1 2 3 1 3 1 1 ;
I'm not sure your problem is clearly defined yet, but here's another stab.
proc sql;
create table want as
select id1, id2, next_id1, next_id2, x, y,
coalesce(id1, id2) as want_id
from a
join b
on a.id1=b.id2 OR a.next_id1=b.next_id2;
quit;
The SQL solution is straitforward
proc sql;
create table c as
select *
from a inner join b
on a.id1=b.id2 or a.id1 is missing and a.next_id1=b.next_id2;
quit;
The data step equivalent is left as an exercise
The SQL solution proposed above can be notoriously slow for large problems. This will be faster:
proc sql;
create table d as
select *
from a inner join b on a.id1=b.id2
where a.id1 is not missing
union all
select *
from a inner join b on a.next_id1=b.next_id2
where a.id1 is missing;
quit;
Don't waste too much time on the data step counterpart. This is a very chalenging problem (for me anyway).
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.