BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SAS_inquisitive
Lapis Lazuli | Level 10

I am trying to merge these two data sets a and b.  I want  this to be like this (missing id1 replaced by nonmissing id2 and keep their corresponding values as one observation and rest matching observations)

 

want

1 0  1

2 0  1

3 0  1

4 0  1

5 0  1

data a;
  infile cards ;
  input id1 x ;
  cards;
  1 0
  2 0
  3 0
  . 0
  5 0
  ;
run;

data b;
  infile cards;
  input id2 y;
  cards;
  1 1
  2 1
  3 1
  4 1
  5 1
  ;
run;

proc sql _method;
  create table want as
  select coalesce(id1,id2) as id, a.x, b.y
    from a as a
	   left join
	     b as b
	on a.id1=b.id2;
run;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

The SQL solution proposed above can be notoriously slow for large problems. This will be faster:

 

proc sql;
create table d as
select * 
from a inner join b on a.id1=b.id2
where a.id1 is not missing
union all
select *
from a inner join b on a.next_id1=b.next_id2
where a.id1 is missing;
quit;
PG

View solution in original post

9 REPLIES 9
Reeza
Super User

How do you know the missing ID1 is matched up with ID2=4?

It looks like it's based solely on position so maybe a dataset merge?

 

data want2;
merge a b;
id=coalesce(id1, id2);
keep id x y;
run;
SAS_inquisitive
Lapis Lazuli | Level 10

@Reeza Thanks.  May be I did not put my question adequately.  I want to use another matching id,let's say next_id1 if first id (id1) is missing for some observations to match only those observations. It's like conditional match merge. This is because I have some missing values for id1 and those observations can be matched using second id (next_id1). The ideas is to match every observations from first data set to second data set.  I wonder if it is doable logic in one data step?

 

data a;
  input id1 next_id1 x;
cards;
1 2 0
2 3 0
. 1 0
; 

data a;
  input id2 next_id2 y;
cards;
1 2 1
2 3 1
3 1 1
; 
Reeza
Super User

I'm not sure your problem is clearly defined yet, but here's another stab. 

proc sql;
create table want as
select id1, id2, next_id1, next_id2, x, y, 
	coalesce(id1, id2) as want_id
from a 
join b
on a.id1=b.id2 OR a.next_id1=b.next_id2;
quit;
SAS_inquisitive
Lapis Lazuli | Level 10
So any observations not matched by a.id1=b.id2 will be matched by a.next_id1=b.next_id2, right?
Reeza
Super User
No, if either of those conditions are met then it will be matched. A sql update may be a better option but I'm not good at coding those.
PGStats
Opal | Level 21

The SQL solution is straitforward

 

proc sql;
create table c as
select *
from a inner join b
on a.id1=b.id2 or a.id1 is missing and a.next_id1=b.next_id2;
quit;

The data step equivalent is left as an exercise Smiley Wink

PG
SAS_inquisitive
Lapis Lazuli | Level 10

Thanks @Reeza and @PGStats.  This is great.  I'll attempt data step counterpart.

PGStats
Opal | Level 21

The SQL solution proposed above can be notoriously slow for large problems. This will be faster:

 

proc sql;
create table d as
select * 
from a inner join b on a.id1=b.id2
where a.id1 is not missing
union all
select *
from a inner join b on a.next_id1=b.next_id2
where a.id1 is missing;
quit;
PG
PGStats
Opal | Level 21

Don't waste too much time on the data step counterpart. This is a very chalenging problem (for me anyway).

PG

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2345 views
  • 3 likes
  • 3 in conversation