DATA Step, Macro, Functions and more

merging issue

Accepted Solution Solved
Reply
Regular Contributor
Posts: 234
Accepted Solution

merging issue

I am trying to merge these two data sets a and b.  I want  this to be like this (missing id1 replaced by nonmissing id2 and keep their corresponding values as one observation and rest matching observations)

 

want

1 0  1

2 0  1

3 0  1

4 0  1

5 0  1

data a;
  infile cards ;
  input id1 x ;
  cards;
  1 0
  2 0
  3 0
  . 0
  5 0
  ;
run;

data b;
  infile cards;
  input id2 y;
  cards;
  1 1
  2 1
  3 1
  4 1
  5 1
  ;
run;

proc sql _method;
  create table want as
  select coalesce(id1,id2) as id, a.x, b.y
    from a as a
	   left join
	     b as b
	on a.id1=b.id2;
run;

 

 

 


Accepted Solutions
Solution
‎01-07-2016 11:56 PM
Respected Advisor
Posts: 4,641

Re: merging issue

The SQL solution proposed above can be notoriously slow for large problems. This will be faster:

 

proc sql;
create table d as
select * 
from a inner join b on a.id1=b.id2
where a.id1 is not missing
union all
select *
from a inner join b on a.next_id1=b.next_id2
where a.id1 is missing;
quit;
PG

View solution in original post


All Replies
Super User
Posts: 17,749

Re: merging issue

How do you know the missing ID1 is matched up with ID2=4?

It looks like it's based solely on position so maybe a dataset merge?

 

data want2;
merge a b;
id=coalesce(id1, id2);
keep id x y;
run;
Regular Contributor
Posts: 234

Re: merging issue

[ Edited ]

@Reeza Thanks.  May be I did not put my question adequately.  I want to use another matching id,let's say next_id1 if first id (id1) is missing for some observations to match only those observations. It's like conditional match merge. This is because I have some missing values for id1 and those observations can be matched using second id (next_id1). The ideas is to match every observations from first data set to second data set.  I wonder if it is doable logic in one data step?

 

data a;
  input id1 next_id1 x;
cards;
1 2 0
2 3 0
. 1 0
; 

data a;
  input id2 next_id2 y;
cards;
1 2 1
2 3 1
3 1 1
; 
Super User
Posts: 17,749

Re: merging issue

I'm not sure your problem is clearly defined yet, but here's another stab. 

proc sql;
create table want as
select id1, id2, next_id1, next_id2, x, y, 
	coalesce(id1, id2) as want_id
from a 
join b
on a.id1=b.id2 OR a.next_id1=b.next_id2;
quit;
Regular Contributor
Posts: 234

Re: merging issue

So any observations not matched by a.id1=b.id2 will be matched by a.next_id1=b.next_id2, right?
Super User
Posts: 17,749

Re: merging issue

No, if either of those conditions are met then it will be matched. A sql update may be a better option but I'm not good at coding those.
Respected Advisor
Posts: 4,641

Re: merging issue

The SQL solution is straitforward

 

proc sql;
create table c as
select *
from a inner join b
on a.id1=b.id2 or a.id1 is missing and a.next_id1=b.next_id2;
quit;

The data step equivalent is left as an exercise Smiley Wink

PG
Regular Contributor
Posts: 234

Re: merging issue

Thanks @Reeza and @PGStats.  This is great.  I'll attempt data step counterpart.

Solution
‎01-07-2016 11:56 PM
Respected Advisor
Posts: 4,641

Re: merging issue

The SQL solution proposed above can be notoriously slow for large problems. This will be faster:

 

proc sql;
create table d as
select * 
from a inner join b on a.id1=b.id2
where a.id1 is not missing
union all
select *
from a inner join b on a.next_id1=b.next_id2
where a.id1 is missing;
quit;
PG
Respected Advisor
Posts: 4,641

Re: merging issue

Don't waste too much time on the data step counterpart. This is a very chalenging problem (for me anyway).

PG
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 302 views
  • 3 likes
  • 3 in conversation