Solved: Re: merging issue

SAS_inquisitive · Posted 01-07-2016 06:11 PM

I am trying to merge these two data sets a and b. I want this to be like this (missing id1 replaced by nonmissing id2 and keep their corresponding values as one observation and rest matching observations)

want

1 0 1

2 0 1

3 0 1

4 0 1

5 0 1

data a;
  infile cards ;
  input id1 x ;
  cards;
  1 0
  2 0
  3 0
  . 0
  5 0
  ;
run;

data b;
  infile cards;
  input id2 y;
  cards;
  1 1
  2 1
  3 1
  4 1
  5 1
  ;
run;

proc sql _method;
  create table want as
  select coalesce(id1,id2) as id, a.x, b.y
    from a as a
	   left join
	     b as b
	on a.id1=b.id2;
run;

PGStats · Posted 01-07-2016 11:51 PM

The SQL solution proposed above can be notoriously slow for large problems. This will be faster:

proc sql;
create table d as
select * 
from a inner join b on a.id1=b.id2
where a.id1 is not missing
union all
select *
from a inner join b on a.next_id1=b.next_id2
where a.id1 is missing;
quit;

PG

View solution in original post

Reeza · Posted 01-07-2016 06:36 PM

How do you know the missing ID1 is matched up with ID2=4?

It looks like it's based solely on position so maybe a dataset merge?

data want2;
merge a b;
id=coalesce(id1, id2);
keep id x y;
run;

SAS_inquisitive · Posted 01-07-2016 08:05 PM

@Reeza Thanks. May be I did not put my question adequately. I want to use another matching id,let's say next_id1 if first id (id1) is missing for some observations to match only those observations. It's like conditional match merge. This is because I have some missing values for id1 and those observations can be matched using second id (next_id1). The ideas is to match every observations from first data set to second data set. I wonder if it is doable logic in one data step?

data a;
  input id1 next_id1 x;
cards;
1 2 0
2 3 0
. 1 0
; 

data a;
  input id2 next_id2 y;
cards;
1 2 1
2 3 1
3 1 1
;

Reeza · Posted 01-07-2016 10:38 PM

I'm not sure your problem is clearly defined yet, but here's another stab.

proc sql;
create table want as
select id1, id2, next_id1, next_id2, x, y, 
	coalesce(id1, id2) as want_id
from a 
join b
on a.id1=b.id2 OR a.next_id1=b.next_id2;
quit;

SAS_inquisitive · Posted 01-07-2016 11:18 PM

So any observations not matched by a.id1=b.id2 will be matched by a.next_id1=b.next_id2, right?

Reeza · Posted 01-07-2016 11:25 PM

No, if either of those conditions are met then it will be matched. A sql update may be a better option but I'm not good at coding those.

PGStats · Posted 01-07-2016 11:38 PM

The SQL solution is straitforward

proc sql;
create table c as
select *
from a inner join b
on a.id1=b.id2 or a.id1 is missing and a.next_id1=b.next_id2;
quit;

The data step equivalent is left as an exercise

PG

SAS_inquisitive · Posted 01-07-2016 11:47 PM

Thanks @Reeza and @PGStats. This is great. I'll attempt data step counterpart.

PGStats · Posted 01-07-2016 11:51 PM

The SQL solution proposed above can be notoriously slow for large problems. This will be faster:

proc sql;
create table d as
select * 
from a inner join b on a.id1=b.id2
where a.id1 is not missing
union all
select *
from a inner join b on a.next_id1=b.next_id2
where a.id1 is missing;
quit;

PG

PGStats · Posted 01-07-2016 11:54 PM

Don't waste too much time on the data step counterpart. This is a very chalenging problem (for me anyway).

PG

Classroom Training Available!