Solved: Re: Combine 2 Tables

mrzlatan91 · Posted 10-02-2018 08:17 PM

Hey guys,

I have the following two tables, old and update, and want the output to look like want:

data work.old;
input (Date cusip fundid mod) ($);
datalines;
Mar2012 123  A  5$
Mar2012 124  A  5$
Apr2012 124  B  6$
;

data work.update;
input (Date cusip fundid mod) ($);
datalines;
Apr2012 124  B  7$
May2012 124  B  10$
;

data work.want;
input (Date cusip fundid mod) ($);
datalines;
Mar2012 123  A  5$
Mar2012 124  A  5$
Apr2012 124  B  7$
May2012 124  B  10$
;

That means: table want should have all (date-cusip) observations from work.update and only those (date-cusip) observations from work.old that are not in work.update.

Each date-cusip combination can exist more than once in both tables (with different fundids) but always has the same value for mod.

work.old is a bigger table than work.update, so it would be perfect if someone could provide a hash solution (for better performance).

Thanks in advance 🙂

ChrisNZ · Posted 10-02-2018 09:23 PM

A simple MERGE with a BY statement should do this perfectly. Why a hash solution?

Astounding · Posted 10-02-2018 09:28 PM

There's probably a hash solution for this, but here's my plain vanilla solution:

proc sort data=old;
  by date cusip fundid;
run;

proc sort data=update;
  by date cusip fundid;
run;

data want;
  merge old update;
  by date cusip fundid;
run;

It brings in all the data, but when there is a match on the BY variables, it uses the data from UPDATE to overwrite the data from OLD.
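
For anyone who still wants the hash approach the question asks for, here is a minimal sketch, not a tuned implementation. It assumes the date and cusip variables have the same type and length in both tables (as in the sample data) and that work.update fits in memory. The hash object name u is arbitrary. Unlike the merge above, it matches on date and cusip only, as the question describes, and needs no sorting.

```sas
data want;
  /* Load the date-cusip keys of the smaller table into memory once. */
  if _n_ = 1 then do;
    declare hash u(dataset: 'work.update');
    u.defineKey('date', 'cusip');
    u.defineDone();
  end;
  set work.old;
  /* check() returns 0 when the key is found, so this keeps only */
  /* OLD rows whose date-cusip key is absent from UPDATE.        */
  if u.check() ne 0;
run;

/* Add every row of UPDATE to the surviving OLD rows. */
proc append base=want data=work.update;
run;
```

The result is not sorted, so add a PROC SORT afterwards if the row order matters.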

mrzlatan91 · Posted 10-03-2018 08:26 AM

thanks, your solution has the best performance 🙂

Reeza · Posted 10-02-2018 09:33 PM

Would an UPDATE statement work? You need to sort the data ahead of time but it works well.

data want;
update old update;
by date cusip fundid;
run;

mrzlatan91 · Posted 10-03-2018 08:30 AM

yes, this worked as well. thank you 🙂

novinosrin · Posted 10-02-2018 09:57 PM

Looks like a plain append/interleave to me
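
One possible reading of this suggestion, as a sketch only: interleave the two tables with SET and BY, then keep the last row of each BY group. The LAST. filter is an addition of mine. A plain interleave on its own would keep both the old and the new row for Apr2012 124 B. The sketch assumes both tables are already sorted by date, cusip and fundid.

```sas
data want;
  /* Interleave: within each BY group, rows from OLD come before rows from UPDATE. */
  set old update;
  by date cusip fundid;
  /* Keep the last row of each group, which is the UPDATE row when one exists. */
  if last.fundid;
run;
```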

s_lassen · Posted 10-03-2018 03:54 AM

If your OLD dataset is large, and you do not need to keep the previous version, updating in place may be the fastest way to do it:

data work.old;
  modify old update;
  by date cusip;
  if _iorc_ then do;
    output;
    _error_=0;
  end;
  else replace;
run;

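_IORC_ is set to a nonzero value when an observation in UPDATE has no corresponding key in OLD. In that case the observation is appended with OUTPUT, and _ERROR_ is reset so the log is not flooded with error notes. Otherwise the matching observation in OLD is overwritten with REPLACE.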
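
A slightly more defensive variant of the step above, as a sketch: it tests _IORC_ against the mnemonics returned by the %SYSRC autocall macro instead of treating every nonzero value as "no match". I am assuming _DSENMR and _DSEMTR are the two "not found in master" codes that apply to MODIFY with BY; check the MODIFY documentation for your SAS release.

```sas
data work.old;
  modify work.old work.update;
  by date cusip;
  select (_iorc_);
    /* Match found in OLD: overwrite it in place. */
    when (%sysrc(_sok)) replace;
    /* No match in OLD: append the UPDATE row and clear the error flag. */
    when (%sysrc(_dsenmr), %sysrc(_dsemtr)) do;
      output;
      _error_ = 0;
    end;
    /* Anything else is unexpected: report it and stop the step. */
    otherwise do;
      put 'ERROR: unexpected _IORC_ value ' _iorc_=;
      stop;
    end;
  end;
run;
```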

If you need to improve the performance more, index your OLD table:

Proc sql;
  create Unique index idx on old(date,cusip);
quit;
