Hey guys,
I have the following 2 tables old and update and want to have the output like want:
data work.old;
input (Date cusip fundid mod) ($);
datalines;
Mar2012 123 A 5$
Mar2012 124 A 5$
Apr2012 124 B 6$
;
data work.update;
input (Date cusip fundid mod) ($);
datalines;
Apr2012 124 B 7$
May2012 124 B 10$
;
data work.want;
input (Date cusip fundid mod) ($);
datalines;
Mar2012 123 A 5$
Mar2012 124 A 5$
Apr2012 124 B 7$
May2012 124 B 10$
;
That means: Table want should have all (date-cusip)-observations from work.update and
only those (date-cusip)-observations from work.old that are not in work.update.
Each date-cusip-Combination can exist more than once in both tables (with different fundids) but always have the same
value for mod.
It can be said, that work.old is a bigger table than work.update, so it would be perfect if someone could provide me a hash solution
(for better performance).
Thanks in advance 🙂
There's probably a hash solution for this, but here's my plain vanilla solution:
proc sort data=old;
by date cusip fundid;
run;
proc sort data=update;
by date cusip fundid;
run;
data want;
merge old update;
by date cusip fundid;
run;
It brings in all the data, but when there is a match uses the data from UPDATE to overwrite the data from OLD.
A simple merge by should do this perfectly. Why a hash solution?
There's probably a hash solution for this, but here's my plain vanilla solution:
proc sort data=old;
by date cusip fundid;
run;
proc sort data=update;
by date cusip fundid;
run;
data want;
merge old update;
by date cusip fundid;
run;
It brings in all the data, but when there is a match uses the data from UPDATE to overwrite the data from OLD.
thanks, your solution has the best performance 🙂
Would an UPDATE statement work? You need to sort the data ahead of time but it works well.
data want;
update old update;
by date cusip fundid;
run;
@mrzlatan91 wrote:
Hey guys,
I have the following 2 tables old and update and want to have the output like want:
data work.old; input (Date cusip fundid mod) ($); datalines; Mar2012 123 A 5$ Mar2012 124 A 5$ Apr2012 124 B 6$ ; data work.update; input (Date cusip fundid mod) ($); datalines; Apr2012 124 B 7$ May2012 124 B 10$ ; data work.want; input (Date cusip fundid mod) ($); datalines; Mar2012 123 A 5$ Mar2012 124 A 5$ Apr2012 124 B 7$ May2012 124 B 10$ ;
That means: Table want should have all (date-cusip)-observations from work.update and
only those (date-cusip)-observations from work.old that are not in work.update.
Each date-cusip-Combination can exist more than once in both tables (with different fundids) but always have the same
value for mod.
It can be said, that work.old is a bigger table than work.update, so it would be perfect if someone could provide me a hash solution
(for better performance).
Thanks in advance 🙂
yes, this worked as well. thank you 🙂
If your OLD dataset is large, and you do not need to keep the previous version, updating in place may be the fast way to do it:
data work.old;
modify old update;
by date cusip;
if _iorc_ then do;
output;
_error_=0;
end;
else replace;
run;
The _IORC_ is set when there is an observation in UPDATE that does not have a correspondent key in OLD.
If you need to improve the performance more, index your OLD table:
Proc sql;
create Unique index idx on old(date,cusip);
quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.