Hey guys,
I have the following 2 tables old and update and want to have the output like want:
data work.old;
input (Date cusip fundid mod) ($);
datalines;
Mar2012 123 A 5$
Mar2012 124 A 5$
Apr2012 124 B 6$
;
data work.update;
input (Date cusip fundid mod) ($);
datalines;
Apr2012 124 B 7$
May2012 124 B 10$
;
data work.want;
input (Date cusip fundid mod) ($);
datalines;
Mar2012 123 A 5$
Mar2012 124 A 5$
Apr2012 124 B 7$
May2012 124 B 10$
;
That means: Table want should have all (date-cusip)-observations from work.update and
only those (date-cusip)-observations from work.old that are not in work.update.
Each date-cusip-Combination can exist more than once in both tables (with different fundids) but always have the same
value for mod.
It can be said, that work.old is a bigger table than work.update, so it would be perfect if someone could provide me a hash solution
(for better performance).
Thanks in advance 🙂
There's probably a hash solution for this, but here's my plain vanilla solution:
proc sort data=old;
by date cusip fundid;
run;
proc sort data=update;
by date cusip fundid;
run;
data want;
merge old update;
by date cusip fundid;
run;
It brings in all the data, but when there is a match uses the data from UPDATE to overwrite the data from OLD.
A simple merge by should do this perfectly. Why a hash solution?
There's probably a hash solution for this, but here's my plain vanilla solution:
proc sort data=old;
by date cusip fundid;
run;
proc sort data=update;
by date cusip fundid;
run;
data want;
merge old update;
by date cusip fundid;
run;
It brings in all the data, but when there is a match uses the data from UPDATE to overwrite the data from OLD.
thanks, your solution has the best performance 🙂
Would an UPDATE statement work? You need to sort the data ahead of time but it works well.
data want;
update old update;
by date cusip fundid;
run;
@mrzlatan91 wrote:
Hey guys,
I have the following 2 tables old and update and want to have the output like want:
data work.old; input (Date cusip fundid mod) ($); datalines; Mar2012 123 A 5$ Mar2012 124 A 5$ Apr2012 124 B 6$ ; data work.update; input (Date cusip fundid mod) ($); datalines; Apr2012 124 B 7$ May2012 124 B 10$ ; data work.want; input (Date cusip fundid mod) ($); datalines; Mar2012 123 A 5$ Mar2012 124 A 5$ Apr2012 124 B 7$ May2012 124 B 10$ ;
That means: Table want should have all (date-cusip)-observations from work.update and
only those (date-cusip)-observations from work.old that are not in work.update.
Each date-cusip-Combination can exist more than once in both tables (with different fundids) but always have the same
value for mod.
It can be said, that work.old is a bigger table than work.update, so it would be perfect if someone could provide me a hash solution
(for better performance).
Thanks in advance 🙂
yes, this worked as well. thank you 🙂
If your OLD dataset is large, and you do not need to keep the previous version, updating in place may be the fast way to do it:
data work.old;
modify old update;
by date cusip;
if _iorc_ then do;
output;
_error_=0;
end;
else replace;
run;
The _IORC_ is set when there is an observation in UPDATE that does not have a correspondent key in OLD.
If you need to improve the performance more, index your OLD table:
Proc sql;
create Unique index idx on old(date,cusip);
quit;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.