I have 2 datasets
Dataset1
----
Datestamp Amt reference userid
2011-01-11.12:32:00:655 55000.36 09XYZABCD0885 tom
2011-01-11.12:32:44:200 55000.36 09XYZABCD0885 joe
2011-01-11.12:32:20:320 55000.36 09XYZABCD0885 dav
2011-01-12.10:28:33:228 200.00 09XYZABCD0886 ken
Dataset2
-----
Datestamp Amt reference
2011-01-11.12:32:00:655 55000.36 09XYZABCD0885
2011-01-11.12:32:44:001 55000.36 09XYZABCD0885
2011-01-11.12:32:21:001 55000.36 09XYZABCD0885
2011-01-13.10:28:33:228 201.00 09XYZABCD0886
I have to update dataset 2 with the variable userid. The key to map both datasets are datestamp,amt and reference. The tricky part is, not necessarily there would be a exact match for milli seconds and seconds. However, I have to first find if every thing matches if found update that record of dataset2 with user id. else check if there is atleast a match till seconds if found update that record of dataset2 with user id. else check if there is atleast a match till minutes if found update that record of dataset2 with user id. If a record match is not found then write it to error dataset.
one might question why can't we just compare up to minutes level since we are updating even if the minute matches but the requirement is that they want to be absolutly certain atleast for most of the records.
Can any one let me know a efficient way of acheiving this. dataset1 contains atleast 2 million records and dataset 2 contains atleast 5 million records.
... View more