I'm trying to import a large dataset (on the order of 10 TB) from a remote server. I only need a subset of the information: the last valid transaction for each publicly traded firm in each 15-minute interval. The code I have works, slowly, for the early years, which have far fewer transactions than recent ones (1996 takes a few hours, 1998 takes a few days). But when I get to the current year, the program effectively never finishes: it runs for more than a few weeks, for sure.
Does anyone have any suggestions as to the best way to approach this? Should I split the data into time intervals and process each one separately (the time column is indexed)? Ideally I'd like the total run time for 1996-2009 to be under a week.
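To make the target concrete, here is a minimal Python sketch of the reduction I'm after. It is not my actual code: the row layout `(firm, timestamp, price, valid)`, the `valid` flag, and the function names are assumptions made up for illustration. It streams over the rows once and keeps only the latest valid transaction per firm and 15-minute bucket.

```python
from datetime import datetime, timedelta


def bucket_start(ts):
    """Round a timestamp down to the start of its 15-minute interval."""
    return ts - timedelta(
        minutes=ts.minute % 15,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


def last_valid_per_interval(rows):
    """Return {(firm, interval_start): (timestamp, price)} for the latest valid row.

    `rows` is any iterable of (firm, timestamp, price, valid) tuples
    (a hypothetical layout). Rows do not need to be sorted.
    """
    latest = {}
    for firm, ts, price, valid in rows:
        if not valid:
            continue  # skip transactions flagged as invalid
        key = (firm, bucket_start(ts))
        current = latest.get(key)
        # Keep the row with the greatest timestamp seen so far in this bucket.
        if current is None or ts >= current[0]:
            latest[key] = (ts, price)
    return latest


# Made-up sample data, only to show the shape of the input and output.
sample = [
    ("AAA", datetime(1996, 1, 2, 9, 31, 0), 10.0, True),
    ("AAA", datetime(1996, 1, 2, 9, 44, 30), 10.5, True),
    ("AAA", datetime(1996, 1, 2, 9, 44, 50), 99.0, False),  # invalid, ignored
    ("AAA", datetime(1996, 1, 2, 9, 45, 0), 11.0, True),
    ("BBB", datetime(1996, 1, 2, 9, 40, 0), 20.0, True),
]

result = last_valid_per_interval(sample)
for (firm, start), (ts, price) in sorted(result.items()):
    print(firm, start, ts, price)
```

If the data were pulled one time slice at a time (say, one day per query, which is an assumption about what the index allows), the dictionary would only ever hold one slice's worth of firm/interval pairs, and each slice could be written out before the next is fetched.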