I think something is missing in your code: you should either sort LongFile_t or build an index while creating it. I replicated your program in a somewhat simplified version. The monthly files are created by the following step:

data one; * repeat with data sets two - twelve;
   length ID $1;
   do n=1 to 2000000;
      x = 100*uniform(0);
      a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
      ID = substr(a,int(uniform(0)*26+1),1);
      output;
   end;
run;

Then I glue them together, either by sorting (note that I use a DATA step VIEW to glue the files together virtually; that saves writing out the long file):

data long/view=long;
   set one two three four five six seven eight nine ten eleven twelve;
run;
proc sort data=long out=long2;
   by ID;
run;

or, the way you did it, but creating an index on ID on the fly:

data long(index=(ID));
   set one two three four five six seven eight nine ten eleven twelve;
run;

Via both paths you can do the final merge; a rough sketch of both variants is at the bottom of this post. I selected 4 IDs to get from the big file as my target population (selecting some 3.7 million observations). These were my running times for the various steps:

Create the long file with the index: real time 5.80 seconds, CPU time 3.57 seconds
Merge based on the indexed file: real time 10.04 seconds, CPU time 8.87 seconds
Create the long file via the DATA step VIEW: real time 0.01 seconds, CPU time 0.00 seconds
Sort the VIEW: real time 5.95 seconds, CPU time 2.15 seconds
Merge based on the sorted long file: real time 1.33 seconds, CPU time 0.34 seconds

The alternative approach would be to sort the separate PCS files, filter your group at that level, and glue the resulting files together (also sketched at the bottom). Sorting one file of 2,000,000 observations took 0.33 seconds real time and 0.11 seconds CPU; creating the selection took 0.13 seconds real time and 0.10 seconds CPU. Multiply that by 12, add a step to glue everything together, and you are possibly still better off.

I hope this helps.
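For reference, here is roughly what my final merge step looked like. This is only a sketch: the targets data set and the four ID values in it are made up for illustration, and the indexed variant assumes the data long(index=(ID)) version of the long file.

/* Hypothetical target population: four made-up ID values, entered in sorted order */
data targets;
   input ID $1.;
   datalines;
A
F
K
P
;
run;

/* Sorted path: match-merge the sorted long file with the targets */
data want_sorted;
   merge long2(in=inlong) targets(in=intarget);
   by ID;
   if inlong and intarget;
run;

/* Indexed path: a WHERE clause on the index variable can let SAS */
/* use the index on the long file instead of doing a full scan    */
data want_indexed;
   set long;
   where ID in ('A','F','K','P');
run;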
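And a sketch of that alternative route, again with made-up ID values and hypothetical names sel1 - sel12; a WHERE= data set option on the way into PROC SORT filters each monthly file before it is sorted.

/* Filter and sort one monthly file; repeat for two - twelve, creating sel2 - sel12 */
proc sort data=one(where=(ID in ('A','F','K','P'))) out=sel1;
   by ID;
run;

/* Glue the twelve small files together; SET with BY interleaves */
/* them, so the result stays sorted by ID                        */
data selection;
   set sel1 sel2 sel3 sel4 sel5 sel6 sel7 sel8 sel9 sel10 sel11 sel12;
   by ID;
run;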