Hello all -
I have a dataset with ~1,000,000 rows. As an example, here's what the data looks like (data is fake, but will describe the issue I'm facing).
data TABLE_HAVE;
infile datalines truncover dlm='|' dsd;
format
driver $10.
lap 1.
time 10.2
;
input
driver: $10.
lap: 1.
time: 10.2
;
datalines;
Driver A|1|44.93
Driver A|2|46.23
Driver A|3|45.06
Driver A|4|43.46
Driver B|1|44.05
Driver B|2|44.17
Driver B|3|46.42
Driver B|4|43.92
Driver C|1|46.19
Driver C|2|46.94
Driver C|3|46.31
Driver C|4|46.76
Driver D|1|42.17
Driver D|2|45.89
Driver D|3|46.37
Driver D|4|42.87
run;
data TABLE_WANT;
infile datalines truncover dlm='|' dsd;
format
driver $10.
lap 1.
time 10.2
;
input
driver: $10.
lap: 1.
time: 10.2
;
datalines;
Driver D|1|42.17
Driver B|1|44.05
Driver B|2|44.17
Driver A|1|44.93
Driver D|2|45.89
Driver C|1|46.19
Driver A|2|46.23
Driver A|3|45.06
Driver A|4|43.46
Driver D|3|46.37
Driver D|4|42.87
Driver B|3|46.42
Driver B|4|43.92
Driver C|2|46.94
Driver C|3|46.31
Driver C|4|46.76
run;
What I have is three drivers and their four lap times. What I need my final product to be is a sorted version of the best lap times. However, the important condition is that the best time must also keep lap order in mind. For instance, the first record will be the fastest lap 1 time. However, after that, future records will be the fastest lap time for the earliest lap for each driver.
Right now I have a macro that basically sorts it by lap, picks the fastest time, and then subtracts 1 from each lap for the time/driver chosen. However, for 1 million rows, this is very tedious and takes a lot of time. Is there a faster way to come up with this final sorted table without having to spend a lot of run time looping?
Any help would be appreciated. I've spent quite a bit of time thinking about this, and just cannot think of a way to do it offhand.
Thanks!
Derek