BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
derekcra
Fluorite | Level 6

Hello all -

 

I have a dataset with ~1,000,000 rows. As an example, here's what the data looks like (data is fake, but will describe the issue I'm facing).

 

data TABLE_HAVE;
	infile datalines truncover dlm='|' dsd;
	format
		driver				$10.
		lap				1.
		time				10.2
		;
	input
		driver:				$10.
		lap:				1.
		time:				10.2
		;
datalines;
Driver A|1|44.93
Driver A|2|46.23
Driver A|3|45.06
Driver A|4|43.46
Driver B|1|44.05
Driver B|2|44.17
Driver B|3|46.42
Driver B|4|43.92
Driver C|1|46.19
Driver C|2|46.94
Driver C|3|46.31
Driver C|4|46.76
Driver D|1|42.17
Driver D|2|45.89
Driver D|3|46.37
Driver D|4|42.87
run;

data TABLE_WANT;
	infile datalines truncover dlm='|' dsd;
	format
		driver				$10.
		lap				1.
		time				10.2
		;
	input
		driver:				$10.
		lap:				1.
		time:				10.2
		;
datalines;
Driver D|1|42.17
Driver B|1|44.05
Driver B|2|44.17
Driver A|1|44.93
Driver D|2|45.89
Driver C|1|46.19
Driver A|2|46.23
Driver A|3|45.06
Driver A|4|43.46
Driver D|3|46.37
Driver D|4|42.87
Driver B|3|46.42
Driver B|4|43.92
Driver C|2|46.94
Driver C|3|46.31
Driver C|4|46.76
run;

 

What I have is three drivers and their four lap times. What I need my final product to be is a sorted version of the best lap times. However, the important condition is that the best time must also keep lap order in mind. For instance, the first record will be the fastest lap 1 time. However, after that, future records will be the fastest lap time for the earliest lap for each driver.

 

Right now I have a macro that basically sorts it by lap, picks the fastest time, and then subtracts 1 from each lap for the time/driver chosen. However, for 1 million rows, this is very tedious and takes a lot of time. Is there a faster way to come up with this final sorted table without having to spend a lot of run time looping?

 

Any help would be appreciated. I've spent quite a bit of time thinking about this, and just cannot think of a way to do it offhand.

 

Thanks!

Derek

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Tedious but straitforward programming: take the lowest time from the next lap available for each driver:

 

proc sql;
select count(distinct driver) into :dNb from have;
select count(distinct lap) into :lNb from have;
select distinct quote(trim(driver)) into :drList separated by "," from have;
quit;

data want;
array dLap{&dNb} _temporary_ (&dNb*1);
array dt {&dNb,&lNb} _temporary_;
do until(done);
    set have end=done;
    dId = whichc(driver, &drList.);
    dt{dId,lap} = time;
    end;
do until (missing (mId));
    mTime = constant("BIG");
    call missing(mId);
    do dId = 1 to &dNb;
        if dLap{dId} <= &lNb then if dt{dId, dLap{dId}} < mTime then do;
            mId = dId;
            mTime = dt{dId, dLap{dId}};
            end;
        end;
    if mId then do;
        driver = choosec(mId, &drList.);
        lap = dlap{mId};
        time = dt{mId, dLap{mId}};
        output;
        dLap{mId} = dLap{mId} + 1;
        end;
    end;
drop dId mTime mId;
run;

proc print; run;
PG

View solution in original post

2 REPLIES 2
PGStats
Opal | Level 21

Tedious but straitforward programming: take the lowest time from the next lap available for each driver:

 

proc sql;
select count(distinct driver) into :dNb from have;
select count(distinct lap) into :lNb from have;
select distinct quote(trim(driver)) into :drList separated by "," from have;
quit;

data want;
array dLap{&dNb} _temporary_ (&dNb*1);
array dt {&dNb,&lNb} _temporary_;
do until(done);
    set have end=done;
    dId = whichc(driver, &drList.);
    dt{dId,lap} = time;
    end;
do until (missing (mId));
    mTime = constant("BIG");
    call missing(mId);
    do dId = 1 to &dNb;
        if dLap{dId} <= &lNb then if dt{dId, dLap{dId}} < mTime then do;
            mId = dId;
            mTime = dt{dId, dLap{dId}};
            end;
        end;
    if mId then do;
        driver = choosec(mId, &drList.);
        lap = dlap{mId};
        time = dt{mId, dLap{mId}};
        output;
        dLap{mId} = dLap{mId} + 1;
        end;
    end;
drop dId mTime mId;
run;

proc print; run;
PG
derekcra
Fluorite | Level 6

This is great! I'll test it against some of my bigger datasets soon, but I'm pretty sure you've just saved me hours of runtime. Looks like it does exactly what I want. Thank you so much!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 758 views
  • 2 likes
  • 2 in conversation