Solved: Re: SAS Conditional Rank - Adjusting During Sort

derekcra · Posted 05-08-2019 01:02 PM

Hello all -

I have a dataset with ~1,000,000 rows. As an example, here's what the data looks like (data is fake, but will describe the issue I'm facing).

data TABLE_HAVE;
	infile datalines truncover dlm='|' dsd;
	format
		driver				$10.
		lap				1.
		time				10.2
		;
	input
		driver:				$10.
		lap:				1.
		time:				10.2
		;
datalines;
Driver A|1|44.93
Driver A|2|46.23
Driver A|3|45.06
Driver A|4|43.46
Driver B|1|44.05
Driver B|2|44.17
Driver B|3|46.42
Driver B|4|43.92
Driver C|1|46.19
Driver C|2|46.94
Driver C|3|46.31
Driver C|4|46.76
Driver D|1|42.17
Driver D|2|45.89
Driver D|3|46.37
Driver D|4|42.87
run;

data TABLE_WANT;
	infile datalines truncover dlm='|' dsd;
	format
		driver				$10.
		lap				1.
		time				10.2
		;
	input
		driver:				$10.
		lap:				1.
		time:				10.2
		;
datalines;
Driver D|1|42.17
Driver B|1|44.05
Driver B|2|44.17
Driver A|1|44.93
Driver D|2|45.89
Driver C|1|46.19
Driver A|2|46.23
Driver A|3|45.06
Driver A|4|43.46
Driver D|3|46.37
Driver D|4|42.87
Driver B|3|46.42
Driver B|4|43.92
Driver C|2|46.94
Driver C|3|46.31
Driver C|4|46.76
run;

What I have is three drivers and their four lap times. What I need my final product to be is a sorted version of the best lap times. However, the important condition is that the best time must also keep lap order in mind. For instance, the first record will be the fastest lap 1 time. However, after that, future records will be the fastest lap time for the earliest lap for each driver.

Right now I have a macro that basically sorts it by lap, picks the fastest time, and then subtracts 1 from each lap for the time/driver chosen. However, for 1 million rows, this is very tedious and takes a lot of time. Is there a faster way to come up with this final sorted table without having to spend a lot of run time looping?

Any help would be appreciated. I've spent quite a bit of time thinking about this, and just cannot think of a way to do it offhand.

Thanks!

Derek

PGStats · Posted 05-08-2019 05:29 PM

Tedious but straitforward programming: take the lowest time from the next lap available for each driver:

proc sql;
select count(distinct driver) into :dNb from have;
select count(distinct lap) into :lNb from have;
select distinct quote(trim(driver)) into :drList separated by "," from have;
quit;

data want;
array dLap{&dNb} _temporary_ (&dNb*1);
array dt {&dNb,&lNb} _temporary_;
do until(done);
    set have end=done;
    dId = whichc(driver, &drList.);
    dt{dId,lap} = time;
    end;
do until (missing (mId));
    mTime = constant("BIG");
    call missing(mId);
    do dId = 1 to &dNb;
        if dLap{dId} <= &lNb then if dt{dId, dLap{dId}} < mTime then do;
            mId = dId;
            mTime = dt{dId, dLap{dId}};
            end;
        end;
    if mId then do;
        driver = choosec(mId, &drList.);
        lap = dlap{mId};
        time = dt{mId, dLap{mId}};
        output;
        dLap{mId} = dLap{mId} + 1;
        end;
    end;
drop dId mTime mId;
run;

proc print; run;

PG

View solution in original post

PGStats · Posted 05-08-2019 05:29 PM

Tedious but straitforward programming: take the lowest time from the next lap available for each driver:

proc sql;
select count(distinct driver) into :dNb from have;
select count(distinct lap) into :lNb from have;
select distinct quote(trim(driver)) into :drList separated by "," from have;
quit;

data want;
array dLap{&dNb} _temporary_ (&dNb*1);
array dt {&dNb,&lNb} _temporary_;
do until(done);
    set have end=done;
    dId = whichc(driver, &drList.);
    dt{dId,lap} = time;
    end;
do until (missing (mId));
    mTime = constant("BIG");
    call missing(mId);
    do dId = 1 to &dNb;
        if dLap{dId} <= &lNb then if dt{dId, dLap{dId}} < mTime then do;
            mId = dId;
            mTime = dt{dId, dLap{dId}};
            end;
        end;
    if mId then do;
        driver = choosec(mId, &drList.);
        lap = dlap{mId};
        time = dt{mId, dLap{mId}};
        output;
        dLap{mId} = dLap{mId} + 1;
        end;
    end;
drop dId mTime mId;
run;

proc print; run;

PG

derekcra · Posted 05-08-2019 05:47 PM

This is great! I'll test it against some of my bigger datasets soon, but I'm pretty sure you've just saved me hours of runtime. Looks like it does exactly what I want. Thank you so much!

SAS Conditional Rank - Adjusting During Sort

Re: SAS Conditional Rank - Adjusting During Sort

Re: SAS Conditional Rank - Adjusting During Sort

Re: SAS Conditional Rank - Adjusting During Sort

SAS Innovate 2025: Save the Date