Solved: Re: rank dates within cases

Nia2023 · Posted 03-14-2023 11:59 AM

I ran into a challenge, hope experts can advise.

In a data set, we have case IDs (1, 2, etc), and the next column is dates of accident:

1.00 06/04/2008
1.00 06/08/2008
2.00 06/14/2008
3.00 08/23/2008
3.00 08/23/2008
4.00 08/23/2008

Here, if the date is the same withing the same ID, it is the same accident. So for ID=3, it will be the same accident, just 2 records for different aspects of the accident.

We need to assign a number to each unique accident. So for ID=1, it will be 1 and 2. For ID=2, will be 1. For ID=3, 1 and 2. And for ID=4, 1.

The issue is that when I use a rank function to rank dates by ID, the result is weird. For some cases, the rank is 11.5, 23.5 etc.

Is there a way to make it work so that for each case ID it will rank order just the dates within that ID?

Tom · Posted 03-14-2023 01:27 PM

Did you read the documentation of PROC RANK?

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/p16s2o8e4bnqrin1phywxdaxqba7.htm#n128utp...

TIES=HIGH | LOW | MEAN | DENSE

specifies how to compute normal scores or ranks for tied data values.

HIGH

assigns the largest of the corresponding ranks (or largest of the normal scores when NORMAL= is specified).

LOW

assigns the smallest of the corresponding ranks (or smallest of the normal scores when NORMAL= is specified).

MEAN

assigns the mean of the corresponding rank (or mean of the normal scores when NORMAL= is specified).

DENSE

computes scores and ranks by treating tied values as a single-order statistic. For the default method, ranks are consecutive integers that begin with the number one and end with the number of unique, nonmissing values of the variable that is being ranked. Tied values are assigned the same rank.

Note: CONDENSE is an alias for DENSE.

View solution in original post

Tom · Posted 03-14-2023 12:09 PM

Just a simple numbering of the cases should work. Just increment the number every time you see a new date.

data want;
  set have ;
  by id date;
  if first.id then casenum=0;
  casenum + first.date ;
run;

ballardw · Posted 03-14-2023 12:32 PM

It sounds like you used proc rank which will deal with ties in a number of ways depending on the options chosen. The default is MEAN which would tend to generate values with a .5 when there are two identical dates. One suspect that you did not use a BY ID for the range of values mention.

Perhaps: May need to sort by ID and date if not actually so in your data.

data have;
   input id date :mmddyy10.;
   format date mmddyy10.;
datalines;
1.00 06/04/2008
1.00 06/08/2008
2.00 06/14/2008
3.00 08/23/2008
3.00 08/23/2008
4.00 08/23/2008
;

proc rank data=have out=ranked ties=low;
   by id;
   ranks daterank;
   var date ;
run;

Note your description "So for ID=3, it will be the same accident, just 2 records for different aspects of the accident.

.... For ID=3, 1 and 2." seems inconsistent. Wouldn't "same accident" be the same rank?

Nia2023 · Posted 03-14-2023 01:23 PM

Thank you! Yes, you are right - for ID=3, should be 1 for both dates.

I tried your solution. It gets rid of .5 values. However, it does not assign the values consecutively, which is evident when the number of accidents is higher:

ID date rank

10.00 09/19/2006 1.00
10.00 09/19/2006 1.00
10.00 10/17/2006 5.00
10.00 10/17/2006 5.00
10.00 10/30/2006 8.00
10.00 10/30/2006 8.00
10.00 11/06/2006 11.00
10.00 11/06/2006 11.00
10.00 12/13/2006 13.00
10.00 12/13/2006 13.00

Instead, we need:

ID date rank

10.00 09/19/2006 1.00
10.00 09/19/2006 1.00
10.00 10/17/2006 2.00
10.00 10/17/2006 2.00
10.00 10/30/2006 3.00
10.00 10/30/2006 3.00

10.00 10/30/2006 3.00
10.00 11/06/2006 4.00
10.00 11/06/2006 4.00
10.00 12/13/2006 5.00
10.00 12/13/2006 5.00

So for ID=10 there were 5 distinct accidents in total, and should be numbered 1-5, not 1-13.

Is there a way to do that?

Tom · Posted 03-14-2023 01:27 PM

Did you read the documentation of PROC RANK?

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/p16s2o8e4bnqrin1phywxdaxqba7.htm#n128utp...

TIES=HIGH | LOW | MEAN | DENSE

specifies how to compute normal scores or ranks for tied data values.

HIGH

assigns the largest of the corresponding ranks (or largest of the normal scores when NORMAL= is specified).

LOW

assigns the smallest of the corresponding ranks (or smallest of the normal scores when NORMAL= is specified).

MEAN

assigns the mean of the corresponding rank (or mean of the normal scores when NORMAL= is specified).

DENSE

computes scores and ranks by treating tied values as a single-order statistic. For the default method, ranks are consecutive integers that begin with the number one and end with the number of unique, nonmissing values of the variable that is being ranked. Tied values are assigned the same rank.

Note: CONDENSE is an alias for DENSE.

Nia2023 · Posted 03-14-2023 02:02 PM

It did work with the Dense option produced what we wanted.

thank you so much, I greatly appreciate your help!!!

ballardw · Posted 03-14-2023 02:16 PM

@Nia2023 wrote:

Thank you! Yes, you are right - for ID=3, should be 1 for both dates.

I tried your solution. It gets rid of .5 values. However, it does not assign the values consecutively, which is evident when the number of accidents is higher:

ID date rank

10.00 09/19/2006 1.00
10.00 09/19/2006 1.00
10.00 10/17/2006 5.00
10.00 10/17/2006 5.00
10.00 10/30/2006 8.00
10.00 10/30/2006 8.00
10.00 11/06/2006 11.00
10.00 11/06/2006 11.00
10.00 12/13/2006 13.00
10.00 12/13/2006 13.00

Instead, we need:

ID date rank

10.00 09/19/2006 1.00
10.00 09/19/2006 1.00
10.00 10/17/2006 2.00
10.00 10/17/2006 2.00
10.00 10/30/2006 3.00
10.00 10/30/2006 3.00

10.00 10/30/2006 3.00
10.00 11/06/2006 4.00
10.00 11/06/2006 4.00
10.00 12/13/2006 5.00
10.00 12/13/2006 5.00

So for ID=10 there were 5 distinct accidents in total, and should be numbered 1-5, not 1-13.

Is there a way to do that?

Do please compare your first example with this data as well. You can oversimplify as in this case. Also, any time you get a result that doesn't match your expectations then show the CODE that you use as well as data.

TIES=HIGH | LOW | MEAN | DENSE

HIGH

LOW

MEAN

DENSE

TIES=HIGH | LOW | MEAN | DENSE

HIGH

LOW

MEAN

DENSE

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away