Solved: Find the N biggest numbers in a day up to current time

hhchenfx · Posted 11-28-2022 04:05 PM

Hi Everyone,

My data has date, time, ID and value.

For each row, say date=1, time =12 and ID = 1, I want to find the 1st highest, 2nd highest and 3rd highest values for that id, of that date, and up to that time (<12).

The illustration of that row at the end.

Can you please help me?

Thank you,

HHC

data have;
  input date time ID value;
cards;
0 1 1 88888
0 1 9 585
1 1 1 999
1 2 1 7
1 3 1 3
1 4 1 101
1 7 1 102
1 9 1 12
1 12 1 9999
1 13 1 105
1 14 1 1
2 12 1 5
2 13 1 8 
1 2 124
1 2 11
1 2 9999
1 2 9
;


data temp; set have;
if id=1 and date=1 and       time<12 ;run;

proc sort data=temp; by id descending value;run;

*pick to 3;
data temp; set temp;
if _N_<=3;
run;

*spread horizontally;
proc transpose data=temp out=want (drop = _NAME_);
by date id;
var value;run;

data want; set want;
rename col1=top1 col2=top2 col3=top3;
time=12;run;

mkeintz · Posted 11-28-2022 08:44 PM

My first thought about this is to use an ordered hash object, but actually you can more neatly combine a RETAIN statement with a CALL SORTN:

data have;
  input date time ID value  ;
cards;
0 1 1 88888
0 1 9 585  
1 1 1 999  
1 2 1 7    
1 3 1 3    
1 4 1 101  
1 7 1 102  
1 9 1 12   
1 12 1 9999
1 13 1 105 
1 14 1 1   
2 12 1 5   
2 13 1 8   
1 2 124    
1 2 11     
1 2 9999   
1 2 9      
run;

proc sort ; 
  by id date time ;
run;

data want (drop=_:);
  set have;
  by id date;
  retain topvalue1-topvalue3  _dummy ;

  array sortvars _dummy topvalue3 topvalue2 topvalue1 ;

  if first.date then call missing(of sortvars{*});
  call sortn(of sortvars{*});
  output;
  _dummy=value;
run;

Reset the array sortvars (containing a history of VALUEs) to missing at the start of each date.

The call sortn routine will put the highest value in that history at the rightmost element of the sortvars array (i.e. TOPVALUE1). That's why the last three elements of sortvars array are in "reverse" order.

After sorting, output the observations. Only after that, replace the 4th largest historic value (i.e. variable _DUMMY) If the new value will be part of the top 3, that will be determined by the call sortn during the next observation.

I list the variables in the RETAIN statement in desired order, and place the RETAIN prior to the ARRAY statement (in reverse order). The variable order of the first statement will prevail in the output dataset, even though the call sortn uses them in another order.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 11-28-2022 08:44 PM

My first thought about this is to use an ordered hash object, but actually you can more neatly combine a RETAIN statement with a CALL SORTN:

data have;
  input date time ID value  ;
cards;
0 1 1 88888
0 1 9 585  
1 1 1 999  
1 2 1 7    
1 3 1 3    
1 4 1 101  
1 7 1 102  
1 9 1 12   
1 12 1 9999
1 13 1 105 
1 14 1 1   
2 12 1 5   
2 13 1 8   
1 2 124    
1 2 11     
1 2 9999   
1 2 9      
run;

proc sort ; 
  by id date time ;
run;

data want (drop=_:);
  set have;
  by id date;
  retain topvalue1-topvalue3  _dummy ;

  array sortvars _dummy topvalue3 topvalue2 topvalue1 ;

  if first.date then call missing(of sortvars{*});
  call sortn(of sortvars{*});
  output;
  _dummy=value;
run;

Reset the array sortvars (containing a history of VALUEs) to missing at the start of each date.

The call sortn routine will put the highest value in that history at the rightmost element of the sortvars array (i.e. TOPVALUE1). That's why the last three elements of sortvars array are in "reverse" order.

After sorting, output the observations. Only after that, replace the 4th largest historic value (i.e. variable _DUMMY) If the new value will be part of the top 3, that will be determined by the call sortn during the next observation.

I list the variables in the RETAIN statement in desired order, and place the RETAIN prior to the ARRAY statement (in reverse order). The variable order of the first statement will prevail in the output dataset, even though the call sortn uses them in another order.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Tom · Posted 11-28-2022 11:29 PM

Didn't you ask this question last week?

data have;
  input date time ID value;
cards;
0 1 1 88888
0 1 9 585
1 1 1 999
1 2 1 7
1 3 1 3
1 4 1 101
1 7 1 102
1 9 1 12
1 12 1 9999
1 13 1 105
1 14 1 1
2 12 1 5
2 13 1 8 
1 2 124 .
1 2 11 .
1 2 9999 .
1 2 9 .
;

proc sort;
  by date time;
run;

data want ;
  set have;
  by date time;
  array history [0:99] _temporary_;
  array big largest1-largest5 ;
  if first.date then call missing(of row history[*] big[*]);
  row+1;
  history[mod(row,100)]=value;
  do index=1 to dim(big);
    big[index]=largest(index,of history[*]);
  end;
  drop index;
run;

proc print;
run;

Obs    date    time      ID    value    largest1    largest2    largest3    largest4    largest5    row

  1      0       1        1    88888      88888          .           .           .           .        1
  2      0       1        9      585      88888        585           .           .           .        2
  3      1       1        1      999        999          .           .           .           .        1
  4      1       2        1        7        999          7           .           .           .        2
  5      1       2      124        .        999          7           .           .           .        3
  6      1       2       11        .        999          7           .           .           .        4
  7      1       2     9999        .        999          7           .           .           .        5
  8      1       2        9        .        999          7           .           .           .        6
  9      1       3        1        3        999          7           3           .           .        7
 10      1       4        1      101        999        101           7           3           .        8
 11      1       7        1      102        999        102         101           7           3        9
 12      1       9        1       12        999        102         101          12           7       10
 13      1      12        1     9999       9999        999         102         101          12       11
 14      1      13        1      105       9999        999         105         102         101       12
 15      1      14        1        1       9999        999         105         102         101       13
 16      2      12        1        5          5          .           .           .           .        1
 17      2      13        1        8          8          5           .           .           .        2

PeterClemmensen · Posted 11-29-2022 05:31 AM

@hhchenfx , it seems that you've asked this quested 2 times before, but with slightly different input data.

In this version of the question, it seems that the difference is that the date variable is not strictly increasing for each observation? Am I correct to assume that your requirement is the same as previous, but that you want to maintain the original order?

Please be very specific. Makes it much easier to provide an accurate answer 🙂

The DATA to DATA Step Macro
Blog: SASnrd

hhchenfx · Posted 11-29-2022 05:05 PM

Thank you all for helping.

These are amazing code.

HHC

hhchenfx · Posted 11-29-2022 05:07 PM

@PeterClemmensen , thanks for reminding me.

I will try to make it as clear as I can.

Thanks,

HHC

Find the N biggest numbers in a day up to current time

Re: Find the N biggest numbers in a day up to current time

Re: Find the N biggest numbers in a day up to current time

Re: Find the N biggest numbers in a day up to current time

Re: Find the N biggest numbers in a day up to current time

Re: Find the N biggest numbers in a day up to current time

Re: Find the N biggest numbers in a day up to current time

Classroom Training Available!