Hi Everyone,
My data has date, time, ID and value.
For each row, say date=1, time =12 and ID = 1, I want to find the 1st highest, 2nd highest and 3rd highest values for that id, of that date, and up to that time (<12).
The illustration of that row at the end.
Can you please help me?
Thank you,
HHC
data have;
input date time ID value;
cards;
0 1 1 88888
0 1 9 585
1 1 1 999
1 2 1 7
1 3 1 3
1 4 1 101
1 7 1 102
1 9 1 12
1 12 1 9999
1 13 1 105
1 14 1 1
2 12 1 5
2 13 1 8
1 2 124
1 2 11
1 2 9999
1 2 9
;
data temp; set have;
if id=1 and date=1 and time<12 ;run;
proc sort data=temp; by id descending value;run;
*pick to 3;
data temp; set temp;
if _N_<=3;
run;
*spread horizontally;
proc transpose data=temp out=want (drop = _NAME_);
by date id;
var value;run;
data want; set want;
rename col1=top1 col2=top2 col3=top3;
time=12;run;
My first thought about this is to use an ordered hash object, but actually you can more neatly combine a RETAIN statement with a CALL SORTN:
data have;
input date time ID value ;
cards;
0 1 1 88888
0 1 9 585
1 1 1 999
1 2 1 7
1 3 1 3
1 4 1 101
1 7 1 102
1 9 1 12
1 12 1 9999
1 13 1 105
1 14 1 1
2 12 1 5
2 13 1 8
1 2 124
1 2 11
1 2 9999
1 2 9
run;
proc sort ;
by id date time ;
run;
data want (drop=_:);
set have;
by id date;
retain topvalue1-topvalue3 _dummy ;
array sortvars _dummy topvalue3 topvalue2 topvalue1 ;
if first.date then call missing(of sortvars{*});
call sortn(of sortvars{*});
output;
_dummy=value;
run;
Reset the array sortvars (containing a history of VALUEs) to missing at the start of each date.
The call sortn routine will put the highest value in that history at the rightmost element of the sortvars array (i.e. TOPVALUE1). That's why the last three elements of sortvars array are in "reverse" order.
After sorting, output the observations. Only after that, replace the 4th largest historic value (i.e. variable _DUMMY) If the new value will be part of the top 3, that will be determined by the call sortn during the next observation.
I list the variables in the RETAIN statement in desired order, and place the RETAIN prior to the ARRAY statement (in reverse order). The variable order of the first statement will prevail in the output dataset, even though the call sortn uses them in another order.
My first thought about this is to use an ordered hash object, but actually you can more neatly combine a RETAIN statement with a CALL SORTN:
data have;
input date time ID value ;
cards;
0 1 1 88888
0 1 9 585
1 1 1 999
1 2 1 7
1 3 1 3
1 4 1 101
1 7 1 102
1 9 1 12
1 12 1 9999
1 13 1 105
1 14 1 1
2 12 1 5
2 13 1 8
1 2 124
1 2 11
1 2 9999
1 2 9
run;
proc sort ;
by id date time ;
run;
data want (drop=_:);
set have;
by id date;
retain topvalue1-topvalue3 _dummy ;
array sortvars _dummy topvalue3 topvalue2 topvalue1 ;
if first.date then call missing(of sortvars{*});
call sortn(of sortvars{*});
output;
_dummy=value;
run;
Reset the array sortvars (containing a history of VALUEs) to missing at the start of each date.
The call sortn routine will put the highest value in that history at the rightmost element of the sortvars array (i.e. TOPVALUE1). That's why the last three elements of sortvars array are in "reverse" order.
After sorting, output the observations. Only after that, replace the 4th largest historic value (i.e. variable _DUMMY) If the new value will be part of the top 3, that will be determined by the call sortn during the next observation.
I list the variables in the RETAIN statement in desired order, and place the RETAIN prior to the ARRAY statement (in reverse order). The variable order of the first statement will prevail in the output dataset, even though the call sortn uses them in another order.
Didn't you ask this question last week?
data have;
input date time ID value;
cards;
0 1 1 88888
0 1 9 585
1 1 1 999
1 2 1 7
1 3 1 3
1 4 1 101
1 7 1 102
1 9 1 12
1 12 1 9999
1 13 1 105
1 14 1 1
2 12 1 5
2 13 1 8
1 2 124 .
1 2 11 .
1 2 9999 .
1 2 9 .
;
proc sort;
by date time;
run;
data want ;
set have;
by date time;
array history [0:99] _temporary_;
array big largest1-largest5 ;
if first.date then call missing(of row history[*] big[*]);
row+1;
history[mod(row,100)]=value;
do index=1 to dim(big);
big[index]=largest(index,of history[*]);
end;
drop index;
run;
proc print;
run;
Obs date time ID value largest1 largest2 largest3 largest4 largest5 row 1 0 1 1 88888 88888 . . . . 1 2 0 1 9 585 88888 585 . . . 2 3 1 1 1 999 999 . . . . 1 4 1 2 1 7 999 7 . . . 2 5 1 2 124 . 999 7 . . . 3 6 1 2 11 . 999 7 . . . 4 7 1 2 9999 . 999 7 . . . 5 8 1 2 9 . 999 7 . . . 6 9 1 3 1 3 999 7 3 . . 7 10 1 4 1 101 999 101 7 3 . 8 11 1 7 1 102 999 102 101 7 3 9 12 1 9 1 12 999 102 101 12 7 10 13 1 12 1 9999 9999 999 102 101 12 11 14 1 13 1 105 9999 999 105 102 101 12 15 1 14 1 1 9999 999 105 102 101 13 16 2 12 1 5 5 . . . . 1 17 2 13 1 8 8 5 . . . 2
@hhchenfx , it seems that you've asked this quested 2 times before, but with slightly different input data.
In this version of the question, it seems that the difference is that the date variable is not strictly increasing for each observation? Am I correct to assume that your requirement is the same as previous, but that you want to maintain the original order?
Please be very specific. Makes it much easier to provide an accurate answer 🙂
Thank you all for helping.
These are amazing code.
HHC
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: