Hi,
My data has id, date and value.
For each date (current date), I lookback 4 days and check if current date's value belong to top 1 or top 2. The top position is also kept.
My code below works, but I would like to have a more efficient code.
Can you please help?
Thanks,
HHC
data have;
input ID date value ;
cards;
1 1 5
1 2 8
1 3 20
1 4 0
1 5 1
1 6 10
1 7 5
2 1 6
2 2 10
2 3 10
2 4 9
2 5 1
2 6 5
2 7 33
run;
*get prior 4 days and current date;
proc sql;
create table want as select a.id,a.date, b.date as date_2, b.value
from have as a left join have as b
on a.id=b.id and a.date-4<=b.date<=a.date
order by a.id,a.date, b.date;
quit;
proc sort data=want; by id date descending value;run;
*top 2 values;
data want2; set want;
by id date;
if first.date then top=0;
top+1;
if top<=2 then output;
run;
*check if current date and top value date is the same;
data final; set want2;
if date=date_2; run;
proc sort data=final; by id date top;run;
Not sure about performance of this solution, the largest function may be resource intense.
EDIT: tweaked indexes to match results as it's current 5 observations, not 4 observations.
data temp;
set have;
by id;
array _prev(0:4) _Temporary_;
if first.id then do;
call missing (of _prev(*));
end;
*add to array;
_prev(mod(_n_, 5)) = value;
if largest(1, of _prev(*)) =value then flag=1;
if largest(2, of _prev(*)) = value then flag=2;
run;
data final3;
set temp;
if flag in (1,2) ;
run;
This does it in one step, but handles ties for first place differently:
data want;
set have;
by id;
array last4 {4} _temporary_;
array last4_2 {4} _temporary_;
if first.id
then do;
count = 1;
call missing(of last4{*});
end;
else count + 1;
last4{mod(count,4)+1} = value;
do i = 1 to 4;
last4_2{i} = last4{i};
end;
call sort(of last4_2{*});
if value = last4_2{4}
then do;
top = 1;
output;
end;
else if value = last4_2{3}
then do;
top = 2;
output;
end;
drop count;
run;
Same idea with Kurt.
/*
That would be easy if there is no gap between two date.
*/
data have;
input ID date value ;
cards;
1 1 5
1 2 8
1 3 20
1 4 0
1 5 1
1 6 10
1 7 5
2 1 6
2 2 10
2 3 10
2 4 9
2 5 1
2 6 5
2 7 33
;
data want;
array x{5} _temporary_;
array y{5} _temporary_;
call missing(of x{*} y{*});
do i=1 by 1 until(last.id);
set have;
by id;
x{mod(i,dim(x))+1}=value;
do j=1 to dim(x);
y{j}=x{j};
end;
call sortn(of y{*});
top=dim(x)+1-whichn(value,of y{*});
output;
end;
drop i j;
run;
I try another method and look quite short.
I will run a big sample and see which one is faster.
Many thanks,
HHC
data want;
set have nobs=nobs;
by id;
i+1;
out=0;
top=1;
do j=i to i+5 until (out=1);
set have (rename =( id=X_id date=X_date value=X_value)) point=j;
if id^=X_id then out=1;
if value<X_value then top=top+1;
if top=3 then do; exit=X_date;out=1;end;
end;
run;
Not sure about performance of this solution, the largest function may be resource intense.
EDIT: tweaked indexes to match results as it's current 5 observations, not 4 observations.
data temp;
set have;
by id;
array _prev(0:4) _Temporary_;
if first.id then do;
call missing (of _prev(*));
end;
*add to array;
_prev(mod(_n_, 5)) = value;
if largest(1, of _prev(*)) =value then flag=1;
if largest(2, of _prev(*)) = value then flag=2;
run;
data final3;
set temp;
if flag in (1,2) ;
run;
<pedantic mode: on>
Top? that means above. So processing data from "top to bottom" in the shown code means the first two values are the "top".
<pedantic mode: off>
I really hate a value judgement like "top" being used as programming requirement. "Top" could mean the lowest score (golf for example lowest value wins or is better. If you mean "Largest value" say so. Also if you mean to do something within a group of values, like ID, please state so.
Hi Everyone,
Thank you so much for your support!
So I run on 2 samples (1million row and 5 million rows) and lookback window is 1000 and below are the result.
Somehow, the code by Kurt_Bremser return error with call sort so I can't check.
I believe that data structure matters in terms of time needed to run and the 1000 lookback is quite a stress test.
(With lookback =5, there is not much difference among methods.)
1 milion:
Reeze: 3.04 sec
Ksharp: 1:19 min
me: 5.98 sec
5 milion:
Reeze: 9.7 sec
Ksharp: more than 2 min
me: 12.6
Thank you,
HHC
data want;
set have nobs=nobs;
by id;
drop out i j X_: out;
i+1;
out=0;
top_value&look_back=1;
do j=i+1 to i+ &look_back until (out=1);
set have (keep = id value change_id rename =( id=X_id value=X_value change_id=X_change_id)) point=j;
if change_id=1 then do; out=1;;leave;end;
if X_change_id=1 then do; out=1;;leave;end;
else
if value<X_value then do; top_value&look_back=top_value&look_back+1; max_value=X_value;end;
if top_value&look_back =&cutoff_top_value then do;top_value&look_back=0; out=1; ;end;
end;
run;
Final code
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.