BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
hhchenfx
Barite | Level 11

Hi,

My data has id, date and value.

For each date (current date), I lookback 4 days and check if current date's value belong to top 1 or top 2. The top position is also kept.

My code below works, but I would like to have a more efficient code.

Can you please help?

Thanks,

HHC

data have;
  input  ID date value  ;
cards;
1 1 5
1 2 8
1 3 20
1 4 0
1 5 1
1 6 10
1 7 5
2 1 6
2 2 10
2 3 10
2 4 9
2 5 1 
2 6 5
2 7 33
run;

*get prior 4 days and current date;
proc sql;
create table want as select a.id,a.date, b.date as date_2, b.value
from have as a left join have as b 
on a.id=b.id and a.date-4<=b.date<=a.date
order by a.id,a.date, b.date;
quit;

proc sort data=want; by id date descending value;run;

*top 2 values;
data want2; set want;
	by id date;
	if first.date then top=0;
	top+1;
	if top<=2 then output;
	run;

*check if current date and top value date is the same;
data final; set want2;
if date=date_2; run;

proc sort data=final; by id date top;run;
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Not sure about performance of this solution, the largest function may be resource intense.

EDIT: tweaked indexes to match results as it's current 5 observations, not 4 observations.

 

data temp;
set have;
by id;
array _prev(0:4) _Temporary_;


if first.id then do;
    call missing (of _prev(*));
end;

*add to array;
_prev(mod(_n_, 5)) = value;

if largest(1, of _prev(*)) =value then flag=1;
if largest(2, of _prev(*)) = value then flag=2;


run;

data final3;
set temp;
if flag in (1,2) ;
run; 

 

View solution in original post

8 REPLIES 8
Kurt_Bremser
Super User

This does it in one step, but handles ties for first place differently:

data want;
set have;
by id;
array last4 {4} _temporary_;
array last4_2 {4} _temporary_;
if first.id
then do;
  count = 1;
  call missing(of last4{*});
end;
else count + 1;
last4{mod(count,4)+1} = value;
do i = 1 to 4;
  last4_2{i} = last4{i};
end;
call sort(of last4_2{*});
if value = last4_2{4}
then do;
  top = 1;
  output;
end;
else if value = last4_2{3}
then do;
  top = 2;
  output;
end;
drop count;
run;
Ksharp
Super User

Same idea with Kurt.

 

/*
That would be easy if there is no gap between two date.
*/
data have;
  input  ID date value  ;
cards;
1 1 5
1 2 8
1 3 20
1 4 0
1 5 1
1 6 10
1 7 5
2 1 6
2 2 10
2 3 10
2 4 9
2 5 1 
2 6 5
2 7 33
;
data want;
array x{5} _temporary_;
array y{5} _temporary_;
call missing(of x{*} y{*});
 do i=1 by 1 until(last.id);
  set have;
  by id;
  x{mod(i,dim(x))+1}=value;
  do j=1 to dim(x);
    y{j}=x{j};
  end;
  call sortn(of y{*});

  top=dim(x)+1-whichn(value,of y{*});
  output;
 end;
 drop i j;
run;
hhchenfx
Barite | Level 11

I try another method and look quite short.

I will run a big sample and see which one is faster.

Many thanks,

HHC

 

data want;
set have nobs=nobs;
by id;
i+1;
out=0;
top=1;
do j=i to i+5 until (out=1);
set have (rename =( id=X_id date=X_date value=X_value)) point=j;
	if id^=X_id then out=1;
	if value<X_value then top=top+1;
	if top=3 then do; exit=X_date;out=1;end;
end;
run;
Reeza
Super User

Not sure about performance of this solution, the largest function may be resource intense.

EDIT: tweaked indexes to match results as it's current 5 observations, not 4 observations.

 

data temp;
set have;
by id;
array _prev(0:4) _Temporary_;


if first.id then do;
    call missing (of _prev(*));
end;

*add to array;
_prev(mod(_n_, 5)) = value;

if largest(1, of _prev(*)) =value then flag=1;
if largest(2, of _prev(*)) = value then flag=2;


run;

data final3;
set temp;
if flag in (1,2) ;
run; 

 

ballardw
Super User

<pedantic mode: on>

Top? that means above. So processing data from "top to bottom" in the shown code means the first two values are the "top".

<pedantic mode: off>

 

I really hate a value judgement like "top" being used as programming requirement. "Top" could mean the lowest score (golf for example lowest value wins or is better.  If you mean "Largest value" say so. Also if you mean to do something within a group of values, like ID, please state so.

Ksharp
Super User
I think Reeza's code is most efficient. And my code didn't consider the TIE value , but Reeza care about it .
hhchenfx
Barite | Level 11

Hi Everyone,

Thank you so much for your support!

 

So I run on 2 samples (1million row and 5 million rows) and lookback window is 1000 and below are the result.
Somehow, the code by Kurt_Bremser return error with call sort so I can't check.

 

I believe that data structure matters in terms of time needed to run and the 1000 lookback is quite a stress test.
(With lookback =5, there is not much difference among methods.)

 

1 milion:
Reeze: 3.04 sec
Ksharp: 1:19 min
me: 5.98 sec

 

5 milion:
Reeze: 9.7 sec
Ksharp: more than 2 min
me: 12.6

 

Thank you,

HHC

 

hhchenfx
Barite | Level 11
data want;
set have nobs=nobs;
by id;
drop out i j X_: out;
i+1;
out=0;
top_value&look_back=1;

do j=i+1 to i+ &look_back until (out=1);

set have (keep = id value change_id rename =( id=X_id value=X_value change_id=X_change_id)) point=j;
	if change_id=1 then do; out=1;;leave;end;
	if X_change_id=1 then do; out=1;;leave;end;
	else
	if value<X_value then do; top_value&look_back=top_value&look_back+1; max_value=X_value;end;
	
	if top_value&look_back =&cutoff_top_value then do;top_value&look_back=0; out=1; ;end;


end;
run;

Final code

sas-innovate-2024.png

📢

ANNOUNCEMENT

The early bird rate has been extended! Register by March 18 for just $695 - $100 off the standard rate.

 

Check out the agenda and get ready for a jam-packed event featuring workshops, super demos, breakout sessions, roundtables, inspiring keynotes and incredible networking events. 

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 730 views
  • 12 likes
  • 5 in conversation