Hi Everyone,
So I have a data with ID, date (not continuous) and value.
For each row, I create a look back, which is 4 date before.
I want to find the max value within that 4 date (excluding current date)
My code below gives me the result but I need to run a sort nodupkey.
Can you please help me fix it?
Thanks,
HHC
data have;
input id date value;
datalines;
1 1 3
1 4 7
1 15 3
1 25 20
1 26 8
1 27 15
1 28 5
1 29 5
1 30 5
1 31 6
;
run;
data have; set have;
lookback_date=date - 4;
run;
proc sql;
create table want as select a.*, max(b.value) as max from have as a left join have as b
on a.id=b.id and a.lookback_date<=b.date and a.date>b.date
group by a.id, a.date;
quit;
proc sort data =want out=want nodupkey; by id date; run;
Is this returning what you're after?
data want;
set have;
by id date;
array values{0:3} 8 _temporary_;
if first.id then call missing(of values[*]);
else max_val=max(of values[*]);
values[mod(_n_,4)]=value;
run;
proc print data=want;
run;
/*If you don't have a big table, you could try this sub-query.*/
data have;
input id date value;
datalines;
1 1 3
1 4 7
1 15 3
1 25 20
1 26 8
1 27 15
1 28 5
1 29 5
1 30 5
1 31 6
;
run;
proc sql;
create table want2 as
select *,(select max(value) from have where id=a.id and a.date-4<=date<a.date) as max
from have as a;
quit;
Hi @hhchenfx,
Here is a DATA step solution:
%let w=4; /* length of time window (in days) */
data want(drop=i j);
set have;
by id date;
array d[0:&w] _temporary_; /* current and last four dates */
array v[0:&w] _temporary_; /* current and last four values */
if first.id then call missing(of d[*], of v[*]);
d[mod(_n_,&w+1)]=date;
v[mod(_n_,&w+1)]=value;
max=.;
do i=1 to &w;
j=mod(_n_+&w+1-i,&w+1);
if date-&w>d[j] then leave;
if v[j]>max then max=v[j];
end;
run;
It assumes that the input dataset (HAVE) is sorted by ID DATE and that there are no duplicate dates within an ID.
Is this returning what you're after?
data want;
set have;
by id date;
array values{0:3} 8 _temporary_;
if first.id then call missing(of values[*]);
else max_val=max(of values[*]);
values[mod(_n_,4)]=value;
run;
proc print data=want;
run;
%let start = %sysfunc(inputn(2010-01-01,yymmdd10.));
%let end = %sysfunc(inputn(2024-12-31,yymmdd10.));
data want;
set have;
by id;
array v{&start.:&end.} _temporary_;
if first.id
then do i = &start. to &end.;
v{i} = .;
end;
v{date} = value;
max = max(v{date-4},v{date-3},v{date-2},v{date-1});
drop i;
run;
Always store dates as SAS dates!
@hhchenfx: Have you changed the requirements? I'm asking because four solutions were suggested, only three of which, applied to your sample data, produce the results in your WANT dataset (in the case of Kurt Bremser's code after adapting the start date), but you have accepted the one that ignores gaps in the sequence of dates.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.