Hi Sas Community
Have data and want for every datapoint (by participant) to find maximum value (of measure) in previous 1 hour, minimum value in following 1 hour, and the area under the curve in the previous 1 hour. How do I do this?
Help will be really appreciate!
Data example:
participant_id datetime measure
A1 30OCT20:10:16:24 1.028037383
A1 30OCT20:13:14:14 1.183800623
A1 30OCT20:13:24:12 1.277258567
A1 30OCT20:14:29:14 1.464174455
A1 30OCT20:14:44:13 1.619937695
A1 30OCT20:14:49:02 1.744548287
A1 30OCT20:14:56:15 1.806853583
A2 20NOV20:14:22:18 2.492211838
A2 20NOV20:14:37:17 2.492211838
A2 20NOV20:14:52:19 2.242990654
A2 20NOV20:14:59:20 2.30529595
A2 20NOV20:15:22:22 2.274143302
A2 20NOV20:15:37:21 2.398753894
A2 20NOV20:15:52:23 2.367601246
A3 18NOV20:10:25:11 2.367601246
A3 18NOV20:10:30:13 2.46105919
A3 18NOV20:11:45:16 2.46105919
A3 18NOV20:11:55:12 2.429906542
A3 18NOV20:12:45:21 2.61682243
A3 18NOV20:12:50:23 2.398753894
A3 18NOV20:13:55:28 2.429906542
Hi @linlin87, it would be helpful if you provided an example of your desired output. It's hard to grasp what you mean by previous/following hour (previous/following relative to what reference point?). Also, what do you mean by area under the curve? Is it for the distribution of those values for that participant?
for each datapoint in the dataset, by participant_id
for each data point in data set, look over past hour and identify :
a) maximum value
b) area under curve
One way of getting the min and max. Still, please elaborate what you mean by "curve".
%let ref_dttm = 20NOV20:15:00:00;
%let Dttm_min_1hr = %sysfunc(intnx(DTHOUR,"&ref_dttm"dt,-1 ),datetime16.);
%let Dttm_plus_1hr = %sysfunc(intnx(DTHOUR,"&ref_dttm"dt,+1 ), datetime16.);
%put &Dttm_min_1hr
&Dttm_plus_1hr;
proc sql;
create table work.want as
select participant_id,
min(case when dttm > "&ref_dttm"dt then . else Measure end) as min_prev_hr,
max(case when dttm < "&ref_dttm"dt then . else Measure end) as max_next_hr
from work.have
where dttm between "&Dttm_min_1hr."dt and "&Dttm_plus_1hr"dt
group by participant_id
;
quit;
I would think the best approach would be to use PROC IML, then finding records in the one hour before or one hour after is relatively easy as IML allows you to "look ahead" or "look back" easily. Then area under the curve is also easy to compute in PROC IML, see https://blogs.sas.com/content/iml/2011/06/01/the-trapezoidal-rule-of-integration.html
you were asked to provide an example of the output, and you haven't provided that, you just re-stated the words of what you need. It would be really really really helpful if you provide an example of the output data set you want.
Thank you PaigeMiller and other forum member for help!
Here is what I need (area_under_curve I am just guess so fill with 1). I need the max in previous hour and area under curve for EVERY row in dataset.
participant_id datetime measure max_prev_hour area_under_curve A1 30OCT20:10:16:24 1.028 1.028 1 A1 30OCT20:13:14:14 1.183 1.183 1 A1 30OCT20:13:24:12 1.277 1.277 1 A1 30OCT20:14:29:14 1.464 1.464 1 A1 30OCT20:14:44:13 1.619 1.619 1 A1 30OCT20:14:49:02 1.744 1.744 1 A1 30OCT20:14:56:15 1.806 1.806 1 A2 20NOV20:14:22:18 2.492 2.492 1 A2 20NOV20:14:37:17 2.492 2.492 1 A2 20NOV20:14:52:19 2.242 2.492 1 A2 20NOV20:14:59:20 2.305 2.305 1 A2 20NOV20:15:22:22 2.274 2.492 1 A2 20NOV20:15:37:21 2.398 2.398 1 A2 20NOV20:15:52:23 2.367 2.398 1 A3 18NOV20:10:25:11 2.367 2.367 1 A3 18NOV20:10:30:13 2.461 2.461 1 A3 18NOV20:11:45:16 2.461 2.461 1 A3 18NOV20:11:55:12 2.429 2.461 1 A3 18NOV20:12:45:21 2.616 2.616 1 A3 18NOV20:12:50:23 2.398 2.616 1 A3 18NOV20:13:55:28 2.429 2.429 1
@linlin87
Below returns the hourly max value same as in your sample data (max_prev_hour_derived) except for one row where I believe you've got it wrong.
data have;
input participant_id $ datetime:datetime. measure max_prev_hour area_under_curve;
format datetime datetime20.;
informat datetime anydtdtm20.;
datalines;
A1 30OCT20:10:16:24 1.028 1.028 1
A1 30OCT20:13:14:14 1.183 1.183 1
A1 30OCT20:13:24:12 1.277 1.277 1
A1 30OCT20:14:29:14 1.464 1.464 1
A1 30OCT20:14:44:13 1.619 1.619 1
A1 30OCT20:14:49:02 1.744 1.744 1
A1 30OCT20:14:56:15 1.806 1.806 1
A2 20NOV20:14:22:18 2.492 2.492 1
A2 20NOV20:14:37:17 2.492 2.492 1
A2 20NOV20:14:52:19 2.242 2.492 1
A2 20NOV20:14:59:20 2.305 2.305 1
A2 20NOV20:15:22:22 2.274 2.492 1
A2 20NOV20:15:37:21 2.398 2.398 1
A2 20NOV20:15:52:23 2.367 2.398 1
A3 18NOV20:10:25:11 2.367 2.367 1
A3 18NOV20:10:30:13 2.461 2.461 1
A3 18NOV20:11:45:16 2.461 2.461 1
A3 18NOV20:11:55:12 2.429 2.461 1
A3 18NOV20:12:45:21 2.616 2.616 1
A3 18NOV20:12:50:23 2.398 2.616 1
A3 18NOV20:13:55:28 2.429 2.429 1
;
run;
data want(drop=_:);
array a_m {2,10} 8 _temporary_;
set have;
by participant_id datetime;
if first.participant_id then call missing(of a_m[*]);
_i_curr=mod(_n_-1,10)+1;
a_m[1,_i_curr]=datetime;
a_m[2,_i_curr]=measure;
max_prev_hour_derived=a_m[2,_i_curr];
do _ind=dim2(a_m)-1 to 1 by -1;
_i_start=mod(_n_-1+_ind,10)+1;
if 0<=datetime-a_m[1,_i_start]<3600 then
do;
max_prev_hour_derived=max(max_prev_hour_derived,a_m[2,_i_start]);
end;
else leave;
end;
comp_flg=(max_prev_hour_derived=max_prev_hour);
run;
proc print data=want;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.