I think this is pretty do-able with PROC SQL because it is very good for producing summary statistics at different strata in the data set. However, it seems to me that you need an extra identifier in the first data set (which I've called endpoints) that can differentiate different periods of time within ID and day. I've made that extra variable and called it "Period" and it should be unique for every unique value of ID, day, and Start. Then you need to join the tables with a WHERE statement that categorizes the different times from the concentration data set in the periods defined in the endpoints data set. I've added my own fake data to a section of the "Table 2" to illustrate how this might work: data endpoints;
input ID $ day $ start end period;
cards;
21005 3 4004 4012 1
21005 3 4031 4055 2
;
run;
data concentration;
input ID $ day $ time concentration;
cards;
21005 3 4004 1981
21005 3 4005 3077
21005 3 4006 1371
21005 3 4007 1990
21005 3 4008 1869
21005 3 4009 934
21005 3 4010 413
21005 3 4011 2506
21005 3 4012 2948
21005 3 4031 222
21005 3 4032 444
21005 3 4033 666
;
run;
proc sql;
create table want as
select distinct a.ID, a.day, a.start, a.end, min(b.concentration) as min, max(b.concentration) as max, mean(b.concentration) as mean, median(b.concentration) as median, sum(b.concentration) as sum
from endpoints a, concentration b
where a.id = b.id and a.day = b.day and (a.start <= b.time <=a.end)
group by a.id, a.day, a.period;
quit;
proc print;
run;
... View more