I have two tables like below
Table 1
ID day start end
21005 3 4004 4012
21005 3 4031 4055
21005 4 4671 4679
21005 4 4689 4792
21005 4 4803 4828
21005 4 4837 4860
21005 4 5301 5424
21005 6 8068 8084
21005 6 8114 8148
21005 7 9239 9256
21005 7 9311 9326
21005 7 9497 9515
21005 7 9535 9574
21005 7 9615 9627
21005 7 9679 9694
21006 3 3738 3752
21006 5 7065 7077
21006 6 8374 8402
21007 1 732 742
21007 1 773 785
21007 1 831 853
21007 1 865 876
21007 1 880 894
...................
Table 2
ID day time concentration
21005 1 1 562
...................
21005 2 1 90
...................
21005 3 1 2258
...................
21005 3 4004 1981
21005 3 4005 3077
21005 3 4006 1371
21005 3 4007 1990
21005 3 4008 1869
21005 3 4009 934
21005 3 4010 413
21005 3 4011 2506
21005 3 4012 2948
...................
21005 3 4031 2266
...................
and I would like to create a summary table using table 1 and table 2 data
summary table
ID day start end min max mean median sum
21005 3 4004 4012 413 3077 1898.78 1981 17089
................................................
basically summary table = calculate the min max mean median and sum of concentration using start and end time in table 1, by ID and day. I have no idea about how to do this ...
I tried to merge table 2 to table 1 by id and day, then calculate the descriptive stats, but there's multiple start and end for same day
I think this is pretty do-able with PROC SQL because it is very good for producing summary statistics at different strata in the data set. However, it seems to me that you need an extra identifier in the first data set (which I've called endpoints) that can differentiate different periods of time within ID and day. I've made that extra variable and called it "Period" and it should be unique for every unique value of ID, day, and Start. Then you need to join the tables with a WHERE statement that categorizes the different times from the concentration data set in the periods defined in the endpoints data set.
I've added my own fake data to a section of the "Table 2" to illustrate how this might work:
data endpoints;
input ID $ day $ start end period;
cards;
21005 3 4004 4012 1
21005 3 4031 4055 2
;
run;
data concentration;
input ID $ day $ time concentration;
cards;
21005 3 4004 1981
21005 3 4005 3077
21005 3 4006 1371
21005 3 4007 1990
21005 3 4008 1869
21005 3 4009 934
21005 3 4010 413
21005 3 4011 2506
21005 3 4012 2948
21005 3 4031 222
21005 3 4032 444
21005 3 4033 666
;
run;
proc sql;
create table want as
select distinct a.ID, a.day, a.start, a.end, min(b.concentration) as min, max(b.concentration) as max, mean(b.concentration) as mean, median(b.concentration) as median, sum(b.concentration) as sum
from endpoints a, concentration b
where a.id = b.id and a.day = b.day and (a.start <= b.time <=a.end)
group by a.id, a.day, a.period;
quit;
proc print;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.