BookmarkSubscribeRSS Feed
j262byuu
Calcite | Level 5

I have two tables like below

Table 1

ID    day start end
21005 3   4004  4012
21005 3   4031  4055
21005 4   4671  4679
21005 4   4689  4792
21005 4   4803  4828
21005 4   4837  4860
21005 4   5301  5424
21005 6   8068  8084
21005 6   8114  8148
21005 7   9239  9256
21005 7   9311  9326
21005 7   9497  9515
21005 7   9535  9574
21005 7   9615  9627
21005 7   9679  9694
21006 3   3738  3752
21006 5   7065  7077
21006 6   8374  8402
21007 1   732   742
21007 1   773   785
21007 1   831   853
21007 1   865   876
21007 1   880   894
...................

Table 2

ID    day time concentration
21005 1 1 562 ................... 21005 2 1 90 ................... 21005 3 1 2258 ................... 21005 3 4004 1981 21005 3 4005 3077 21005 3 4006 1371 21005 3 4007 1990 21005 3 4008 1869 21005 3 4009 934 21005 3 4010 413 21005 3 4011 2506 21005 3 4012 2948 ................... 21005 3 4031 2266 ...................

and I would like to create a summary table using table 1 and table 2 data

summary table

ID    day start end  min max  mean    median sum
21005 3 4004 4012 413 3077 1898.78 1981 17089 ................................................

basically summary table = calculate the min max mean median and sum of concentration using start and end time in table 1, by ID and day. I have no idea about how to do this ...

 

I tried to merge table 2 to table 1 by id and day, then calculate the descriptive stats, but there's multiple start and end for same day 

1 REPLY 1
svh
Lapis Lazuli | Level 10 svh
Lapis Lazuli | Level 10

I think this is pretty do-able with PROC SQL because it is very good for producing summary statistics at different strata in the data set. However, it seems to me that you need an extra identifier in the first data set (which I've called endpoints) that can differentiate different periods of time within ID and day. I've made that extra variable and called it "Period" and it should be unique for every unique value of ID, day, and Start. Then you need to join the tables with a WHERE statement that categorizes the different times from the concentration data set in the periods defined in the endpoints data set. 

 

I've added my own fake data to a section of the "Table 2" to illustrate how this might work:

 

data endpoints;
input ID $   day $ start end period;
cards;
21005 3   4004  4012 1
21005 3   4031  4055 2
;
run;
data concentration;
input ID $ day $ time concentration;
cards;
21005 3   4004 1981
21005 3   4005 3077
21005 3   4006 1371
21005 3   4007 1990
21005 3   4008 1869
21005 3   4009 934
21005 3   4010 413
21005 3   4011 2506
21005 3   4012 2948
21005 3   4031 222
21005 3   4032 444
21005 3   4033 666
;
run;

proc sql;
   create table want as
   select distinct a.ID, a.day, a.start, a.end, min(b.concentration) as min, max(b.concentration) as max, mean(b.concentration) as mean, median(b.concentration) as median, sum(b.concentration) as sum
   from endpoints a, concentration b
   where a.id = b.id and a.day = b.day and (a.start <= b.time <=a.end)
   group by a.id, a.day, a.period;
   quit; 
proc print;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 450 views
  • 0 likes
  • 2 in conversation