Solved: Getting the average for multiple samples

Palang · Posted 11-19-2016 01:28 PM

Hello,

I have a big time-series dataset with 5 minute resolution for 4 years. In this dataset,a perticular variable (e.g., RAD) is measued for each 5 minutes at 9 different stations (e.g., ST1,..., ST9). It means for each 5 minute interval (for example, Feb. 1, 2015 1:00:00 PM) there are 9 samples for this variable.

Now, I am wondoering how I can create a new dataset including the data of Feb 1 to Feb 10, 2015 from 13-19 (time) with a new variable= the average of variable "RAD" from all stations. In other words, for each 5 min interval I need the avergae of variable "RAD" instead of 9 RAD variables. Your help is really appreciated.

Thank you.

Patrick · Posted 11-20-2016 07:08 PM

It would help if in the future you could post your sample data as a SAS data step (so we don't have to do this work).

Below code doesn't include all these additional Date, Time, Month,... variables as this information is all redundant to Interval.

data have;
  infile datalines truncover dlm=',' dsd;
  attrib 
    stid informat=$8.
    Interval informat=datetime. format=datetime21.
    ;
  input STID Interval RELH TAIR WSPD SRAD;
datalines;
A,01JAN2015:00:00:00,58,-5.2,1.7,0,
B,01JAN2015:00:00:00,58,-5.2,1.6,0,
C,01JAN2015:00:00:00,58,-5.2,1.6,1
D,01JAN2015:00:00:00,58,-5.2,1.6,0
E,01JAN2015:00:00:00,58,-5.2,1.6,0
F,01JAN2015:00:00:00,58,-5.2,1.6
G,01JAN2015:00:00:00,58,-5.2,1.6
H,01JAN2015:00:00:00,58,-5.2,1.6
I,01JAN2015:00:00:00,58,-5.2,1.6
A,01JAN2015:00:05:00,60,-5.2
B,01JAN2015:00:05:00,58,-5.2
C,01JAN2015:00:05:00,58
D,01JAN2015:00:05:00,60
E,01JAN2015:00:05:00,40
F,01JAN2015:00:05:00,58
G,01JAN2015:00:05:00,66
H,01JAN2015:00:05:00,30
I,01JAN2015:00:05:00
;
run;


/* assuming data set is pre-sorted by interval */
/* missings excluded from calculating average  */
proc means data=have noprint /* missing */;
  by interval;
  where Interval between '01Feb2015 13:00:00'dt and '10Feb2015 19:59:59'dt
        and hour(interval) between 13 and 19;
  var relh tair wspd srad;
  output 
    out=want(drop=_:) 
    mean=relh_avg tair_avg wspd_avg srad_avg
  ;
run;

View solution in original post

Reeza · Posted 11-20-2016 12:05 AM

A SQL query is probably the easiest in terms of code, but this assumes that the average is the average of the selection, not of all observations in the original data set. If its' the average of the whole dataset, not just the subset, you need to do a subquery.

Something like the following, untested and mocked up because you didn't post any sample data.

proc sql;
create table want as
select *, mean(rad) as average_rad
from have
where date between '01Feb2015'd and '10Feb2015'd
and time between 13 and 19;
quit;

Palang · Posted 11-20-2016 11:35 AM

Thnak you for your response. Actualy, I have such a dataset:

Obs	STID	Interval	RELH	TAIR	WSPD	SRAD	Date	Time	Month	Day	Year	Hour	Minute
1	A	01JAN2015:00:00:00	58	-5.2	1.7	0	1/1/2015	0:00	1	1	2015	0	0
2	B	01JAN2015:00:00:00	58	-5.2	1.6	0	1/1/2015	0:05	1	1	2015	0	5
3	C	01JAN2015:00:00:00	58	-5.2	1.6	1
4	D	01JAN2015:00:00:00	58	-5.2	1.6	0
5	E	01JAN2015:00:00:00	58	-5.2	1.6	0
6	F	01JAN2015:00:00:00	58	-5.2	1.6
7	G	01JAN2015:00:00:00	58	-5.2	1.6
8	H	01JAN2015:00:00:00	58	-5.2	1.6
9	I	01JAN2015:00:00:00	58	-5.2	1.6
10	A	01JAN2015:00:05:00	60	-5.2
11	B	01JAN2015:00:05:00	58	-5.2
12	C	01JAN2015:00:05:00	58
13	D	01JAN2015:00:05:00	60
14	E	01JAN2015:00:05:00	40
15	F	01JAN2015:00:05:00	58
16	G	01JAN2015:00:05:00	66
17	H	01JAN2015:00:05:00	30
18	I	01JAN2015:00:05:00

For each date and time (i.e., interval), we have 9 series of samples at 9 different stations (STID). I need to create a new dataset including single and unique Intervals (date and time) with the avergae of each variable for the 9 "STID"s (STID should be removed). Exactly like this:

Obs	Interval	RELH_ave	TAIR_ave	WSPD_ave	SRAD_ave	Date	Time	Month	Day	Year	Hour	Minute
1	01JAN2015:00:00:00	58	-5.2	1.7	0	1/1/2015	0:00	1	1	2015	0	0
2	01JAN2015:00:05:00	60	6.7	2.0	0	1/1/2015	00:05	1	1	2015	0	5
3	01JAN2015:00:10:00	38	4.9	3.7	1.68	1/1/2015	00:10	1	1	2015	0	10

For each date and time (i.e., interval), I need to find the average of each variable for each time and date interval (i.e., Interval) to removre the impact of stations (STID).

It'd be really appreciated if you can help.

Best Regards,

Paalang

Patrick · Posted 11-20-2016 07:08 PM

It would help if in the future you could post your sample data as a SAS data step (so we don't have to do this work).

Below code doesn't include all these additional Date, Time, Month,... variables as this information is all redundant to Interval.

data have;
  infile datalines truncover dlm=',' dsd;
  attrib 
    stid informat=$8.
    Interval informat=datetime. format=datetime21.
    ;
  input STID Interval RELH TAIR WSPD SRAD;
datalines;
A,01JAN2015:00:00:00,58,-5.2,1.7,0,
B,01JAN2015:00:00:00,58,-5.2,1.6,0,
C,01JAN2015:00:00:00,58,-5.2,1.6,1
D,01JAN2015:00:00:00,58,-5.2,1.6,0
E,01JAN2015:00:00:00,58,-5.2,1.6,0
F,01JAN2015:00:00:00,58,-5.2,1.6
G,01JAN2015:00:00:00,58,-5.2,1.6
H,01JAN2015:00:00:00,58,-5.2,1.6
I,01JAN2015:00:00:00,58,-5.2,1.6
A,01JAN2015:00:05:00,60,-5.2
B,01JAN2015:00:05:00,58,-5.2
C,01JAN2015:00:05:00,58
D,01JAN2015:00:05:00,60
E,01JAN2015:00:05:00,40
F,01JAN2015:00:05:00,58
G,01JAN2015:00:05:00,66
H,01JAN2015:00:05:00,30
I,01JAN2015:00:05:00
;
run;


/* assuming data set is pre-sorted by interval */
/* missings excluded from calculating average  */
proc means data=have noprint /* missing */;
  by interval;
  where Interval between '01Feb2015 13:00:00'dt and '10Feb2015 19:59:59'dt
        and hour(interval) between 13 and 19;
  var relh tair wspd srad;
  output 
    out=want(drop=_:) 
    mean=relh_avg tair_avg wspd_avg srad_avg
  ;
run;

Palang · Posted 11-21-2016 11:53 AM

Thank you very much Patrick. The problem is solved.

Sure, I'll post a sample dataset for the next time.

Regards

ballardw · Posted 11-21-2016 05:28 PM

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will help with converting a SAS dataset to data step code.

Getting the average for multiple samples

Re: Getting the average for multiple samples

Re: Getting the average for multiple samples

Re: Getting the average for multiple samples

Re: Getting the average for multiple samples

Re: Getting the average for multiple samples

Re: Getting the average for multiple samples

Classroom Training Available!