Getting the average for multiple samples

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 13
Accepted Solution

Getting the average for multiple samples

Hello,

 

I have a big time-series dataset with 5 minute resolution for 4 years. In this dataset,a perticular variable (e.g., RAD) is measued for each 5 minutes at 9 different stations (e.g., ST1,..., ST9). It means for each 5 minute interval (for example, Feb. 1, 2015 1:00:00 PM) there are 9 samples for this variable. 

Now, I am wondoering how I can create a new dataset including the data of Feb 1 to Feb 10, 2015 from 13-19 (time) with a new variable= the average of variable "RAD" from all stations. In other words, for each 5 min interval I need the avergae of variable "RAD" instead of 9 RAD variables. Your help is really appreciated.

 

Thank you.


Accepted Solutions
Solution
‎11-21-2016 11:51 AM
Respected Advisor
Posts: 3,822

Re: Getting the average for multiple samples

[ Edited ]

It would help if in the future you could post your sample data as a SAS data step (so we don't have to do this work).

 

Below code doesn't include all these additional Date, Time, Month,... variables as this information is all redundant to Interval.

data have;
  infile datalines truncover dlm=',' dsd;
  attrib 
    stid informat=$8.
    Interval informat=datetime. format=datetime21.
    ;
  input STID Interval RELH TAIR WSPD SRAD;
datalines;
A,01JAN2015:00:00:00,58,-5.2,1.7,0,
B,01JAN2015:00:00:00,58,-5.2,1.6,0,
C,01JAN2015:00:00:00,58,-5.2,1.6,1
D,01JAN2015:00:00:00,58,-5.2,1.6,0
E,01JAN2015:00:00:00,58,-5.2,1.6,0
F,01JAN2015:00:00:00,58,-5.2,1.6
G,01JAN2015:00:00:00,58,-5.2,1.6
H,01JAN2015:00:00:00,58,-5.2,1.6
I,01JAN2015:00:00:00,58,-5.2,1.6
A,01JAN2015:00:05:00,60,-5.2
B,01JAN2015:00:05:00,58,-5.2
C,01JAN2015:00:05:00,58
D,01JAN2015:00:05:00,60
E,01JAN2015:00:05:00,40
F,01JAN2015:00:05:00,58
G,01JAN2015:00:05:00,66
H,01JAN2015:00:05:00,30
I,01JAN2015:00:05:00
;
run;


/* assuming data set is pre-sorted by interval */
/* missings excluded from calculating average  */
proc means data=have noprint /* missing */;
  by interval;
  where Interval between '01Feb2015 13:00:00'dt and '10Feb2015 19:59:59'dt
        and hour(interval) between 13 and 19;
  var relh tair wspd srad;
  output 
    out=want(drop=_:) 
    mean=relh_avg tair_avg wspd_avg srad_avg
  ;
run;
 

 

View solution in original post


All Replies
Grand Advisor
Posts: 17,297

Re: Getting the average for multiple samples

A SQL query is probably the easiest in terms of code, but this assumes that the average is the average of the selection, not of all observations in the original data set. If its' the average of the whole dataset, not just the subset, you need to do a subquery. 

 

Something like the following, untested and mocked up because you didn't post any sample data.

 

proc sql;
create table want as
select *, mean(rad) as average_rad
from have
where date between '01Feb2015'd and '10Feb2015'd
and time between 13 and 19;
quit;
Occasional Contributor
Posts: 13

Re: Getting the average for multiple samples

Thnak you for your response. Actualy, I have such a dataset:

 

ObsSTIDIntervalRELHTAIRWSPDSRADDateTimeMonthDayYearHourMinute
1A01JAN2015:00:00:0058-5.21.701/1/20150:0011201500
2B01JAN2015:00:00:0058-5.21.601/1/20150:0511201505
3C01JAN2015:00:00:0058-5.21.61       
4D01JAN2015:00:00:0058-5.21.60       
5E01JAN2015:00:00:0058-5.21.60       
6F01JAN2015:00:00:0058-5.21.6        
7G01JAN2015:00:00:0058-5.21.6        
8H01JAN2015:00:00:0058-5.21.6        
9I01JAN2015:00:00:0058-5.21.6        
10A01JAN2015:00:05:0060-5.2         
11B01JAN2015:00:05:0058-5.2         
12C01JAN2015:00:05:0058          
13D01JAN2015:00:05:0060          
14E01JAN2015:00:05:0040          
15F01JAN2015:00:05:0058          
16G01JAN2015:00:05:0066          
17H01JAN2015:00:05:0030          
18I01JAN2015:00:05:00           

 

For each date and time (i.e., interval), we have 9 series of samples at 9 different stations (STID).  I need to create a new dataset including single and unique Intervals (date and time) with the avergae of each variable for the 9 "STID"s (STID should be removed). Exactly like this:

 

ObsIntervalRELH_aveTAIR_aveWSPD_aveSRAD_aveDateTimeMonthDayYearHourMinute
101JAN2015:00:00:0058-5.21.701/1/20150:0011201500
201JAN2015:00:05:0060 6.7 2.0 0 1/1/2015 00:0511201505
301JAN2015:00:10:00 384.9 3.7 1.68 1/1/2015 00:10 112015010

 

For each date and time (i.e., interval), I need to find the average of each variable for each time and date interval (i.e., Interval) to removre the impact of stations (STID).

It'd be really appreciated if you can help.

 

Best Regards,

Paalang 

 

 

Solution
‎11-21-2016 11:51 AM
Respected Advisor
Posts: 3,822

Re: Getting the average for multiple samples

[ Edited ]

It would help if in the future you could post your sample data as a SAS data step (so we don't have to do this work).

 

Below code doesn't include all these additional Date, Time, Month,... variables as this information is all redundant to Interval.

data have;
  infile datalines truncover dlm=',' dsd;
  attrib 
    stid informat=$8.
    Interval informat=datetime. format=datetime21.
    ;
  input STID Interval RELH TAIR WSPD SRAD;
datalines;
A,01JAN2015:00:00:00,58,-5.2,1.7,0,
B,01JAN2015:00:00:00,58,-5.2,1.6,0,
C,01JAN2015:00:00:00,58,-5.2,1.6,1
D,01JAN2015:00:00:00,58,-5.2,1.6,0
E,01JAN2015:00:00:00,58,-5.2,1.6,0
F,01JAN2015:00:00:00,58,-5.2,1.6
G,01JAN2015:00:00:00,58,-5.2,1.6
H,01JAN2015:00:00:00,58,-5.2,1.6
I,01JAN2015:00:00:00,58,-5.2,1.6
A,01JAN2015:00:05:00,60,-5.2
B,01JAN2015:00:05:00,58,-5.2
C,01JAN2015:00:05:00,58
D,01JAN2015:00:05:00,60
E,01JAN2015:00:05:00,40
F,01JAN2015:00:05:00,58
G,01JAN2015:00:05:00,66
H,01JAN2015:00:05:00,30
I,01JAN2015:00:05:00
;
run;


/* assuming data set is pre-sorted by interval */
/* missings excluded from calculating average  */
proc means data=have noprint /* missing */;
  by interval;
  where Interval between '01Feb2015 13:00:00'dt and '10Feb2015 19:59:59'dt
        and hour(interval) between 13 and 19;
  var relh tair wspd srad;
  output 
    out=want(drop=_:) 
    mean=relh_avg tair_avg wspd_avg srad_avg
  ;
run;
 

 

Occasional Contributor
Posts: 13

Re: Getting the average for multiple samples

Thank you very much Patrick. The problem is solved.

Sure, I'll post a sample dataset for the next time.

 

Regards

Grand Advisor
Posts: 10,193

Re: Getting the average for multiple samples

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will help with converting a SAS dataset to data step code.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 272 views
  • 4 likes
  • 4 in conversation