BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Palang
Obsidian | Level 7

Hello,

 

I have a big time-series dataset with 5 minute resolution for 4 years. In this dataset,a perticular variable (e.g., RAD) is measued for each 5 minutes at 9 different stations (e.g., ST1,..., ST9). It means for each 5 minute interval (for example, Feb. 1, 2015 1:00:00 PM) there are 9 samples for this variable. 

Now, I am wondoering how I can create a new dataset including the data of Feb 1 to Feb 10, 2015 from 13-19 (time) with a new variable= the average of variable "RAD" from all stations. In other words, for each 5 min interval I need the avergae of variable "RAD" instead of 9 RAD variables. Your help is really appreciated.

 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

It would help if in the future you could post your sample data as a SAS data step (so we don't have to do this work).

 

Below code doesn't include all these additional Date, Time, Month,... variables as this information is all redundant to Interval.

data have;
  infile datalines truncover dlm=',' dsd;
  attrib 
    stid informat=$8.
    Interval informat=datetime. format=datetime21.
    ;
  input STID Interval RELH TAIR WSPD SRAD;
datalines;
A,01JAN2015:00:00:00,58,-5.2,1.7,0,
B,01JAN2015:00:00:00,58,-5.2,1.6,0,
C,01JAN2015:00:00:00,58,-5.2,1.6,1
D,01JAN2015:00:00:00,58,-5.2,1.6,0
E,01JAN2015:00:00:00,58,-5.2,1.6,0
F,01JAN2015:00:00:00,58,-5.2,1.6
G,01JAN2015:00:00:00,58,-5.2,1.6
H,01JAN2015:00:00:00,58,-5.2,1.6
I,01JAN2015:00:00:00,58,-5.2,1.6
A,01JAN2015:00:05:00,60,-5.2
B,01JAN2015:00:05:00,58,-5.2
C,01JAN2015:00:05:00,58
D,01JAN2015:00:05:00,60
E,01JAN2015:00:05:00,40
F,01JAN2015:00:05:00,58
G,01JAN2015:00:05:00,66
H,01JAN2015:00:05:00,30
I,01JAN2015:00:05:00
;
run;


/* assuming data set is pre-sorted by interval */
/* missings excluded from calculating average  */
proc means data=have noprint /* missing */;
  by interval;
  where Interval between '01Feb2015 13:00:00'dt and '10Feb2015 19:59:59'dt
        and hour(interval) between 13 and 19;
  var relh tair wspd srad;
  output 
    out=want(drop=_:) 
    mean=relh_avg tair_avg wspd_avg srad_avg
  ;
run;
 

 

View solution in original post

5 REPLIES 5
Reeza
Super User

A SQL query is probably the easiest in terms of code, but this assumes that the average is the average of the selection, not of all observations in the original data set. If its' the average of the whole dataset, not just the subset, you need to do a subquery. 

 

Something like the following, untested and mocked up because you didn't post any sample data.

 

proc sql;
create table want as
select *, mean(rad) as average_rad
from have
where date between '01Feb2015'd and '10Feb2015'd
and time between 13 and 19;
quit;
Palang
Obsidian | Level 7

Thnak you for your response. Actualy, I have such a dataset:

 

ObsSTIDIntervalRELHTAIRWSPDSRADDateTimeMonthDayYearHourMinute
1A01JAN2015:00:00:0058-5.21.701/1/20150:0011201500
2B01JAN2015:00:00:0058-5.21.601/1/20150:0511201505
3C01JAN2015:00:00:0058-5.21.61       
4D01JAN2015:00:00:0058-5.21.60       
5E01JAN2015:00:00:0058-5.21.60       
6F01JAN2015:00:00:0058-5.21.6        
7G01JAN2015:00:00:0058-5.21.6        
8H01JAN2015:00:00:0058-5.21.6        
9I01JAN2015:00:00:0058-5.21.6        
10A01JAN2015:00:05:0060-5.2         
11B01JAN2015:00:05:0058-5.2         
12C01JAN2015:00:05:0058          
13D01JAN2015:00:05:0060          
14E01JAN2015:00:05:0040          
15F01JAN2015:00:05:0058          
16G01JAN2015:00:05:0066          
17H01JAN2015:00:05:0030          
18I01JAN2015:00:05:00           

 

For each date and time (i.e., interval), we have 9 series of samples at 9 different stations (STID).  I need to create a new dataset including single and unique Intervals (date and time) with the avergae of each variable for the 9 "STID"s (STID should be removed). Exactly like this:

 

ObsIntervalRELH_aveTAIR_aveWSPD_aveSRAD_aveDateTimeMonthDayYearHourMinute
101JAN2015:00:00:0058-5.21.701/1/20150:0011201500
201JAN2015:00:05:0060 6.7 2.0 0 1/1/2015 00:0511201505
301JAN2015:00:10:00 384.9 3.7 1.68 1/1/2015 00:10 112015010

 

For each date and time (i.e., interval), I need to find the average of each variable for each time and date interval (i.e., Interval) to removre the impact of stations (STID).

It'd be really appreciated if you can help.

 

Best Regards,

Paalang 

 

 

Patrick
Opal | Level 21

It would help if in the future you could post your sample data as a SAS data step (so we don't have to do this work).

 

Below code doesn't include all these additional Date, Time, Month,... variables as this information is all redundant to Interval.

data have;
  infile datalines truncover dlm=',' dsd;
  attrib 
    stid informat=$8.
    Interval informat=datetime. format=datetime21.
    ;
  input STID Interval RELH TAIR WSPD SRAD;
datalines;
A,01JAN2015:00:00:00,58,-5.2,1.7,0,
B,01JAN2015:00:00:00,58,-5.2,1.6,0,
C,01JAN2015:00:00:00,58,-5.2,1.6,1
D,01JAN2015:00:00:00,58,-5.2,1.6,0
E,01JAN2015:00:00:00,58,-5.2,1.6,0
F,01JAN2015:00:00:00,58,-5.2,1.6
G,01JAN2015:00:00:00,58,-5.2,1.6
H,01JAN2015:00:00:00,58,-5.2,1.6
I,01JAN2015:00:00:00,58,-5.2,1.6
A,01JAN2015:00:05:00,60,-5.2
B,01JAN2015:00:05:00,58,-5.2
C,01JAN2015:00:05:00,58
D,01JAN2015:00:05:00,60
E,01JAN2015:00:05:00,40
F,01JAN2015:00:05:00,58
G,01JAN2015:00:05:00,66
H,01JAN2015:00:05:00,30
I,01JAN2015:00:05:00
;
run;


/* assuming data set is pre-sorted by interval */
/* missings excluded from calculating average  */
proc means data=have noprint /* missing */;
  by interval;
  where Interval between '01Feb2015 13:00:00'dt and '10Feb2015 19:59:59'dt
        and hour(interval) between 13 and 19;
  var relh tair wspd srad;
  output 
    out=want(drop=_:) 
    mean=relh_avg tair_avg wspd_avg srad_avg
  ;
run;
 

 

Palang
Obsidian | Level 7

Thank you very much Patrick. The problem is solved.

Sure, I'll post a sample dataset for the next time.

 

Regards

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1330 views
  • 4 likes
  • 4 in conversation