Hello,
I have a big time-series dataset with 5 minute resolution for 4 years. In this dataset,a perticular variable (e.g., RAD) is measued for each 5 minutes at 9 different stations (e.g., ST1,..., ST9). It means for each 5 minute interval (for example, Feb. 1, 2015 1:00:00 PM) there are 9 samples for this variable.
Now, I am wondoering how I can create a new dataset including the data of Feb 1 to Feb 10, 2015 from 13-19 (time) with a new variable= the average of variable "RAD" from all stations. In other words, for each 5 min interval I need the avergae of variable "RAD" instead of 9 RAD variables. Your help is really appreciated.
Thank you.
It would help if in the future you could post your sample data as a SAS data step (so we don't have to do this work).
Below code doesn't include all these additional Date, Time, Month,... variables as this information is all redundant to Interval.
data have;
infile datalines truncover dlm=',' dsd;
attrib
stid informat=$8.
Interval informat=datetime. format=datetime21.
;
input STID Interval RELH TAIR WSPD SRAD;
datalines;
A,01JAN2015:00:00:00,58,-5.2,1.7,0,
B,01JAN2015:00:00:00,58,-5.2,1.6,0,
C,01JAN2015:00:00:00,58,-5.2,1.6,1
D,01JAN2015:00:00:00,58,-5.2,1.6,0
E,01JAN2015:00:00:00,58,-5.2,1.6,0
F,01JAN2015:00:00:00,58,-5.2,1.6
G,01JAN2015:00:00:00,58,-5.2,1.6
H,01JAN2015:00:00:00,58,-5.2,1.6
I,01JAN2015:00:00:00,58,-5.2,1.6
A,01JAN2015:00:05:00,60,-5.2
B,01JAN2015:00:05:00,58,-5.2
C,01JAN2015:00:05:00,58
D,01JAN2015:00:05:00,60
E,01JAN2015:00:05:00,40
F,01JAN2015:00:05:00,58
G,01JAN2015:00:05:00,66
H,01JAN2015:00:05:00,30
I,01JAN2015:00:05:00
;
run;
/* assuming data set is pre-sorted by interval */
/* missings excluded from calculating average */
proc means data=have noprint /* missing */;
by interval;
where Interval between '01Feb2015 13:00:00'dt and '10Feb2015 19:59:59'dt
and hour(interval) between 13 and 19;
var relh tair wspd srad;
output
out=want(drop=_:)
mean=relh_avg tair_avg wspd_avg srad_avg
;
run;
A SQL query is probably the easiest in terms of code, but this assumes that the average is the average of the selection, not of all observations in the original data set. If its' the average of the whole dataset, not just the subset, you need to do a subquery.
Something like the following, untested and mocked up because you didn't post any sample data.
proc sql;
create table want as
select *, mean(rad) as average_rad
from have
where date between '01Feb2015'd and '10Feb2015'd
and time between 13 and 19;
quit;
Thnak you for your response. Actualy, I have such a dataset:
Obs | STID | Interval | RELH | TAIR | WSPD | SRAD | Date | Time | Month | Day | Year | Hour | Minute |
1 | A | 01JAN2015:00:00:00 | 58 | -5.2 | 1.7 | 0 | 1/1/2015 | 0:00 | 1 | 1 | 2015 | 0 | 0 |
2 | B | 01JAN2015:00:00:00 | 58 | -5.2 | 1.6 | 0 | 1/1/2015 | 0:05 | 1 | 1 | 2015 | 0 | 5 |
3 | C | 01JAN2015:00:00:00 | 58 | -5.2 | 1.6 | 1 | |||||||
4 | D | 01JAN2015:00:00:00 | 58 | -5.2 | 1.6 | 0 | |||||||
5 | E | 01JAN2015:00:00:00 | 58 | -5.2 | 1.6 | 0 | |||||||
6 | F | 01JAN2015:00:00:00 | 58 | -5.2 | 1.6 | ||||||||
7 | G | 01JAN2015:00:00:00 | 58 | -5.2 | 1.6 | ||||||||
8 | H | 01JAN2015:00:00:00 | 58 | -5.2 | 1.6 | ||||||||
9 | I | 01JAN2015:00:00:00 | 58 | -5.2 | 1.6 | ||||||||
10 | A | 01JAN2015:00:05:00 | 60 | -5.2 | |||||||||
11 | B | 01JAN2015:00:05:00 | 58 | -5.2 | |||||||||
12 | C | 01JAN2015:00:05:00 | 58 | ||||||||||
13 | D | 01JAN2015:00:05:00 | 60 | ||||||||||
14 | E | 01JAN2015:00:05:00 | 40 | ||||||||||
15 | F | 01JAN2015:00:05:00 | 58 | ||||||||||
16 | G | 01JAN2015:00:05:00 | 66 | ||||||||||
17 | H | 01JAN2015:00:05:00 | 30 | ||||||||||
18 | I | 01JAN2015:00:05:00 |
For each date and time (i.e., interval), we have 9 series of samples at 9 different stations (STID). I need to create a new dataset including single and unique Intervals (date and time) with the avergae of each variable for the 9 "STID"s (STID should be removed). Exactly like this:
Obs | Interval | RELH_ave | TAIR_ave | WSPD_ave | SRAD_ave | Date | Time | Month | Day | Year | Hour | Minute |
1 | 01JAN2015:00:00:00 | 58 | -5.2 | 1.7 | 0 | 1/1/2015 | 0:00 | 1 | 1 | 2015 | 0 | 0 |
2 | 01JAN2015:00:05:00 | 60 | 6.7 | 2.0 | 0 | 1/1/2015 | 00:05 | 1 | 1 | 2015 | 0 | 5 |
3 | 01JAN2015:00:10:00 | 38 | 4.9 | 3.7 | 1.68 | 1/1/2015 | 00:10 | 1 | 1 | 2015 | 0 | 10 |
For each date and time (i.e., interval), I need to find the average of each variable for each time and date interval (i.e., Interval) to removre the impact of stations (STID).
It'd be really appreciated if you can help.
Best Regards,
Paalang
It would help if in the future you could post your sample data as a SAS data step (so we don't have to do this work).
Below code doesn't include all these additional Date, Time, Month,... variables as this information is all redundant to Interval.
data have;
infile datalines truncover dlm=',' dsd;
attrib
stid informat=$8.
Interval informat=datetime. format=datetime21.
;
input STID Interval RELH TAIR WSPD SRAD;
datalines;
A,01JAN2015:00:00:00,58,-5.2,1.7,0,
B,01JAN2015:00:00:00,58,-5.2,1.6,0,
C,01JAN2015:00:00:00,58,-5.2,1.6,1
D,01JAN2015:00:00:00,58,-5.2,1.6,0
E,01JAN2015:00:00:00,58,-5.2,1.6,0
F,01JAN2015:00:00:00,58,-5.2,1.6
G,01JAN2015:00:00:00,58,-5.2,1.6
H,01JAN2015:00:00:00,58,-5.2,1.6
I,01JAN2015:00:00:00,58,-5.2,1.6
A,01JAN2015:00:05:00,60,-5.2
B,01JAN2015:00:05:00,58,-5.2
C,01JAN2015:00:05:00,58
D,01JAN2015:00:05:00,60
E,01JAN2015:00:05:00,40
F,01JAN2015:00:05:00,58
G,01JAN2015:00:05:00,66
H,01JAN2015:00:05:00,30
I,01JAN2015:00:05:00
;
run;
/* assuming data set is pre-sorted by interval */
/* missings excluded from calculating average */
proc means data=have noprint /* missing */;
by interval;
where Interval between '01Feb2015 13:00:00'dt and '10Feb2015 19:59:59'dt
and hour(interval) between 13 and 19;
var relh tair wspd srad;
output
out=want(drop=_:)
mean=relh_avg tair_avg wspd_avg srad_avg
;
run;
Thank you very much Patrick. The problem is solved.
Sure, I'll post a sample dataset for the next time.
Regards
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will help with converting a SAS dataset to data step code.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.