Hello all,
I have pulled data from various sorces and merged it into the format below. I want to calculate total precipitation (var=precip) over different time intervals (e.g 2 weeks, 6 weeks, 9 months, etc...) before the reference point displayed by the variables 'X1' and 'X2' (I have more than just these two variables). These reference points are sites that were visited repeatedly for other measurements; the reference points occur several times for each variable several times a year over several years. My original thought was to create a new variable for each metric (2wkprecp, 6wkprecp) and each reference point and use proc means to take sums over those assinged intervals but I think I am heading down the wrong path.
Obs date X1 X2 precip date1
1 02AUG1994 8 19940802
2 03AUG1994 56 19940803
3 04AUG1994 18 19940804
4 05AUG1994 15 19940805
5 06AUG1994 36 19940806
6 07AUG1994 0 19940807
7 08AUG1994 0 19940808
8 09AUG1994 0 19940809
9 10AUG1994 0 19940810
10 11AUG1994 76 19940811
11 12AUG1994 0 19940812
12 13AUG1994 33 19940813
13 14AUG1994 79 19940814
14 15AUG1994 0 19940815
15 16AUG1994 13 19940816
16 17AUG1994 5 19940817
17 18AUG1994 0 19940818
18 19AUG1994 X2 61 19940819
19 20AUG1994 0 19940820
20 21AUG1994 33 19940821
21 22AUG1994 231 19940822
22 23AUG1994 X1 0 19940823
23 24AUG1994 0 19940824
24 25AUG1994 5 19940825
In the end I would like to have a dataset in the form (~ 100 records):
date | refpoint | 2wkprecp | 6wkprecp | 9moprecp |
19-Aug-94 | X2 | 336 | 789 | 2120 |
23-Aug-94 | X1 | 531 | 1053 | 2201 |
17-Sep-97 | X1 | 344 | 718 | 1938 |
18-Sep-97 | X2 | 344 | 718 | 1905 |
It should be noted that sometimes X1 and X2 (X3, X4, etc...) are on the same dates.
Any suggestions are welcome.
Thanks in advance.
Try this:
data have;
format d date11.;
input d date9. x1 $ 10-15 x2 $ 16-21 precip 22-30;
datalines;
02AUG1994 8
03AUG1994 56
04AUG1994 18
05AUG1994 15
06AUG1994 36
07AUG1994 0
08AUG1994 0
09AUG1994 0
10AUG1994 0
11AUG1994 76
12AUG1994 0
13AUG1994 33
14AUG1994 79
15AUG1994 0
16AUG1994 13
17AUG1994 5
18AUG1994 0
19AUG1994 X2 61
20AUG1994 0
21AUG1994 33
22AUG1994 231
23AUG1994 X1 0
24AUG1994 0
25AUG1994 5
;
proc transpose data=have
out=refs(rename=(d=refDate col1=station) where=(notspace(station)>0) drop=_:);
by d notsorted;
var x:;
run;
data intervals;
input interval $;
datalines;
WEEK2
WEEK6
MONTH9
;
proc sql;
create table want as
select station, refDate, interval, sum(precip) as totalPrecip
from have, refs, intervals
where d between intnx(interval, refDate-1, -1, "SAME") and refDate
group by station, refDate, interval;
peat,
There are people out there whose SQL is much better than mine, but here are questions that anyone who replies will need to know.
What does DATE actually contain? Is it a SAS date, but being printed in a DATE9 format, or is it a character string?
What does DATE1 actually contain? Is it a numeric being printed without a format, or is it a SAS date being printed in the YYMMDDn8 format?
Good luck.
Thanks for asking for the clarification.
DATE is a SAS date in DATE9 format. The DATE1 is just a number. Originally I was thinking that I could reference a range based on DATE1. I am willing to format these variables to any format necessary.
Thanks
OK, here's an attempt. Like I said, I expect someone who knows SQL much better will fix this and make it look more professional.
I'm going to assume you have already replaced DATE1 with the equivalent SAS date, something along the lines of:
date1 = input( put(date1, 8.), yymmdd8.);
This could then be a piece of the solution, creating _2wkprecip for the X1 records. Please adjust the date ranges using an equal sign (<= or >=), as appropriate. (You may also need to do a little debugging ... I tested pieces of the logic but not the whole thing.)
proc sql;
create table x1 as select distinct x1, date from original
where x1='X1';
create table to_be_summed as select a.x1, a.date, b.precip
from x1 as a, original as b
where (b.date1 < a.date) and (b.date1 + 14 > a.date);
create table x1_2weeks as select x1, date, sum(precip) as _2wkprecip from to_be_summed
group by date;
quit;
Until someone "fixes" the code, you would need a similar group of statements for each time period, and for X2, X3, etc.
There remains the final task of putting the pieces together. But this gives you a starting point.
Good luck.
Try this:
data have;
format d date11.;
input d date9. x1 $ 10-15 x2 $ 16-21 precip 22-30;
datalines;
02AUG1994 8
03AUG1994 56
04AUG1994 18
05AUG1994 15
06AUG1994 36
07AUG1994 0
08AUG1994 0
09AUG1994 0
10AUG1994 0
11AUG1994 76
12AUG1994 0
13AUG1994 33
14AUG1994 79
15AUG1994 0
16AUG1994 13
17AUG1994 5
18AUG1994 0
19AUG1994 X2 61
20AUG1994 0
21AUG1994 33
22AUG1994 231
23AUG1994 X1 0
24AUG1994 0
25AUG1994 5
;
proc transpose data=have
out=refs(rename=(d=refDate col1=station) where=(notspace(station)>0) drop=_:);
by d notsorted;
var x:;
run;
data intervals;
input interval $;
datalines;
WEEK2
WEEK6
MONTH9
;
proc sql;
create table want as
select station, refDate, interval, sum(precip) as totalPrecip
from have, refs, intervals
where d between intnx(interval, refDate-1, -1, "SAME") and refDate
group by station, refDate, interval;
After which you may want to add :
proc transpose data=want out=wantTable(drop=_:);
var totalPrecip;
by station refDate;
id interval;
run;
to format the table.
Good luck.
PGStats: I'm responding, primarily, to say "Nice Solution!"
peatjohnston The only thing I'll add to PGStats' suggested code is to insure that you add a "quit;" statement at the end of the proc sql code.
This was most helpful and it looks like I need to start learning the SQL Procedure. I can see that you put three 'tables' together and it appears that proc sql understands the temporal values (WEEK6, etc...). I will read up but there are a few things I would like to follow up on.
In the data step:
input d date9. x1 $ 10-15 x2 $ 16-21 precip 22-30;
- I by pased this portion of the code since my date was already formated as a SAS date. But my question is why did you specify the spacings for each variable, is that style or does proc sql want strict spacing at times?
In proc sql:
where d between intnx(interval, refDate-1, -1, "SAME") and refDate
- This ended up including an extra day in the summation and included the refDate itself in the summation (which was not something I discussed). I modified the code to:
where d between intnx(interval, refDate, -1, "SAME") and refDate-1
This gave me the exact number of days up to but not including the sample date(WEEK2 = 14 days before the reDate not including the reDate). With the sample date aside, was there a reason for using refDate-1 in the expression? as it appears that this increases the time interval by 1.
Also, adding the ':' at the end of a variable or unwanted character was not a trick I learned. Is there a reference somewhere where I can find more of the basic programming syntax? I am pretty much self taught and things like this make it a lot easier for me. Thanks again, I have learned a lot from your answer.
Learning SQL is costly but a good investment as it can be used with most DBMSs. SAS implementation of SQL is very powerful because it allows almost all of SAS functions and can interact with macro variables. I also find it very efficient. A book on relational databases is probably a better place to start learning SQL than SAS Doc.
The details of the first data step are not important and probably not very optimal as I almost never input text data anymore.
I had shifted the time window for the summation in order to match the 336 value from your example. I wasn't sure what you wanted exactly. In SAS, date values are in units of days so that refDate+1 is the day after refDate. But it is best to stay independent of the internal date representation. So, ideally, your where condition should read: where d between intnx(interval, refDate, -1, "SAME") and intnx("DAY", refDate, -1, "SAME").
Time intervals such as WEEK2 are understood by the intnx function. I have to go back to the documentation every time I use it.
I learned the colon trick the same way you did just a few days ago, on this forum. But it is documented, at least in version 9.3, along with many other variable list notations, it neans all the variables that start with.
PG
How about:
data have; format d date11.; input d date9. x1 $ 10-15 x2 $ 16-21 precip 22-30; datalines; 02AUG1994 8 03AUG1994 56 04AUG1994 18 05AUG1994 15 06AUG1994 36 07AUG1994 0 08AUG1994 0 09AUG1994 0 10AUG1994 0 11AUG1994 76 12AUG1994 0 13AUG1994 33 14AUG1994 79 15AUG1994 0 16AUG1994 13 17AUG1994 5 18AUG1994 0 19AUG1994 X2 61 20AUG1994 0 21AUG1994 33 22AUG1994 231 23AUG1994 X1 0 24AUG1994 0 25AUG1994 5 ; run; proc sql feedback; create table want as select d,x1,x2,(select sum(precip) from have where d between intnx('week2',a.d,-1,'s') and a.d ) as _2wkprecp, (select sum(precip) from have where d between intnx('week6',a.d,-1,'s') and a.d ) as _6wkprecp, (select sum(precip) from have where d between intnx('month9',a.d,-1,'s') and a.d ) as _9moprecp from have as a where x1 is not missing or x2 is not missing; quit;
Ksharp
Hello Ksharp,
I did not run this, but I will look at it later and study it as an alternative as I start learning more about proc sql. My question/comment is with the WHERE statement in the second to last line:
where x1 is not missing or x2 is not missing;
If there is an X1 and X2 ocurring on the same date, would it only select one variable, but not both?
Thanks for the insight.
Peat
Hi. They will select both x1 and x2, because I have wroten x1,x2 in select clause.
Ksharp
Comparison of methods:
The method by Ksharp appears to be more direct as the run time is much faster (0.38 sec compated to the method from PGStats = 4.25 sec) and also transposes easily into the form I am looking for. One thing that I did not intend to discover from this post is that Kshap's method also works if the X1, X2... VALUES are numbers instead of text place holders as I presented in this post. As it turns out I have a separate set of data that I want to perform a similar operation on and then make calculations based on observed numbers to these metrics calculated by proc sql and in this way the data is already without transposing. I just have to add another 'or X is not missing' to the statement for each of the variables (12 in all) which is just fine with me.
Thank you both, this has turned out to be highly educational for me.
Peat
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.