- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
I have a programming question regarding the calculation of a rolling sum for a range of variables. To simplify the question I have only included two variables to be calculated. The sum must be calculated for each customer represented in the data.
DATA = work.ydelser
PERSON ID DATE TOTAL PROD_A ...
A 08-01-2014 10 ...
A 13-01-2014 12 ...
A 23-01-2014 11
A 12-02-2014 17
B 07-01-2014 26
B 17-01-2014 50
B 25-02-2014 10
B 03-03-2014 10
The result I would like to have is:
DATA = work.ydel_sum
PERSON ID DATE TOTALsum PROD_A
A 08-01-2014 10 ...
A 13-01-2014 22
A 23-01-2014 33
A 12-02-2014 22
B 07-01-2014 26
B 17-01-2014 76
B 25-02-2014 10
B 03-03-2014 20
The computed dataset does not have to include the original variable. The date is in "date format".
I will use the code in a code-program in Enterprise Guide, but if anyone knows how to solve the question by Enterprise functions, please let me know.
I have seen a privious topic on this matter, but I cannot figure out how to use the "cards" function.
Thanks in advance!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is code which will do a rolling 30 day sum for you. The datalines or cards is just used to create the test data (I have left in to show you). I do not understand why the fourth A row in your example out data is 22 though, that doesn't make sense with the logic.
data have; informat date ddmmyy10.; format date date9.; input person_id $ date total; datalines; A 08-01-2014 10 A 13-01-2014 12 A 23-01-2014 11 A 12-02-2014 17 ; run; proc sql; create table WANT as select PERSON_ID, DATE, TOTAL, sum(TMP) as ROLLING_SUM from ( select A.*, B.TOTAL as TMP from HAVE A left join HAVE B on A.PERSON_ID=B.PERSON_ID and A.DATE-30 <= B.DATE <= A.DATE ) group by PERSON_ID, DATE, TOTAL; quit;
What this does is join all records where date is within 30 days to the original data, this is the subquery. That data is then grouped by the original variables and a sum calculated .
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hope this helps :) data ydelser; input PERSONID$ TOTAL; datalines; A 10 A 12 A 14 A 17 B 30 B 30 B 40 B 50 ; Data newdata; set ydelser; by PERSONID notsorted; if first.PERSONID then TOTALsum = 0; TOTALsum+TOTAL; run; Proc Print data = newdata; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks!
I've forgot to mention that I have hundreds of rows. Do I have so specify all in the datalines?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You use Proc Import to import the file ..so you dont have to write anything datelines .
Here is the link ..for Proc Import
https://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000332605.htm
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is code which will do a rolling 30 day sum for you. The datalines or cards is just used to create the test data (I have left in to show you). I do not understand why the fourth A row in your example out data is 22 though, that doesn't make sense with the logic.
data have; informat date ddmmyy10.; format date date9.; input person_id $ date total; datalines; A 08-01-2014 10 A 13-01-2014 12 A 23-01-2014 11 A 12-02-2014 17 ; run; proc sql; create table WANT as select PERSON_ID, DATE, TOTAL, sum(TMP) as ROLLING_SUM from ( select A.*, B.TOTAL as TMP from HAVE A left join HAVE B on A.PERSON_ID=B.PERSON_ID and A.DATE-30 <= B.DATE <= A.DATE ) group by PERSON_ID, DATE, TOTAL; quit;
What this does is join all records where date is within 30 days to the original data, this is the subquery. That data is then grouped by the original variables and a sum calculated .
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You are right, the fourth row should have had the value 40. Sorry for the mistake.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This solution is pretty much the same as @pearsoninst provided; however, it includes the DATE variable and will keep only the last 30 days from the current date. If you want the last 30 days of a different date than the current, just substitute whatever date you want in place of the today().
data ydelser;
input PERSON_ID$ DATE ddmmyy10. TOTAL;
datalines;
A 08-01-2014 10
A 13-01-2014 12
A 23-01-2014 14
A 12-02-2014 17
B 07-01-2014 30
B 17-01-2014 30
B 25-02-2014 40
B 03-03-2014 50
;
run;
proc sort data=ydelser;
by PERSON_ID;
run;
data ydel_sum;
set ydelser;
by PERSON_ID;
Where DATE>=intnx("DAY",today(),-30);
If First.PERSON_ID then TOTALsum=0;
TOTALsum+TOTAL;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for all your replies! Both solutions worked, and I have accepted the one that I used as the solution.