Hello everyone, I would really appreciate your help.
The first three columns is the sample of data I have and the last column is what I want to get with the SAS code.
So, I want for each customer and each month to calculate sum of flags in the last 12 months (or less, if I don't have 12, like in the first observation ), including the month of calculation.
So, for each month in the group I calculate the sum of flags for that month and 11 months before.
CUSTUMER DATE flag Desired column
customer1 31AUG2015:00:00:00.000 1 1
customer1 30SEP2015:00:00:00.000 0 1
customer1 31OCT2015:00:00:00.000 0 1
customer1 30NOV2015:00:00:00.000 0 1
customer1 31DEC2015:00:00:00.000 0 1
customer1 31JAN2016:00:00:00.000 0 1
customer1 29FEB2016:00:00:00.000 0 1
customer1 31MAR2016:00:00:00.000 0 1
customer1 30APR2016:00:00:00.000 0 1
customer1 31MAY2016:00:00:00.000 0 1
customer1 30JUN2016:00:00:00.000 0 1
customer1 31JUL2016:00:00:00.000 0 1
customer1 31AUG2016:00:00:00.000 0 0
customer1 30SEP2016:00:00:00.000 0 0
customer1 31OCT2016:00:00:00.000 1 1
customer1 30NOV2016:00:00:00.000 0 1
customer1 31DEC2016:00:00:00.000 0 1
customer1 31JAN2017:00:00:00.000 0 1
customer2 31AUG2015:00:00:00.000 1 1
customer2 30SEP2015:00:00:00.000 0 1
customer2 31OCT2015:00:00:00.000 0 1
customer2 30NOV2015:00:00:00.000 0 1
Thanks in advance!
If possible, I try to stay away from SQL solutions that are, in essence, try to do time series-related tasks. There is a lot of overhead in effectively doing a Cartesian comparison of all rows to satisfy a range of dates about each custumer/date pair.
In this case (1) the data are ordered, and (2) there are no holes in the monthly sequence of record, so this simple data step is probably a lot faster:
data want;
set have;
by custumer;
if first.custumer then sum12=0;
sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
run;
Editted note: If there are holes in the time series, I'd make a temporary data set view NEED, in which the holes are filled with dummy records have flag=. The apply the same logic as above:
data need (drop=nxt_: i) /view=need1;
merge have
have (firstobs=2 keep=date custumer rename=(date=nxt_date custumer=nxt_custumer));
if custumer^=nxt_custumer then output;
else do i=1 to intck('dtmonth',date,nxt_date);
output;
flag=.;
end;
run;
data want1;
set need1;
by custumer;
if first.custumer then sum12=0;
sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
if flag^=.;
run;
data have;
input CUSTUMER :$10. DATE :datetime20. flag ;
format DATE datetime20. ;
cards;
customer1 31AUG2015:00:00:00.000 1 1
customer1 30SEP2015:00:00:00.000 0 1
customer1 31OCT2015:00:00:00.000 0 1
customer1 30NOV2015:00:00:00.000 0 1
customer1 31DEC2015:00:00:00.000 0 1
customer1 31JAN2016:00:00:00.000 0 1
customer1 29FEB2016:00:00:00.000 0 1
customer1 31MAR2016:00:00:00.000 0 1
customer1 30APR2016:00:00:00.000 0 1
customer1 31MAY2016:00:00:00.000 0 1
customer1 30JUN2016:00:00:00.000 0 1
customer1 31JUL2016:00:00:00.000 0 1
customer1 31AUG2016:00:00:00.000 0 0
customer1 30SEP2016:00:00:00.000 0 0
customer1 31OCT2016:00:00:00.000 1 1
customer1 30NOV2016:00:00:00.000 0 1
customer1 31DEC2016:00:00:00.000 0 1
customer1 31JAN2017:00:00:00.000 0 1
customer2 31AUG2015:00:00:00.000 1 1
customer2 30SEP2015:00:00:00.000 0 1
customer2 31OCT2015:00:00:00.000 0 1
customer2 30NOV2015:00:00:00.000 0 1
;
proc sql;
create table want as
select a.CUSTUMER, a.date,a.flag,sum((intnx('dtmonth',a.DATE,-12,'s')<b.date<=a.date)*b.flag) as desired
from have a, have b
where a.CUSTUMER=b.CUSTUMER
group by a.CUSTUMER,a.date
having min(b.date)=b.date
order by a.CUSTUMER, a.date;
quit;
If possible, I try to stay away from SQL solutions that are, in essence, try to do time series-related tasks. There is a lot of overhead in effectively doing a Cartesian comparison of all rows to satisfy a range of dates about each custumer/date pair.
In this case (1) the data are ordered, and (2) there are no holes in the monthly sequence of record, so this simple data step is probably a lot faster:
data want;
set have;
by custumer;
if first.custumer then sum12=0;
sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
run;
Editted note: If there are holes in the time series, I'd make a temporary data set view NEED, in which the holes are filled with dummy records have flag=. The apply the same logic as above:
data need (drop=nxt_: i) /view=need1;
merge have
have (firstobs=2 keep=date custumer rename=(date=nxt_date custumer=nxt_custumer));
if custumer^=nxt_custumer then output;
else do i=1 to intck('dtmonth',date,nxt_date);
output;
flag=.;
end;
run;
data want1;
set need1;
by custumer;
if first.custumer then sum12=0;
sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
if flag^=.;
run;
This one works, i used the first part of the code, but before that, I used COALESCE (flag,0) to fill in the missing values that i have, so it works for me now, without the second part.
Thank you!
I tend to wonder why any process that is only concerned with months is carrying around seconds down to thousandths of a second...
@ballardw Probably some Oracle data or similar kept as is.
Yes, it is.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.