Hello everyone, I would really appreciate your help.
The first three columns is the sample of data I have and the last column is what I want to get with the SAS code.
So, I want for each customer and each month to calculate sum of flags in the last 12 months (or less, if I don't have 12, like in the first observation ), including the month of calculation.
So, for each month in the group I calculate the sum of flags for that month and 11 months before.
CUSTUMER DATE flag Desired column
customer1 31AUG2015:00:00:00.000 1 1
customer1 30SEP2015:00:00:00.000 0 1
customer1 31OCT2015:00:00:00.000 0 1
customer1 30NOV2015:00:00:00.000 0 1
customer1 31DEC2015:00:00:00.000 0 1
customer1 31JAN2016:00:00:00.000 0 1
customer1 29FEB2016:00:00:00.000 0 1
customer1 31MAR2016:00:00:00.000 0 1
customer1 30APR2016:00:00:00.000 0 1
customer1 31MAY2016:00:00:00.000 0 1
customer1 30JUN2016:00:00:00.000 0 1
customer1 31JUL2016:00:00:00.000 0 1
customer1 31AUG2016:00:00:00.000 0 0
customer1 30SEP2016:00:00:00.000 0 0
customer1 31OCT2016:00:00:00.000 1 1
customer1 30NOV2016:00:00:00.000 0 1
customer1 31DEC2016:00:00:00.000 0 1
customer1 31JAN2017:00:00:00.000 0 1
customer2 31AUG2015:00:00:00.000 1 1
customer2 30SEP2015:00:00:00.000 0 1
customer2 31OCT2015:00:00:00.000 0 1
customer2 30NOV2015:00:00:00.000 0 1
Thanks in advance!
If possible, I try to stay away from SQL solutions that are, in essence, try to do time series-related tasks. There is a lot of overhead in effectively doing a Cartesian comparison of all rows to satisfy a range of dates about each custumer/date pair.
In this case (1) the data are ordered, and (2) there are no holes in the monthly sequence of record, so this simple data step is probably a lot faster:
data want;
set have;
by custumer;
if first.custumer then sum12=0;
sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
run;
Editted note: If there are holes in the time series, I'd make a temporary data set view NEED, in which the holes are filled with dummy records have flag=. The apply the same logic as above:
data need (drop=nxt_: i) /view=need1;
merge have
have (firstobs=2 keep=date custumer rename=(date=nxt_date custumer=nxt_custumer));
if custumer^=nxt_custumer then output;
else do i=1 to intck('dtmonth',date,nxt_date);
output;
flag=.;
end;
run;
data want1;
set need1;
by custumer;
if first.custumer then sum12=0;
sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
if flag^=.;
run;
data have;
input CUSTUMER :$10. DATE :datetime20. flag ;
format DATE datetime20. ;
cards;
customer1 31AUG2015:00:00:00.000 1 1
customer1 30SEP2015:00:00:00.000 0 1
customer1 31OCT2015:00:00:00.000 0 1
customer1 30NOV2015:00:00:00.000 0 1
customer1 31DEC2015:00:00:00.000 0 1
customer1 31JAN2016:00:00:00.000 0 1
customer1 29FEB2016:00:00:00.000 0 1
customer1 31MAR2016:00:00:00.000 0 1
customer1 30APR2016:00:00:00.000 0 1
customer1 31MAY2016:00:00:00.000 0 1
customer1 30JUN2016:00:00:00.000 0 1
customer1 31JUL2016:00:00:00.000 0 1
customer1 31AUG2016:00:00:00.000 0 0
customer1 30SEP2016:00:00:00.000 0 0
customer1 31OCT2016:00:00:00.000 1 1
customer1 30NOV2016:00:00:00.000 0 1
customer1 31DEC2016:00:00:00.000 0 1
customer1 31JAN2017:00:00:00.000 0 1
customer2 31AUG2015:00:00:00.000 1 1
customer2 30SEP2015:00:00:00.000 0 1
customer2 31OCT2015:00:00:00.000 0 1
customer2 30NOV2015:00:00:00.000 0 1
;
proc sql;
create table want as
select a.CUSTUMER, a.date,a.flag,sum((intnx('dtmonth',a.DATE,-12,'s')<b.date<=a.date)*b.flag) as desired
from have a, have b
where a.CUSTUMER=b.CUSTUMER
group by a.CUSTUMER,a.date
having min(b.date)=b.date
order by a.CUSTUMER, a.date;
quit;
If possible, I try to stay away from SQL solutions that are, in essence, try to do time series-related tasks. There is a lot of overhead in effectively doing a Cartesian comparison of all rows to satisfy a range of dates about each custumer/date pair.
In this case (1) the data are ordered, and (2) there are no holes in the monthly sequence of record, so this simple data step is probably a lot faster:
data want;
set have;
by custumer;
if first.custumer then sum12=0;
sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
run;
Editted note: If there are holes in the time series, I'd make a temporary data set view NEED, in which the holes are filled with dummy records have flag=. The apply the same logic as above:
data need (drop=nxt_: i) /view=need1;
merge have
have (firstobs=2 keep=date custumer rename=(date=nxt_date custumer=nxt_custumer));
if custumer^=nxt_custumer then output;
else do i=1 to intck('dtmonth',date,nxt_date);
output;
flag=.;
end;
run;
data want1;
set need1;
by custumer;
if first.custumer then sum12=0;
sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
if flag^=.;
run;
This one works, i used the first part of the code, but before that, I used COALESCE (flag,0) to fill in the missing values that i have, so it works for me now, without the second part.
Thank you!
I tend to wonder why any process that is only concerned with months is carrying around seconds down to thousandths of a second...
@ballardw Probably some Oracle data or similar kept as is.
Yes, it is.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.