Hello everyone, I would really appreciate your help.
The first three columns is the sample of data I have and the last column is what I want to get with the SAS code.
So, I want for each customer and each month to calculate sum of flags in the last 12 months (or less, if I don't have 12, like in the first observation ), including the month of calculation.
So, for each month in the group I calculate the sum of flags for that month and 11 months before.
CUSTUMER   		 DATE				flag      Desired column
customer1   		31AUG2015:00:00:00.000		1		1
customer1	    	30SEP2015:00:00:00.000		0		1
customer1		31OCT2015:00:00:00.000		0		1
customer1		30NOV2015:00:00:00.000		0		1
customer1		31DEC2015:00:00:00.000		0		1
customer1		31JAN2016:00:00:00.000		0		1
customer1		29FEB2016:00:00:00.000		0		1
customer1		31MAR2016:00:00:00.000		0		1
customer1		30APR2016:00:00:00.000		0		1
customer1		31MAY2016:00:00:00.000		0		1
customer1		30JUN2016:00:00:00.000		0		1
customer1		31JUL2016:00:00:00.000		0		1
customer1		31AUG2016:00:00:00.000		0		0
customer1		30SEP2016:00:00:00.000		0		0
customer1		31OCT2016:00:00:00.000		1		1
customer1		30NOV2016:00:00:00.000		0		1
customer1		31DEC2016:00:00:00.000		0		1
customer1		31JAN2017:00:00:00.000		0		1		
customer2		31AUG2015:00:00:00.000		1		1
customer2 		30SEP2015:00:00:00.000		0		1
customer2		31OCT2015:00:00:00.000		0		1		
customer2 		30NOV2015:00:00:00.000		0		1	Thanks in advance!
If possible, I try to stay away from SQL solutions that are, in essence, try to do time series-related tasks. There is a lot of overhead in effectively doing a Cartesian comparison of all rows to satisfy a range of dates about each custumer/date pair.
In this case (1) the data are ordered, and (2) there are no holes in the monthly sequence of record, so this simple data step is probably a lot faster:
data want;
  set have;
  by custumer;
  if first.custumer then sum12=0;
  sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
run;
Editted note: If there are holes in the time series, I'd make a temporary data set view NEED, in which the holes are filled with dummy records have flag=. The apply the same logic as above:
data need (drop=nxt_: i) /view=need1;
  merge have 
        have (firstobs=2 keep=date custumer rename=(date=nxt_date custumer=nxt_custumer));
  if custumer^=nxt_custumer then output;
  else do i=1 to intck('dtmonth',date,nxt_date);
    output;
    flag=.;
  end;
run;
data want1;
  set need1;
  by custumer;
  if first.custumer then sum12=0;
  sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
  if flag^=.;
run;
data have;
input CUSTUMER  :$10. 		 DATE	:datetime20.			flag    ;
format  DATE	datetime20.	;
cards;
customer1   		31AUG2015:00:00:00.000		1		1
customer1	    	30SEP2015:00:00:00.000		0		1
customer1		31OCT2015:00:00:00.000		0		1
customer1		30NOV2015:00:00:00.000		0		1
customer1		31DEC2015:00:00:00.000		0		1
customer1		31JAN2016:00:00:00.000		0		1
customer1		29FEB2016:00:00:00.000		0		1
customer1		31MAR2016:00:00:00.000		0		1
customer1		30APR2016:00:00:00.000		0		1
customer1		31MAY2016:00:00:00.000		0		1
customer1		30JUN2016:00:00:00.000		0		1
customer1		31JUL2016:00:00:00.000		0		1
customer1		31AUG2016:00:00:00.000		0		0
customer1		30SEP2016:00:00:00.000		0		0
customer1		31OCT2016:00:00:00.000		1		1
customer1		30NOV2016:00:00:00.000		0		1
customer1		31DEC2016:00:00:00.000		0		1
customer1		31JAN2017:00:00:00.000		0		1		
customer2		31AUG2015:00:00:00.000		1		1
customer2 		30SEP2015:00:00:00.000		0		1
customer2		31OCT2015:00:00:00.000		0		1		
customer2 		30NOV2015:00:00:00.000		0		1	
;
proc sql;
create table want as
select a.CUSTUMER, a.date,a.flag,sum((intnx('dtmonth',a.DATE,-12,'s')<b.date<=a.date)*b.flag) as desired
from have a, have b
where a.CUSTUMER=b.CUSTUMER
group by a.CUSTUMER,a.date
having min(b.date)=b.date
order by a.CUSTUMER, a.date;
quit;
					
				
			
			
				
			
			
			
			
			
			
			
		If possible, I try to stay away from SQL solutions that are, in essence, try to do time series-related tasks. There is a lot of overhead in effectively doing a Cartesian comparison of all rows to satisfy a range of dates about each custumer/date pair.
In this case (1) the data are ordered, and (2) there are no holes in the monthly sequence of record, so this simple data step is probably a lot faster:
data want;
  set have;
  by custumer;
  if first.custumer then sum12=0;
  sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
run;
Editted note: If there are holes in the time series, I'd make a temporary data set view NEED, in which the holes are filled with dummy records have flag=. The apply the same logic as above:
data need (drop=nxt_: i) /view=need1;
  merge have 
        have (firstobs=2 keep=date custumer rename=(date=nxt_date custumer=nxt_custumer));
  if custumer^=nxt_custumer then output;
  else do i=1 to intck('dtmonth',date,nxt_date);
    output;
    flag=.;
  end;
run;
data want1;
  set need1;
  by custumer;
  if first.custumer then sum12=0;
  sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
  if flag^=.;
run;
This one works,  i used the first part of the code, but before that, I used COALESCE (flag,0) to fill in the missing values that i have, so it works for me now, without the second part.
Thank you!
I tend to wonder why any process that is only concerned with months is carrying around seconds down to thousandths of a second...
@ballardw Probably some Oracle data or similar kept as is.
Yes, it is.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.