BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasuser381
Obsidian | Level 7

Hello everyone, I would really appreciate your help.

The first three columns is the sample of data I have and the last column is what I want to get with the SAS code.

So, I want for each customer and each month to calculate sum of flags in the last 12 months (or less, if I don't have 12, like in the first observation ), including the month of calculation.
So, for each month in the group I calculate the sum of flags for that month and 11 months before.


CUSTUMER   		 DATE				flag      Desired column

customer1   		31AUG2015:00:00:00.000		1		1
customer1	    	30SEP2015:00:00:00.000		0		1
customer1		31OCT2015:00:00:00.000		0		1
customer1		30NOV2015:00:00:00.000		0		1
customer1		31DEC2015:00:00:00.000		0		1
customer1		31JAN2016:00:00:00.000		0		1
customer1		29FEB2016:00:00:00.000		0		1
customer1		31MAR2016:00:00:00.000		0		1
customer1		30APR2016:00:00:00.000		0		1
customer1		31MAY2016:00:00:00.000		0		1
customer1		30JUN2016:00:00:00.000		0		1
customer1		31JUL2016:00:00:00.000		0		1
customer1		31AUG2016:00:00:00.000		0		0
customer1		30SEP2016:00:00:00.000		0		0
customer1		31OCT2016:00:00:00.000		1		1
customer1		30NOV2016:00:00:00.000		0		1
customer1		31DEC2016:00:00:00.000		0		1
customer1		31JAN2017:00:00:00.000		0		1		
customer2		31AUG2015:00:00:00.000		1		1
customer2 		30SEP2015:00:00:00.000		0		1
customer2		31OCT2015:00:00:00.000		0		1		
customer2 		30NOV2015:00:00:00.000		0		1	

 Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If possible, I try to stay away from SQL solutions that are, in essence, try to do time series-related tasks.  There is a lot of overhead in effectively doing a Cartesian comparison of all rows to satisfy a range of dates about each custumer/date pair.

 

In this case (1) the data are ordered, and (2) there are no holes in the monthly sequence of record, so this simple data step is probably a lot faster:

 

data want;
  set have;
  by custumer;
  if first.custumer then sum12=0;
  sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
run;

 

Editted note:  If there are holes in the time series, I'd make a temporary data set view NEED, in which the holes are filled with dummy records have flag=.   The apply the same logic as above:

 

data need (drop=nxt_: i) /view=need1;
  merge have 
        have (firstobs=2 keep=date custumer rename=(date=nxt_date custumer=nxt_custumer));
  if custumer^=nxt_custumer then output;
  else do i=1 to intck('dtmonth',date,nxt_date);
    output;
    flag=.;
  end;
run;

data want1;
  set need1;
  by custumer;
  if first.custumer then sum12=0;
  sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
  if flag^=.;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
novinosrin
Tourmaline | Level 20
data have;
input CUSTUMER  :$10. 		 DATE	:datetime20.			flag    ;
format  DATE	datetime20.	;
cards;
customer1   		31AUG2015:00:00:00.000		1		1
customer1	    	30SEP2015:00:00:00.000		0		1
customer1		31OCT2015:00:00:00.000		0		1
customer1		30NOV2015:00:00:00.000		0		1
customer1		31DEC2015:00:00:00.000		0		1
customer1		31JAN2016:00:00:00.000		0		1
customer1		29FEB2016:00:00:00.000		0		1
customer1		31MAR2016:00:00:00.000		0		1
customer1		30APR2016:00:00:00.000		0		1
customer1		31MAY2016:00:00:00.000		0		1
customer1		30JUN2016:00:00:00.000		0		1
customer1		31JUL2016:00:00:00.000		0		1
customer1		31AUG2016:00:00:00.000		0		0
customer1		30SEP2016:00:00:00.000		0		0
customer1		31OCT2016:00:00:00.000		1		1
customer1		30NOV2016:00:00:00.000		0		1
customer1		31DEC2016:00:00:00.000		0		1
customer1		31JAN2017:00:00:00.000		0		1		
customer2		31AUG2015:00:00:00.000		1		1
customer2 		30SEP2015:00:00:00.000		0		1
customer2		31OCT2015:00:00:00.000		0		1		
customer2 		30NOV2015:00:00:00.000		0		1	
;



proc sql;
create table want as
select a.CUSTUMER, a.date,a.flag,sum((intnx('dtmonth',a.DATE,-12,'s')<b.date<=a.date)*b.flag) as desired
from have a, have b
where a.CUSTUMER=b.CUSTUMER
group by a.CUSTUMER,a.date
having min(b.date)=b.date
order by a.CUSTUMER, a.date;
quit;
mkeintz
PROC Star

If possible, I try to stay away from SQL solutions that are, in essence, try to do time series-related tasks.  There is a lot of overhead in effectively doing a Cartesian comparison of all rows to satisfy a range of dates about each custumer/date pair.

 

In this case (1) the data are ordered, and (2) there are no holes in the monthly sequence of record, so this simple data step is probably a lot faster:

 

data want;
  set have;
  by custumer;
  if first.custumer then sum12=0;
  sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
run;

 

Editted note:  If there are holes in the time series, I'd make a temporary data set view NEED, in which the holes are filled with dummy records have flag=.   The apply the same logic as above:

 

data need (drop=nxt_: i) /view=need1;
  merge have 
        have (firstobs=2 keep=date custumer rename=(date=nxt_date custumer=nxt_custumer));
  if custumer^=nxt_custumer then output;
  else do i=1 to intck('dtmonth',date,nxt_date);
    output;
    flag=.;
  end;
run;

data want1;
  set need1;
  by custumer;
  if first.custumer then sum12=0;
  sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
  if flag^=.;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
sasuser381
Obsidian | Level 7

This one works,  i used the first part of the code, but before that, I used COALESCE (flag,0) to fill in the missing values that i have, so it works for me now, without the second part.

Thank you!

 

 

 

 

ballardw
Super User

I tend to wonder why any process that is only concerned with months is carrying around seconds down to thousandths of a second...

ChrisNZ
Tourmaline | Level 20

@ballardw Probably some Oracle data or similar kept as is.

sasuser381
Obsidian | Level 7

Yes, it is.

Catch up on SAS Innovate 2026

Nearly 200 sessions are now available on demand with the SAS Innovate Digital Pass.

Explore Now →
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 5371 views
  • 4 likes
  • 5 in conversation