BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasuser381
Obsidian | Level 7

Hello everyone, I would really appreciate your help.

The first three columns is the sample of data I have and the last column is what I want to get with the SAS code.

So, I want for each customer and each month to calculate sum of flags in the last 12 months (or less, if I don't have 12, like in the first observation ), including the month of calculation.
So, for each month in the group I calculate the sum of flags for that month and 11 months before.


CUSTUMER   		 DATE				flag      Desired column

customer1   		31AUG2015:00:00:00.000		1		1
customer1	    	30SEP2015:00:00:00.000		0		1
customer1		31OCT2015:00:00:00.000		0		1
customer1		30NOV2015:00:00:00.000		0		1
customer1		31DEC2015:00:00:00.000		0		1
customer1		31JAN2016:00:00:00.000		0		1
customer1		29FEB2016:00:00:00.000		0		1
customer1		31MAR2016:00:00:00.000		0		1
customer1		30APR2016:00:00:00.000		0		1
customer1		31MAY2016:00:00:00.000		0		1
customer1		30JUN2016:00:00:00.000		0		1
customer1		31JUL2016:00:00:00.000		0		1
customer1		31AUG2016:00:00:00.000		0		0
customer1		30SEP2016:00:00:00.000		0		0
customer1		31OCT2016:00:00:00.000		1		1
customer1		30NOV2016:00:00:00.000		0		1
customer1		31DEC2016:00:00:00.000		0		1
customer1		31JAN2017:00:00:00.000		0		1		
customer2		31AUG2015:00:00:00.000		1		1
customer2 		30SEP2015:00:00:00.000		0		1
customer2		31OCT2015:00:00:00.000		0		1		
customer2 		30NOV2015:00:00:00.000		0		1	

 Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If possible, I try to stay away from SQL solutions that are, in essence, try to do time series-related tasks.  There is a lot of overhead in effectively doing a Cartesian comparison of all rows to satisfy a range of dates about each custumer/date pair.

 

In this case (1) the data are ordered, and (2) there are no holes in the monthly sequence of record, so this simple data step is probably a lot faster:

 

data want;
  set have;
  by custumer;
  if first.custumer then sum12=0;
  sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
run;

 

Editted note:  If there are holes in the time series, I'd make a temporary data set view NEED, in which the holes are filled with dummy records have flag=.   The apply the same logic as above:

 

data need (drop=nxt_: i) /view=need1;
  merge have 
        have (firstobs=2 keep=date custumer rename=(date=nxt_date custumer=nxt_custumer));
  if custumer^=nxt_custumer then output;
  else do i=1 to intck('dtmonth',date,nxt_date);
    output;
    flag=.;
  end;
run;

data want1;
  set need1;
  by custumer;
  if first.custumer then sum12=0;
  sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
  if flag^=.;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
novinosrin
Tourmaline | Level 20
data have;
input CUSTUMER  :$10. 		 DATE	:datetime20.			flag    ;
format  DATE	datetime20.	;
cards;
customer1   		31AUG2015:00:00:00.000		1		1
customer1	    	30SEP2015:00:00:00.000		0		1
customer1		31OCT2015:00:00:00.000		0		1
customer1		30NOV2015:00:00:00.000		0		1
customer1		31DEC2015:00:00:00.000		0		1
customer1		31JAN2016:00:00:00.000		0		1
customer1		29FEB2016:00:00:00.000		0		1
customer1		31MAR2016:00:00:00.000		0		1
customer1		30APR2016:00:00:00.000		0		1
customer1		31MAY2016:00:00:00.000		0		1
customer1		30JUN2016:00:00:00.000		0		1
customer1		31JUL2016:00:00:00.000		0		1
customer1		31AUG2016:00:00:00.000		0		0
customer1		30SEP2016:00:00:00.000		0		0
customer1		31OCT2016:00:00:00.000		1		1
customer1		30NOV2016:00:00:00.000		0		1
customer1		31DEC2016:00:00:00.000		0		1
customer1		31JAN2017:00:00:00.000		0		1		
customer2		31AUG2015:00:00:00.000		1		1
customer2 		30SEP2015:00:00:00.000		0		1
customer2		31OCT2015:00:00:00.000		0		1		
customer2 		30NOV2015:00:00:00.000		0		1	
;



proc sql;
create table want as
select a.CUSTUMER, a.date,a.flag,sum((intnx('dtmonth',a.DATE,-12,'s')<b.date<=a.date)*b.flag) as desired
from have a, have b
where a.CUSTUMER=b.CUSTUMER
group by a.CUSTUMER,a.date
having min(b.date)=b.date
order by a.CUSTUMER, a.date;
quit;
mkeintz
PROC Star

If possible, I try to stay away from SQL solutions that are, in essence, try to do time series-related tasks.  There is a lot of overhead in effectively doing a Cartesian comparison of all rows to satisfy a range of dates about each custumer/date pair.

 

In this case (1) the data are ordered, and (2) there are no holes in the monthly sequence of record, so this simple data step is probably a lot faster:

 

data want;
  set have;
  by custumer;
  if first.custumer then sum12=0;
  sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
run;

 

Editted note:  If there are holes in the time series, I'd make a temporary data set view NEED, in which the holes are filled with dummy records have flag=.   The apply the same logic as above:

 

data need (drop=nxt_: i) /view=need1;
  merge have 
        have (firstobs=2 keep=date custumer rename=(date=nxt_date custumer=nxt_custumer));
  if custumer^=nxt_custumer then output;
  else do i=1 to intck('dtmonth',date,nxt_date);
    output;
    flag=.;
  end;
run;

data want1;
  set need1;
  by custumer;
  if first.custumer then sum12=0;
  sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
  if flag^=.;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
sasuser381
Obsidian | Level 7

This one works,  i used the first part of the code, but before that, I used COALESCE (flag,0) to fill in the missing values that i have, so it works for me now, without the second part.

Thank you!

 

 

 

 

ballardw
Super User

I tend to wonder why any process that is only concerned with months is carrying around seconds down to thousandths of a second...

ChrisNZ
Tourmaline | Level 20

@ballardw Probably some Oracle data or similar kept as is.

sasuser381
Obsidian | Level 7

Yes, it is.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2967 views
  • 4 likes
  • 5 in conversation