Take three months for example.
data have ; input id spend date :date9. ; format date date9.; CARDS; 1 20 01JAN2010 1 10 01FEB2010 1 0 01Mar2010 1 10 01Apr2010 1 40 01May2010 2 20 01Mar2009 2 10 01Apr2009 2 10 01May2009 2 10 01jun2009 2 5 01jul2009 2 15 01Aug2009 ;;;; run; proc sql; create table want as select *,case when (select sum(spend) from have where id=a.id and date between intnx('month',a.date,-2,'s') and a.date) eq 50 then 'Y' else 'N' end as flag from have as a; quit;
Xia Keshan
It does work, thanks for that but the self join is killing the server when i run it for 12 and 24 months on the complete table containing 700000 rows.
Do you think we can somehow use the Lag logic I was working with?
Thanks
Gaurav
Look at this thread:
https://communities.sas.com/thread/61456?start=0&tstart=0
There are data step based solutions in it.
Hi Gaurav,
I had some time to work on this problem. Still calculating the DPD_XX_730 variable is an excercise for you (you will need to create one more view)
I hope AccountingPeriod is a SAS date variable. If not, you need to change the intnx() function.
proc sort data=have;
by customerid accountingperiod;
run;
data have_exp6 / view=have_exp6;
set have;
AccountingPeriod=intnx("month",AccountingPeriod,6,'s');
run;
data have_exp12 / view=have_exp12;
set have;
AccountingPeriod=intnx("month",AccountingPeriod,12,'s');
run;
data want;
set have_exp6(in=in6) have_exp12(in=in12) have(in=inT);/*interleaving original and expiry datasets Gaurav: maybe here you need to adust the order of the datasets! Maybe have comes first. Experiment!*/
by Firm CustomerID AccountingPeriod;
if first.Firm then do;/*initializing counters and sums at beginning of a group*/
DPD_30_180=0; DPD_60_180=0; DPD_90_180=0;
DPD_30_360=0; DPD_60_360=0; DPD_90_360=0;
end;
if inT then do;/*if transaction, then first calculate, then output*/
DPD_30_180+DPD30_flag; DPD_60_180+DPD60_flag; DPD_90_180+DPD90_flag;
DPD_30_360+DPD30_flag; DPD_60_360+DPD60_flag; DPD_90_360+DPD90_flag;
output;
end;
if in6 do;/*if 6 month expiry, then calculate*/
DPD_30_180+(-DPD30_flag); DPD_60_180+(-DPD60_flag); DPD_90_180+(-DPD90_flag);
end;
if in12 do;/*if 12 month expiry, then calculate*/
DPD_30_360+(-DPD30_flag); DPD_60_360+(-DPD60_flag); DPD_90_360+(-DPD90_flag);
end;
run;
Message was edited by: Gergely Bathó
Hey GB, Well In the input table I already have calculated 2 variables accountingperiod_6 (containing the month going back 6 months in time from current value of accountingperiod) and accountingperiod_12 (same logic but 12 months in time).
So I have everything in 1 input table now. I dont have to create seperate tables as you have shown.
I am working now with your code to see if it solves my problem.
Thanks
Gaurav
You are right! You don't need the additional views then.
Just change the set statement:
set have_exp6(in=in6 rename=(Accountingperiod_6=AccountingPeriod) drop=AccountingPeriod) have_exp12(rename=(Accountingperiod_12=AccountingPeriod) drop=AccountingPeriod in=in12) have(in=inT);
Good luck!
Thanks GB I got your code to work to suit my requirements.
I had a question although, why are you using the following expression
DPD-30-180+(-DPD30_flag), why are you using the negative '-' sign?
Gaurav
DPD_30-180+(-DPD30_flag); is equvivalent to
DPD_30_180=DPD_30_180-DPD30_flag; ( plus: DPD_30_180 becomes retained )
I just wanted to make that line very similar to the previous sum statment (and I also copy-pasted it from there):
DPD_30_180+180_DPD30_flag;
But this one would be syntactically wrong:
DPD_30-180-DPD30_flag; - it is not a sum statement.
My code interleaves the original dataset and 2 shifted datasets.
When I read a record from the original dataset, I need to add to the cummulative variables.
When I read from the shifted dataset, I need to substract. It is because after 6 (or 12) months the effect of the DPD_XX variable "expires".
700000 rows is not a big deal . I guess it will take you about six hours to get it . and faster version code ? I think you can use Array or Hash Table.
Xia Keshan
Ksharp, unfortunately time is th critcal factor here since I have to create 6,12 and 24 month flags and that too for 3 different variables (which comes to 9 self joins) and its been running now for the last 4 hours and i dont know when it is going to end.
Gergely, the example you gave isnt accumulating the data for the last 6 or 12 months as i wanted. It is just copying the data from one variable into the sum variable.
I have been playing around with an array and have got it to work as well, but again something is missing since I am not able to reinitialise the sum value once the BY group value changes. Here is the code
%let period = 6;
Data want_6;
set have;
by customerid;
array summed[&period] _temporary;
If X = &period then X=1;
else X+1;
summed
if accountingperiod > accountingperiod_6 then do;
if _n_ >= &period then do;
DPD30_flag_sum = sum(of summed
end;
end;
run;
Accountingperiod is the column holding the yyyymm value for the current observation and accountingperiod_6 is the varaible holding the yyyymm value for the 6th previous month.
Can you guys see what i am missing?
Gaurav
There are a couple of questions, Is there a missing month like 01FEB2012 01JUN2012 , Is there some other day in the same month like 02FEB2012 04FEB2012 ?
Still take three months for example :
data have ; input id spend date :date9. ; format date date9.; CARDS; 1 20 01JAN2010 1 10 01FEB2010 1 0 01Mar2010 1 10 01Apr2010 1 40 01May2010 2 20 01Mar2009 2 10 01Apr2009 2 10 01May2009 2 10 01jun2009 2 5 01jul2009 2 15 01Aug2009 ;;;; run; data want(drop=_date sum); if _n_ eq 1 then do; if 0 then set have(rename=(date=_date)); declare hash ha(dataset:'have(rename=(date=_date))',hashexp:20); ha.definekey('id','_date'); ha.definedata('spend'); ha.definedone(); end; set have; sum=0; do _date=intnx('month',date,-2,'s') to date; if ha.find()=0 then sum+spend; end; flag=ifc( sum eq 50 , 'Y' , 'N'); run;
Xia Keshan
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.