Take three months for example.
data have ; input id spend date :date9. ; format date date9.; CARDS; 1 20 01JAN2010 1 10 01FEB2010 1 0 01Mar2010 1 10 01Apr2010 1 40 01May2010 2 20 01Mar2009 2 10 01Apr2009 2 10 01May2009 2 10 01jun2009 2 5 01jul2009 2 15 01Aug2009 ;;;; run; proc sql; create table want as select *,case when (select sum(spend) from have where id=a.id and date between intnx('month',a.date,-2,'s') and a.date) eq 50 then 'Y' else 'N' end as flag from have as a; quit;
Xia Keshan
It does work, thanks for that but the self join is killing the server when i run it for 12 and 24 months on the complete table containing 700000 rows.
Do you think we can somehow use the Lag logic I was working with?
Thanks
Gaurav
Look at this thread:
https://communities.sas.com/thread/61456?start=0&tstart=0
There are data step based solutions in it.
Hi Gaurav,
I had some time to work on this problem. Still calculating the DPD_XX_730 variable is an excercise for you (you will need to create one more view)
I hope AccountingPeriod is a SAS date variable. If not, you need to change the intnx() function.
proc sort data=have;
by customerid accountingperiod;
run;
data have_exp6 / view=have_exp6;
set have;
AccountingPeriod=intnx("month",AccountingPeriod,6,'s');
run;
data have_exp12 / view=have_exp12;
set have;
AccountingPeriod=intnx("month",AccountingPeriod,12,'s');
run;
data want;
set have_exp6(in=in6) have_exp12(in=in12) have(in=inT);/*interleaving original and expiry datasets Gaurav: maybe here you need to adust the order of the datasets! Maybe have comes first. Experiment!*/
by Firm CustomerID AccountingPeriod;
if first.Firm then do;/*initializing counters and sums at beginning of a group*/
DPD_30_180=0; DPD_60_180=0; DPD_90_180=0;
DPD_30_360=0; DPD_60_360=0; DPD_90_360=0;
end;
if inT then do;/*if transaction, then first calculate, then output*/
DPD_30_180+DPD30_flag; DPD_60_180+DPD60_flag; DPD_90_180+DPD90_flag;
DPD_30_360+DPD30_flag; DPD_60_360+DPD60_flag; DPD_90_360+DPD90_flag;
output;
end;
if in6 do;/*if 6 month expiry, then calculate*/
DPD_30_180+(-DPD30_flag); DPD_60_180+(-DPD60_flag); DPD_90_180+(-DPD90_flag);
end;
if in12 do;/*if 12 month expiry, then calculate*/
DPD_30_360+(-DPD30_flag); DPD_60_360+(-DPD60_flag); DPD_90_360+(-DPD90_flag);
end;
run;
Message was edited by: Gergely Bathó
Hey GB, Well In the input table I already have calculated 2 variables accountingperiod_6 (containing the month going back 6 months in time from current value of accountingperiod) and accountingperiod_12 (same logic but 12 months in time).
So I have everything in 1 input table now. I dont have to create seperate tables as you have shown.
I am working now with your code to see if it solves my problem.
Thanks
Gaurav
You are right! You don't need the additional views then.
Just change the set statement:
set have_exp6(in=in6 rename=(Accountingperiod_6=AccountingPeriod) drop=AccountingPeriod) have_exp12(rename=(Accountingperiod_12=AccountingPeriod) drop=AccountingPeriod in=in12) have(in=inT);
Good luck!
Thanks GB I got your code to work to suit my requirements.
I had a question although, why are you using the following expression
DPD-30-180+(-DPD30_flag), why are you using the negative '-' sign?
Gaurav
DPD_30-180+(-DPD30_flag); is equvivalent to
DPD_30_180=DPD_30_180-DPD30_flag; ( plus: DPD_30_180 becomes retained )
I just wanted to make that line very similar to the previous sum statment (and I also copy-pasted it from there):
DPD_30_180+180_DPD30_flag;
But this one would be syntactically wrong:
DPD_30-180-DPD30_flag; - it is not a sum statement.
My code interleaves the original dataset and 2 shifted datasets.
When I read a record from the original dataset, I need to add to the cummulative variables.
When I read from the shifted dataset, I need to substract. It is because after 6 (or 12) months the effect of the DPD_XX variable "expires".
700000 rows is not a big deal . I guess it will take you about six hours to get it . and faster version code ? I think you can use Array or Hash Table.
Xia Keshan
Ksharp, unfortunately time is th critcal factor here since I have to create 6,12 and 24 month flags and that too for 3 different variables (which comes to 9 self joins) and its been running now for the last 4 hours and i dont know when it is going to end.
Gergely, the example you gave isnt accumulating the data for the last 6 or 12 months as i wanted. It is just copying the data from one variable into the sum variable.
I have been playing around with an array and have got it to work as well, but again something is missing since I am not able to reinitialise the sum value once the BY group value changes. Here is the code
%let period = 6;
Data want_6;
set have;
by customerid;
array summed[&period] _temporary;
If X = &period then X=1;
else X+1;
summed
if accountingperiod > accountingperiod_6 then do;
if _n_ >= &period then do;
DPD30_flag_sum = sum(of summed
end;
end;
run;
Accountingperiod is the column holding the yyyymm value for the current observation and accountingperiod_6 is the varaible holding the yyyymm value for the 6th previous month.
Can you guys see what i am missing?
Gaurav
There are a couple of questions, Is there a missing month like 01FEB2012 01JUN2012 , Is there some other day in the same month like 02FEB2012 04FEB2012 ?
Still take three months for example :
data have ; input id spend date :date9. ; format date date9.; CARDS; 1 20 01JAN2010 1 10 01FEB2010 1 0 01Mar2010 1 10 01Apr2010 1 40 01May2010 2 20 01Mar2009 2 10 01Apr2009 2 10 01May2009 2 10 01jun2009 2 5 01jul2009 2 15 01Aug2009 ;;;; run; data want(drop=_date sum); if _n_ eq 1 then do; if 0 then set have(rename=(date=_date)); declare hash ha(dataset:'have(rename=(date=_date))',hashexp:20); ha.definekey('id','_date'); ha.definedata('spend'); ha.definedone(); end; set have; sum=0; do _date=intnx('month',date,-2,'s') to date; if ha.find()=0 then sum+spend; end; flag=ifc( sum eq 50 , 'Y' , 'N'); run;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.