Take three months for example.
data have ;
input id spend date :date9. ;
format date date9.;
CARDS;
1 20 01JAN2010
1 10 01FEB2010
1 0 01Mar2010
1 10 01Apr2010
1 40 01May2010
2 20 01Mar2009
2 10 01Apr2009
2 10 01May2009
2 10 01jun2009
2 5 01jul2009
2 15 01Aug2009
;;;;
run;
proc sql;
create table want as
select *,case when (select sum(spend) from have where id=a.id and date between intnx('month',a.date,-2,'s') and a.date) eq 50 then 'Y' else 'N' end as flag
from have as a;
quit;
Xia Keshan
It does work, thanks for that but the self join is killing the server when i run it for 12 and 24 months on the complete table containing 700000 rows.
Do you think we can somehow use the Lag logic I was working with?
Thanks
Gaurav
Look at this thread:
https://communities.sas.com/thread/61456?start=0&tstart=0
There are data step based solutions in it.
Hi Gaurav,
I had some time to work on this problem. Still calculating the DPD_XX_730 variable is an excercise for you
(you will need to create one more view)
I hope AccountingPeriod is a SAS date variable. If not, you need to change the intnx() function.
proc sort data=have;
by customerid accountingperiod;
run;
data have_exp6 / view=have_exp6;
set have;
AccountingPeriod=intnx("month",AccountingPeriod,6,'s');
run;
data have_exp12 / view=have_exp12;
set have;
AccountingPeriod=intnx("month",AccountingPeriod,12,'s');
run;
data want;
set have_exp6(in=in6) have_exp12(in=in12) have(in=inT);/*interleaving original and expiry datasets Gaurav: maybe here you need to adust the order of the datasets! Maybe have comes first. Experiment!*/
by Firm CustomerID AccountingPeriod;
if first.Firm then do;/*initializing counters and sums at beginning of a group*/
DPD_30_180=0; DPD_60_180=0; DPD_90_180=0;
DPD_30_360=0; DPD_60_360=0; DPD_90_360=0;
end;
if inT then do;/*if transaction, then first calculate, then output*/
DPD_30_180+DPD30_flag; DPD_60_180+DPD60_flag; DPD_90_180+DPD90_flag;
DPD_30_360+DPD30_flag; DPD_60_360+DPD60_flag; DPD_90_360+DPD90_flag;
output;
end;
if in6 do;/*if 6 month expiry, then calculate*/
DPD_30_180+(-DPD30_flag); DPD_60_180+(-DPD60_flag); DPD_90_180+(-DPD90_flag);
end;
if in12 do;/*if 12 month expiry, then calculate*/
DPD_30_360+(-DPD30_flag); DPD_60_360+(-DPD60_flag); DPD_90_360+(-DPD90_flag);
end;
run;
Message was edited by: Gergely Bathó
Hey GB, Well In the input table I already have calculated 2 variables accountingperiod_6 (containing the month going back 6 months in time from current value of accountingperiod) and accountingperiod_12 (same logic but 12 months in time).
So I have everything in 1 input table now. I dont have to create seperate tables as you have shown.
I am working now with your code to see if it solves my problem.
Thanks
Gaurav
You are right! You don't need the additional views then.
Just change the set statement:
set have_exp6(in=in6 rename=(Accountingperiod_6=AccountingPeriod) drop=AccountingPeriod) have_exp12(rename=(Accountingperiod_12=AccountingPeriod) drop=AccountingPeriod in=in12) have(in=inT);
Good luck!
Thanks GB I got your code to work to suit my requirements.
I had a question although, why are you using the following expression
DPD-30-180+(-DPD30_flag), why are you using the negative '-' sign?
Gaurav
DPD_30-180+(-DPD30_flag); is equvivalent to
DPD_30_180=DPD_30_180-DPD30_flag; ( plus: DPD_30_180 becomes retained )
I just wanted to make that line very similar to the previous sum statment (and I also copy-pasted it from there):
DPD_30_180+180_DPD30_flag;
But this one would be syntactically wrong:
DPD_30-180-DPD30_flag; - it is not a sum statement.
My code interleaves the original dataset and 2 shifted datasets.
When I read a record from the original dataset, I need to add to the cummulative variables.
When I read from the shifted dataset, I need to substract. It is because after 6 (or 12) months the effect of the DPD_XX variable "expires".
700000 rows is not a big deal . I guess it will take you about six hours to get it . and faster version code ? I think you can use Array or Hash Table.
Xia Keshan
Ksharp, unfortunately time is th critcal factor here since I have to create 6,12 and 24 month flags and that too for 3 different variables (which comes to 9 self joins) and its been running now for the last 4 hours and i dont know when it is going to end.
Gergely, the example you gave isnt accumulating the data for the last 6 or 12 months as i wanted. It is just copying the data from one variable into the sum variable.
I have been playing around with an array and have got it to work as well, but again something is missing since I am not able to reinitialise the sum value once the BY group value changes. Here is the code
%let period = 6;
Data want_6;
set have;
by customerid;
array summed[&period] _temporary;
If X = &period then X=1;
else X+1;
summed
if accountingperiod > accountingperiod_6 then do;
if _n_ >= &period then do;
DPD30_flag_sum = sum(of summed
end;
end;
run;
Accountingperiod is the column holding the yyyymm value for the current observation and accountingperiod_6 is the varaible holding the yyyymm value for the 6th previous month.
Can you guys see what i am missing?
Gaurav
There are a couple of questions, Is there a missing month like 01FEB2012 01JUN2012 , Is there some other day in the same month like 02FEB2012 04FEB2012 ?
Still take three months for example :
data have ;
input id spend date :date9. ;
format date date9.;
CARDS;
1 20 01JAN2010
1 10 01FEB2010
1 0 01Mar2010
1 10 01Apr2010
1 40 01May2010
2 20 01Mar2009
2 10 01Apr2009
2 10 01May2009
2 10 01jun2009
2 5 01jul2009
2 15 01Aug2009
;;;;
run;
data want(drop=_date sum);
if _n_ eq 1 then do;
if 0 then set have(rename=(date=_date));
declare hash ha(dataset:'have(rename=(date=_date))',hashexp:20);
ha.definekey('id','_date');
ha.definedata('spend');
ha.definedone();
end;
set have;
sum=0;
do _date=intnx('month',date,-2,'s') to date;
if ha.find()=0 then sum+spend;
end;
flag=ifc( sum eq 50 , 'Y' , 'N');
run;
Xia Keshan
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.