BookmarkSubscribeRSS Feed
Ksharp
Super User

Take three months for example.

data have ;
input id spend  date :date9.  ;
format date date9.;
CARDS;
1     20      01JAN2010
1     10      01FEB2010
1      0      01Mar2010
1     10      01Apr2010
1     40      01May2010       
2     20      01Mar2009
2     10      01Apr2009   
2     10      01May2009
2     10      01jun2009
2      5      01jul2009
2     15      01Aug2009  
;;;;
run;
proc sql;
create table want as
 select *,case when (select sum(spend) from have where id=a.id and date between intnx('month',a.date,-2,'s') and a.date) eq 50 then 'Y' else 'N' end as flag
  from have as a;
quit;

Xia Keshan

gauravsood19
Calcite | Level 5

It does work, thanks for that but the self join is killing the server when i run it for 12 and 24 months on the complete table containing 700000 rows.

Do you think we can somehow use the Lag logic I was working with?

Thanks

Gaurav

gergely_batho
SAS Employee

Look at this thread:

https://communities.sas.com/thread/61456?start=0&tstart=0

There are data step based solutions in it.

gergely_batho
SAS Employee

Hi Gaurav,

I had some time to work on this problem. Still calculating the DPD_XX_730 variable is an excercise for you Smiley Happy (you will need to create one more view)

I hope AccountingPeriod is a SAS date variable. If not, you need to change the intnx() function.

proc sort data=have;

by customerid accountingperiod;

run;

data have_exp6 / view=have_exp6;

  set have;

  AccountingPeriod=intnx("month",AccountingPeriod,6,'s');

run;

data have_exp12 / view=have_exp12;

  set have;

  AccountingPeriod=intnx("month",AccountingPeriod,12,'s');

run;

data want;

  set have_exp6(in=in6) have_exp12(in=in12) have(in=inT);/*interleaving original and expiry datasets Gaurav: maybe here you need to adust the order of the datasets! Maybe have comes first. Experiment!*/

  by Firm CustomerID AccountingPeriod;

  if first.Firm then do;/*initializing counters and sums at beginning of a group*/

       DPD_30_180=0; DPD_60_180=0; DPD_90_180=0;

       DPD_30_360=0; DPD_60_360=0; DPD_90_360=0;

  end;

  if inT then do;/*if transaction, then first calculate, then output*/

       DPD_30_180+DPD30_flag; DPD_60_180+DPD60_flag; DPD_90_180+DPD90_flag;

       DPD_30_360+DPD30_flag; DPD_60_360+DPD60_flag; DPD_90_360+DPD90_flag;

       output;

  end;

  if in6 do;/*if 6 month expiry, then calculate*/

       DPD_30_180+(-DPD30_flag); DPD_60_180+(-DPD60_flag); DPD_90_180+(-DPD90_flag);

  end;

  if in12 do;/*if 12 month expiry, then calculate*/

       DPD_30_360+(-DPD30_flag); DPD_60_360+(-DPD60_flag); DPD_90_360+(-DPD90_flag);

  end;

run;

Message was edited by: Gergely Bathó

gauravsood19
Calcite | Level 5

Hey GB, Well In the input table I already have calculated 2 variables accountingperiod_6 (containing the month going back 6 months in time from current value of accountingperiod) and accountingperiod_12 (same logic but 12 months in time).

So I have everything in 1 input table now. I dont have to create seperate tables as you have shown.

I am working now with your code to see if it solves my problem.

Thanks

Gaurav

gergely_batho
SAS Employee

You are right! You don't need the additional views then.

Just change the set statement:

set have_exp6(in=in6 rename=(Accountingperiod_6=AccountingPeriod) drop=AccountingPeriod) have_exp12(rename=(Accountingperiod_12=AccountingPeriod) drop=AccountingPeriod in=in12) have(in=inT);

Good luck!

gauravsood19
Calcite | Level 5

Thanks GB I got your code to work to suit my requirements.

I had a question although, why are you using the following expression

DPD-30-180+(-DPD30_flag), why are you using the negative '-' sign?

Gaurav

gergely_batho
SAS Employee

DPD_30-180+(-DPD30_flag);   is equvivalent to

DPD_30_180=DPD_30_180-DPD30_flag;   ( plus:  DPD_30_180 becomes retained )

I just wanted to make that line very similar to the previous sum statment (and I also copy-pasted it from there):

DPD_30_180+180_DPD30_flag;

But this one would be syntactically wrong:

DPD_30-180-DPD30_flag;    - it is not a sum statement.

My code interleaves the original dataset and 2 shifted datasets.

When I read a record from the original dataset, I need to add to the cummulative variables.

When I read from the shifted dataset, I need to substract. It is because after 6 (or 12) months the effect of the DPD_XX variable "expires".

Ksharp
Super User

700000 rows is not a big deal .  I guess it will take you about six hours to get it . and faster version code ? I think you can use Array or Hash Table.




Xia Keshan

gauravsood19
Calcite | Level 5

Ksharp, unfortunately time is th critcal factor here since I have to create 6,12 and 24 month flags and that too for 3 different variables (which comes to 9 self joins) and its been running now for the last 4 hours and i dont know when it is going to end.

Gergely, the example you gave isnt accumulating the data for the last 6 or 12 months as i wanted. It is just copying the data from one variable into the sum variable.

I have been playing around with an array and have got it to work as well, but again something is missing since I am not able to reinitialise the sum value once the BY group value changes. Here is the code

%let period = 6;

Data want_6;

set have;

by customerid;

array summed[&period] _temporary;

If X = &period then X=1;

else X+1;

summed = DPD30_flag;

if accountingperiod > accountingperiod_6 then do;

     if _n_ >= &period then do;

          DPD30_flag_sum = sum(of summed

  • );
  •      end;

    end;

    run;

    Accountingperiod is the column holding the yyyymm value for the current observation and accountingperiod_6 is the varaible holding the yyyymm value for the 6th previous month.

    Can you guys see what i am missing?

    Gaurav


    Ksharp
    Super User

    There are a couple of questions, Is there a missing month like 01FEB2012  01JUN2012  , Is there some other day in the same month like 02FEB2012 04FEB2012 ?

    Still take three months for example :

    
    
    
     
    data have ;
    input id spend  date :date9.  ;
    format date date9.;
    CARDS;
    1     20      01JAN2010
    1     10      01FEB2010
    1      0      01Mar2010
    1     10      01Apr2010
    1     40      01May2010       
    2     20      01Mar2009
    2     10      01Apr2009   
    2     10      01May2009
    2     10      01jun2009
    2      5      01jul2009
    2     15      01Aug2009  
    ;;;;
    run;
    data want(drop=_date sum);
     if _n_ eq 1 then do;
      if 0 then set have(rename=(date=_date));
      declare hash ha(dataset:'have(rename=(date=_date))',hashexp:20);
       ha.definekey('id','_date');
       ha.definedata('spend');
       ha.definedone();
    end;
    set have;
    sum=0;
    do _date=intnx('month',date,-2,'s') to date;
     if ha.find()=0 then sum+spend;
    end;
    flag=ifc( sum eq 50 , 'Y' , 'N');
    run;
    

    Xia Keshan

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    SAS Training: Just a Click Away

     Ready to level-up your skills? Choose your own adventure.

    Browse our catalog!

    Discussion stats
    • 25 replies
    • 11152 views
    • 1 like
    • 11 in conversation