Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Exclude Date Periods From Date Range Part II

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 87
Accepted Solution

Exclude Date Periods From Date Range Part II

Please see this question for context:

The answer given works perfectly but the file size is an issue as with large files it blows out considerably. I'm wondering if the approach below might be used which is considerably lighter on resources. I just cant work out how to get the date periods from it. I have a set of dates per clients and I need to exclude periods they have have paid for - and need to output the exact dates. Sample data and code to date are below. Many thanks in advance!

data phone ;

  infile datalines delimiter=',';

  input clientid $ sharerid $ phone $ startdt enddt ;

  informat startdt enddt date9. ;

  format startdt enddt date9. ;

  datalines;

client1,sharer1,555-6532,21Nov2011,10Dec2012

client1,sharer1,444-5655,29Nov2010,14Feb2011

client1,sharer1,333-1234,20May1993,17Aug1993

client1,sharer1,333-1234,08Sep1993,08Sep1993

client2,sharer2,666-6548,10Jul2001,12Nov2001

client2,sharer2,666-6548,10Apr2002,06Aug2002

client2,sharer2,111-5658,02Nov1992,12Aug1993

client2,sharer2,222-6589,10Jan2000,31Jan2000

client2,sharer2,777-8755,31Jan2000,03Feb2000

client2,sharer2,777-8755,25Jun2009,14Sep2009

client2,sharer2,321-6544,18Dec2003,08Apr2004

client2,sharer2,778-6589,07Jun2001,10Jul2001

client2,sharer2,999-9988,31Dec1993,26Mar1994

client2,sharer2,999-9988,28Mar1994,28Mar1994

client2,sharer2,888-7845,12Aug1993,23Aug1993

client2,sharer2,789-9876,10Aug1994,05Sep1994

client2,sharer2,789-9876,22Jun1995,10Jul1995

client2,sharer2,951-6235,08Apr2004,10Aug2004

client2,sharer2,753-1245,25Jan2007,18Jul2007

client2,sharer2,656-8989,12Nov1998,26Feb1999

client2,sharer2,656-8989,10Dec1999,10Jan2000

client2,sharer2,141-1414,23Aug2000,26Mar2001

client2,sharer2,141-1414,07Jun2001,10Jul2001

client2,sharer2,363-3636,19Jun1998,12Nov1998

client2,sharer2,852-8525,18Jun2009,02Jun2010

client2,sharer2,852-8525,20Oct2010,16May2011

client2,sharer2,852-8525,31May2012,10Dec2012

client2,sharer2,565-5656,05Sep1994,01Nov1994

client2,sharer2,565-5656,14Nov1994,14Nov1994

client2,sharer2,221-2212,01jan2012,30jan2012

;

data paid_periods;

  infile datalines delimiter=',';

  input clientid $ sharerid $ paystart payend ;

  informat paystart payend date9. ;

  format paystart payend date9. ;

  datalines;

client2,sharer2,31Aug1992,23Aug1993

client2,sharer2,25Dec1993,26Mar1994

client2,sharer2,10Aug1994,01Nov1994

client2,sharer2,15Mar1995,20Mar1995

client2,sharer2,19Jun1998,18Feb1999

client2,sharer2,10Dec1999,31Jan2000

client2,sharer2,20Jun2001,12Nov2001

client2,sharer2,10Apr2002,27Jul2002

client2,sharer2,18Dec2003,06Aug2004

client2,sharer2,11Dec2006,17Jul2007

client2,sharer2,18Jun2009,01Jun2010

client2,sharer2,20Oct2010,13May2011

client2,sharer2,14jan2012,21jan2012

;

proc sql;

      create table ph as

    select distinct clientid,sharerid,phone
      from phone

      ;

      create table pays as

    select distinct rs.*,ph.phone
    from paid_periods rs, ph
    where rs.clientid=ph.clientid and rs.sharerid=ph.sharerid

      ;

    quit;

    data phprd(keep=phone clientid sharerid phone_flag date)   ;

      set phone ;

      phone_flag=1;

      date=startdt;

      output;

      phone_flag=0;

      date=enddt;

      output;

    run;

    proc sort data=phprd;

      by phone clientid sharerid date;

    run;

    data payprd(keep = phone clientid sharerid pay_flag date)   ;

      set pays;

      pay_flag=1;

      date=paystart;

      output;

      pay_flag=0;

      date=payend;

      output;

    run;

    proc sort data=payprd;

      by phone clientid sharerid date;

    run;

    data timeline ;

      set payprd phprd;

      format date ddmmyy10. clientid sharerid 12.;

      by phone clientid sharerid date;

    run;

cheers

Steve


Accepted Solutions
Solution
‎12-12-2012 08:08 PM
PROC Star
Posts: 7,363

Re: Exclude Date Periods From Date Range Part II

Does my original jackhammer approach blow out your system's memory as well?  It shouldn't as, if it will work for one record, it should work for any number of records as well.

View solution in original post


All Replies
PROC Star
Posts: 7,363

Re: Exclude Date Periods From Date Range Part II

How many records do you have in each of the two files?

Frequent Contributor
Posts: 87

Re: Exclude Date Periods From Date Range Part II

It varies but can be over 4000 plus the periods can go right back to the 1990's which causing huge file bloat.

Solution
‎12-12-2012 08:08 PM
PROC Star
Posts: 7,363

Re: Exclude Date Periods From Date Range Part II

Does my original jackhammer approach blow out your system's memory as well?  It shouldn't as, if it will work for one record, it should work for any number of records as well.

Frequent Contributor
Posts: 87

Re: Exclude Date Periods From Date Range Part II

Hiya, never thought of that and yes you're quite correct. Used the Jack Hammer approach on my live data and it worked a charm. Thank you very much again.

cheers

Steve

PROC Star
Posts: 7,363

Re: Exclude Date Periods From Date Range Part II

That must be why they invented JackHammers!  Not applicable for all tasks but, sometimes, definitely the right tool for the job.

Super User
Posts: 5,081

Re: Exclude Date Periods From Date Range Part II

Steve,

A couple of things to consider as well ...

Memory is cheap, at least where computers are concerned.  Buy more?

The memory requirements can be reduced by breaking up the data into batches.  Subsets would be based on ranges of clientid, for example, but make sure the same client IDs appear in both data sets.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 406 views
  • 0 likes
  • 3 in conversation