SteveNZ
Obsidian | Level 7

Please see this question for context:

The answer given works perfectly, but file size is an issue: with large files it blows out considerably. I'm wondering if the approach below might be used, as it is considerably lighter on resources. I just can't work out how to get the date periods from it. I have a set of dates per client and I need to exclude the periods they have paid for, and I need to output the exact dates. Sample data and code to date are below. Many thanks in advance!

data phone ;
  infile datalines delimiter=',';
  input clientid $ sharerid $ phone $ startdt enddt ;
  informat startdt enddt date9. ;
  format startdt enddt date9. ;
  datalines;
client1,sharer1,555-6532,21Nov2011,10Dec2012
client1,sharer1,444-5655,29Nov2010,14Feb2011
client1,sharer1,333-1234,20May1993,17Aug1993
client1,sharer1,333-1234,08Sep1993,08Sep1993
client2,sharer2,666-6548,10Jul2001,12Nov2001
client2,sharer2,666-6548,10Apr2002,06Aug2002
client2,sharer2,111-5658,02Nov1992,12Aug1993
client2,sharer2,222-6589,10Jan2000,31Jan2000
client2,sharer2,777-8755,31Jan2000,03Feb2000
client2,sharer2,777-8755,25Jun2009,14Sep2009
client2,sharer2,321-6544,18Dec2003,08Apr2004
client2,sharer2,778-6589,07Jun2001,10Jul2001
client2,sharer2,999-9988,31Dec1993,26Mar1994
client2,sharer2,999-9988,28Mar1994,28Mar1994
client2,sharer2,888-7845,12Aug1993,23Aug1993
client2,sharer2,789-9876,10Aug1994,05Sep1994
client2,sharer2,789-9876,22Jun1995,10Jul1995
client2,sharer2,951-6235,08Apr2004,10Aug2004
client2,sharer2,753-1245,25Jan2007,18Jul2007
client2,sharer2,656-8989,12Nov1998,26Feb1999
client2,sharer2,656-8989,10Dec1999,10Jan2000
client2,sharer2,141-1414,23Aug2000,26Mar2001
client2,sharer2,141-1414,07Jun2001,10Jul2001
client2,sharer2,363-3636,19Jun1998,12Nov1998
client2,sharer2,852-8525,18Jun2009,02Jun2010
client2,sharer2,852-8525,20Oct2010,16May2011
client2,sharer2,852-8525,31May2012,10Dec2012
client2,sharer2,565-5656,05Sep1994,01Nov1994
client2,sharer2,565-5656,14Nov1994,14Nov1994
client2,sharer2,221-2212,01jan2012,30jan2012
;

data paid_periods;
  infile datalines delimiter=',';
  input clientid $ sharerid $ paystart payend ;
  informat paystart payend date9. ;
  format paystart payend date9. ;
  datalines;
client2,sharer2,31Aug1992,23Aug1993
client2,sharer2,25Dec1993,26Mar1994
client2,sharer2,10Aug1994,01Nov1994
client2,sharer2,15Mar1995,20Mar1995
client2,sharer2,19Jun1998,18Feb1999
client2,sharer2,10Dec1999,31Jan2000
client2,sharer2,20Jun2001,12Nov2001
client2,sharer2,10Apr2002,27Jul2002
client2,sharer2,18Dec2003,06Aug2004
client2,sharer2,11Dec2006,17Jul2007
client2,sharer2,18Jun2009,01Jun2010
client2,sharer2,20Oct2010,13May2011
client2,sharer2,14jan2012,21jan2012
;

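proc sql;
  /* One row per distinct client/sharer/phone combination */
  create table ph as
    select distinct clientid, sharerid, phone
    from phone;

  /* Attach every paid period to each of that client's phones */
  create table pays as
    select distinct rs.*, ph.phone
    from paid_periods rs, ph
    where rs.clientid = ph.clientid and rs.sharerid = ph.sharerid;
quit;

/* Two event rows per phone period: flag 1 at the start, 0 at the end */
data phprd(keep=phone clientid sharerid phone_flag date);
  set phone;
  phone_flag = 1;
  date = startdt;
  output;
  phone_flag = 0;
  date = enddt;
  output;
run;

proc sort data=phprd;
  by phone clientid sharerid date;
run;

/* Same event layout for the paid periods */
data payprd(keep=phone clientid sharerid pay_flag date);
  set pays;
  pay_flag = 1;
  date = paystart;
  output;
  pay_flag = 0;
  date = payend;
  output;
run;

proc sort data=payprd;
  by phone clientid sharerid date;
run;

/* Interleave both event streams into a single timeline in date order */
data timeline;
  set payprd phprd;
  by phone clientid sharerid date;
  /* clientid and sharerid are character, so they take a character format */
  format date ddmmyy10. clientid sharerid $12.;
run;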

cheers

Steve
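
For readers with the same question, here is a minimal sketch of one way the exact unpaid dates could be derived from interval boundaries alone. It is not taken from the thread and is not the accepted answer. It assumes the `phone` and `paid_periods` data sets from the post already exist, that start and end dates are both inclusive, and that a day is the smallest unit. The names `overlaps`, `unpaid`, `cursor`, `unpaid_start` and `unpaid_end` are made up for illustration.

```sas
/* Pair each phone period with only the paid periods that overlap it.   */
/* The left join keeps phone periods that have no paid period at all.   */
proc sql;
  create table overlaps as
    select p.clientid, p.sharerid, p.phone, p.startdt, p.enddt,
           pp.paystart, pp.payend
    from phone as p
         left join paid_periods as pp
           on  p.clientid  = pp.clientid
           and p.sharerid  = pp.sharerid
           and pp.paystart <= p.enddt
           and pp.payend   >= p.startdt
    order by p.clientid, p.sharerid, p.phone, p.startdt, p.enddt, pp.paystart;
quit;

/* Walk the paid periods of each phone period in date order and emit    */
/* the gaps. cursor holds the first day not yet accounted for.          */
data unpaid(keep=clientid sharerid phone unpaid_start unpaid_end);
  set overlaps;
  by clientid sharerid phone startdt enddt;
  retain cursor;
  format unpaid_start unpaid_end date9.;

  if first.enddt then cursor = startdt;

  if not missing(paystart) then do;
    /* Gap between the cursor and the start of this paid period */
    if paystart > cursor then do;
      unpaid_start = cursor;
      unpaid_end   = paystart - 1;
      output;
    end;
    /* Move past this paid period; max() copes with overlapping ones */
    cursor = max(cursor, payend + 1);
  end;

  /* Whatever remains after the last paid period is unpaid as well */
  if last.enddt and cursor <= enddt then do;
    unpaid_start = cursor;
    unpaid_end   = enddt;
    output;
  end;
run;
```

Because the join keeps only overlapping pairs, the intermediate table grows with the number of overlaps rather than with the length of the periods. Whether end dates should count as paid or unpaid days is a business rule, so the `+ 1` and `- 1` adjustments may need to change.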

1 ACCEPTED SOLUTION

art297
Opal | Level 21

Does my original jackhammer approach blow out your system's memory as well?  It shouldn't as, if it will work for one record, it should work for any number of records as well.


6 REPLIES
art297
Opal | Level 21

How many records do you have in each of the two files?

SteveNZ
Obsidian | Level 7

It varies, but it can be over 4000, plus the periods can go right back to the 1990s, which is causing huge file bloat.

art297
Opal | Level 21

Does my original jackhammer approach blow out your system's memory as well?  It shouldn't as, if it will work for one record, it should work for any number of records as well.

SteveNZ
Obsidian | Level 7

Hiya, never thought of that and yes you're quite correct. Used the Jack Hammer approach on my live data and it worked a charm. Thank you very much again.

cheers

Steve

art297
Opal | Level 21

That must be why they invented JackHammers!  Not applicable for all tasks but, sometimes, definitely the right tool for the job.

Astounding
PROC Star

Steve,

A couple of things to consider as well ...

Memory is cheap, at least where computers are concerned.  Buy more?

The memory requirements can be reduced by breaking up the data into batches.  Subsets would be based on ranges of clientid, for example, but make sure the same client IDs appear in both data sets.
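
The batching idea above could be sketched as follows. This is only an illustration: the macro name `make_batch`, its parameters and the output data set names are hypothetical, and the `phone` and `paid_periods` data sets from the question are assumed to exist. Applying the identical WHERE clause to both inputs is what keeps the same client IDs in each pair of subsets.

```sas
/* Hypothetical helper: cut matching subsets of both data sets for one  */
/* range of client IDs (character comparison, bounds inclusive).        */
%macro make_batch(lo=, hi=, suffix=);
  data phone_&suffix;
    set phone;
    where "&lo" <= clientid <= "&hi";
  run;

  data paid_&suffix;
    set paid_periods;
    where "&lo" <= clientid <= "&hi";
  run;
%mend make_batch;

/* Example calls using the client IDs from the sample data */
%make_batch(lo=client1, hi=client1, suffix=b1)
%make_batch(lo=client2, hi=client2, suffix=b2)
```

Each pair (`phone_b1` with `paid_b1`, and so on) can then be run through the period logic separately and the results appended, for example with PROC APPEND.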
