Please see this question for context:
The answer given works perfectly, but file size is an issue: with large files it blows out considerably. I'm wondering if the approach below might be used instead, as it is considerably lighter on resources. I just can't work out how to get the date periods from it. I have a set of dates per client and I need to exclude the periods they have paid for, and need to output the exact dates. Sample data and code to date are below. Many thanks in advance!
data phone ;
infile datalines delimiter=',';
input clientid $ sharerid $ phone $ startdt enddt ;
informat startdt enddt date9. ;
format startdt enddt date9. ;
datalines;
client1,sharer1,555-6532,21Nov2011,10Dec2012
client1,sharer1,444-5655,29Nov2010,14Feb2011
client1,sharer1,333-1234,20May1993,17Aug1993
client1,sharer1,333-1234,08Sep1993,08Sep1993
client2,sharer2,666-6548,10Jul2001,12Nov2001
client2,sharer2,666-6548,10Apr2002,06Aug2002
client2,sharer2,111-5658,02Nov1992,12Aug1993
client2,sharer2,222-6589,10Jan2000,31Jan2000
client2,sharer2,777-8755,31Jan2000,03Feb2000
client2,sharer2,777-8755,25Jun2009,14Sep2009
client2,sharer2,321-6544,18Dec2003,08Apr2004
client2,sharer2,778-6589,07Jun2001,10Jul2001
client2,sharer2,999-9988,31Dec1993,26Mar1994
client2,sharer2,999-9988,28Mar1994,28Mar1994
client2,sharer2,888-7845,12Aug1993,23Aug1993
client2,sharer2,789-9876,10Aug1994,05Sep1994
client2,sharer2,789-9876,22Jun1995,10Jul1995
client2,sharer2,951-6235,08Apr2004,10Aug2004
client2,sharer2,753-1245,25Jan2007,18Jul2007
client2,sharer2,656-8989,12Nov1998,26Feb1999
client2,sharer2,656-8989,10Dec1999,10Jan2000
client2,sharer2,141-1414,23Aug2000,26Mar2001
client2,sharer2,141-1414,07Jun2001,10Jul2001
client2,sharer2,363-3636,19Jun1998,12Nov1998
client2,sharer2,852-8525,18Jun2009,02Jun2010
client2,sharer2,852-8525,20Oct2010,16May2011
client2,sharer2,852-8525,31May2012,10Dec2012
client2,sharer2,565-5656,05Sep1994,01Nov1994
client2,sharer2,565-5656,14Nov1994,14Nov1994
client2,sharer2,221-2212,01jan2012,30jan2012
;
data paid_periods;
infile datalines delimiter=',';
input clientid $ sharerid $ paystart payend ;
informat paystart payend date9. ;
format paystart payend date9. ;
datalines;
client2,sharer2,31Aug1992,23Aug1993
client2,sharer2,25Dec1993,26Mar1994
client2,sharer2,10Aug1994,01Nov1994
client2,sharer2,15Mar1995,20Mar1995
client2,sharer2,19Jun1998,18Feb1999
client2,sharer2,10Dec1999,31Jan2000
client2,sharer2,20Jun2001,12Nov2001
client2,sharer2,10Apr2002,27Jul2002
client2,sharer2,18Dec2003,06Aug2004
client2,sharer2,11Dec2006,17Jul2007
client2,sharer2,18Jun2009,01Jun2010
client2,sharer2,20Oct2010,13May2011
client2,sharer2,14jan2012,21jan2012
;
proc sql;
create table ph as
select distinct clientid,sharerid,phone
from phone
;
create table pays as
select distinct rs.*,ph.phone
from paid_periods rs, ph
where rs.clientid=ph.clientid and rs.sharerid=ph.sharerid
;
quit;
data phprd(keep=phone clientid sharerid phone_flag date) ;
set phone ;
phone_flag=1;
date=startdt;
output;
phone_flag=0;
date=enddt;
output;
run;
proc sort data=phprd;
by phone clientid sharerid date;
run;
data payprd(keep = phone clientid sharerid pay_flag date) ;
set pays;
pay_flag=1;
date=paystart;
output;
pay_flag=0;
date=payend;
output;
run;
proc sort data=payprd;
by phone clientid sharerid date;
run;
data timeline ;
set payprd phprd;
format date ddmmyy10.; /* clientid and sharerid are character, so a numeric format like 12. cannot apply to them */
by phone clientid sharerid date;
run;
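Not SAS, but the interval logic the timeline step above is reaching for can be sketched outside SAS to check the expected output. Here is an illustrative Python version (the function name and data layout are my own, not from the thread) that subtracts the paid periods from each phone period and emits the exact unpaid date ranges:

```python
from datetime import date, timedelta

def subtract_periods(phone_periods, paid_periods):
    """For each phone period, return the sub-periods not covered by any paid
    period. Periods are inclusive (start, end) date tuples, mirroring the
    startdt/enddt and paystart/payend columns above."""
    results = []
    for start, end in phone_periods:
        cursor = start  # the earliest date of the phone period not yet accounted for
        for pstart, pend in sorted(paid_periods):
            if pend < cursor or pstart > end:
                continue  # this paid period does not overlap what is left
            if pstart > cursor:
                # Unpaid gap before this paid period begins.
                results.append((cursor, pstart - timedelta(days=1)))
            cursor = max(cursor, pend + timedelta(days=1))
            if cursor > end:
                break
        if cursor <= end:
            results.append((cursor, end))  # unpaid tail after the last paid period
    return results

# Phone 221-2212 runs 01Jan2012-30Jan2012; the paid period 14Jan2012-21Jan2012
# carves it into two unpaid pieces: 01Jan-13Jan2012 and 22Jan-30Jan2012.
gaps = subtract_periods([(date(2012, 1, 1), date(2012, 1, 30))],
                        [(date(2012, 1, 14), date(2012, 1, 21))])
```

The same sweep could be written as a SAS data step over the sorted timeline dataset, carrying a "currently paid" flag by phone/client and outputting a row each time the flag drops while the phone period is still open.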
cheers
Steve
Does my original jackhammer approach blow out your system's memory as well? It shouldn't: if it works for one record, it should work for any number of records.
How many records do you have in each of the two files?
It varies but can be over 4,000, plus the periods can go right back to the 1990s, which causes huge file bloat.
Hiya, never thought of that and yes you're quite correct. Used the Jack Hammer approach on my live data and it worked a charm. Thank you very much again.
cheers
Steve
That must be why they invented JackHammers! Not applicable for all tasks but, sometimes, definitely the right tool for the job.
Steve,
A couple of things to consider as well ...
Memory is cheap, at least where computers are concerned. Buy more?
The memory requirements can be reduced by breaking up the data into batches. Subsets would be based on ranges of clientid, for example, but make sure the same client IDs appear in both data sets.
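To illustrate the batching idea (a hypothetical helper, not SAS), the distinct client IDs can be split into contiguous ranges so each pass only loads the matching subset of both data sets:

```python
def client_batches(client_ids, batch_size):
    """Split the distinct client IDs into fixed-size batches. Sorting first
    keeps each batch a contiguous range, so the same client IDs land in the
    same pass for both the phone and paid_periods data."""
    ids = sorted(set(client_ids))
    for i in range(0, len(ids), batch_size):
        yield ids[i:i + batch_size]

batches = list(client_batches(["client2", "client1", "client3", "client1"], 2))
```

In SAS terms, each batch would become a WHERE clause (e.g. on a clientid range) applied identically to both input data sets before running the heavy step.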