BookmarkSubscribeRSS Feed
wil1212
Calcite | Level 5


Hello,

I would like assistance with setting up a filter in EG 6.1. I have thousands of records with billing information for 3 years. I would like to filter the data to remove all accounts that did not have 12 consecutive bills in a year at the same location. My problem is I'm not sure how to filter the data based off a condition. The condition: if account had 12 consecutive bills in 2014 at the same location, list all billing records for all 3 years for that account. The billing date is formatted as a date. Any help would be greatly appreciated.

8 REPLIES 8
Steelers_In_DC
Barite | Level 11

This will get you want you want:

data have;

infile cards dsd;

informat date mmddyy10.;

format date mmddyy10.;

input id $ date;

cards;

1,1/31/2013

1,2/28/2013

1,3/31/2013

1,4/30/2013

1,5/31/2013

1,6/30/2013

1,7/31/2013

1,8/31/2013

1,9/30/2013

1,10/31/2013

1,11/30/2013

1,12/31/2013

2,1/31/2013

2,2/28/2013

2,3/31/2013

2,4/30/2013

2,5/31/2013

2,6/30/2013

2,10/31/2013

2,11/30/2013

2,12/31/2013

3,1/31/2013

3,2/28/2013

3,3/31/2013

3,4/30/2013

3,5/31/2013

3,6/30/2013

3,7/31/2013

3,8/31/2013

3,9/30/2013

3,10/31/2013

3,11/30/2013

3,12/31/2013

;

run;

proc sort data=have;by id date;

data prep;

format _ldate mmddyy10.;

set have;

by id;

_ldate=lag(date);

_date_diff = intck('month',_ldate,date);

_count +1;

if first.id or _date_diff > 1 then _count = 1;

run;

data want(drop=_:) check;

do until(last.id);

set prep;

by id;

if _count = 12 then _flag = 1;

end;

do until(last.id);

set prep;

by id;

output check;

if _flag = 1 then output want;

end;

run;

CaseySmith
SAS Employee

Here is how I accomplished what I believe you are asking using the point-and-click Query Builder...

First, I created some simple (hopefully close to representative) data:

data billing;
format billdate date9.;
informat billdate date9.;
infile datalines dsd;
input account billdate;
datalines;
1,01JAN2013
1,01JAN2014
1,02JAN2014
2,01JAN2013
2,03JAN2014
3,04JAN2014
3,05JAN2014
4,06JAN2014
5,07JAN2014
run;


Then, I ran the Query Builder...

-selected the billdate and account variables

-summarized the billdate variable with N (to get the count of the number of bills; notice it is automatically grouped by account)

Looked like this:

select.png

Then, on the Filter tab, I defined two filters...

-where billdate is between 1/1/2014 and 12/31/2014 (since only interested in 2014 bills)

-where N_of_billdate > 1 (you would use > 12, but for my sample data only looking for more than one bill)

Looked like this:

filter.png

Then I ran the Query Builder and got the accounts with more than one bill in 2014:

result.png

Of course, you'd have to tweak it for your data.


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

wil1212
Calcite | Level 5
Thanks so much for replying! But my data looks more like the ex below. I want to exclude accounts 2 because it does not have 12 consecutive bills for 2013. I also want to exclude account 3 because it was not at the same address in 2013. In the end, I would like to select all the records for account 1 because it was at the same address and had 12 bills in 2013. I should show all 17 records for account 1.
AccountBill_DateAddress
1Nov-20121 addr
1Dec-20121 addr
1Jan-20131 addr
1Feb-20131 addr
1Mar-20131 addr
1Apr-20131 addr
1May-20131 addr
1Jun-20131 addr
1Jul-20131 addr
1Aug-20131 addr
1Sep-20131 addr
1Oct-20131 addr
1Nov-20131 addr
1Dec-20131 addr
1Jan-20141 addr
1Feb-20141 addr
1Mar-20141 addr
2Sep-20132 addr
2Oct-20132 addr
2Nov-20132 addr
2Dec-20132 addr
2Jan-20142 addr
2Feb-20142 addr
2Mar-20142 addr
3Jan-20123 addr
3Feb-20123 addr
3Mar-20123 addr
3Apr-20123 addr
3May-20123 addr
3Jun-20123 addr
3Jul-20123 addr
3Aug-20123 addr
3Sep-20123 addr
3Oct-20123 addr
3Nov-20123 addr
3Dec-20123 addr
3Jan-20133 addr
3Feb-20133 addr
3Mar-20133 addr
3Apr-20133 addr
3May-20133 addr
3Jun-20133 addr
3Jul-201310 addr
3Aug-201310 addr
3Sep-201310 addr
3Oct-201310 addr
3Nov-201310 addr
3Dec-201310 addr
3Jan-201410 addr
3Feb-201410 addr
3Mar-201410 addr
Steelers_In_DC
Barite | Level 11

account 3 was in the same address for 12 consecutive months, you are contradicting your original question.  If you are only concerned with 2013 then you should filter the population before you do anything else:

data have;

infile cards dsd;

informat bill_date mmddyy10.;

format bill_date mmddyy10.;

input Account $ Bill_Date Address $;

cards;

1,11/1/2012,1 addr

1,12/1/2012,1 addr

1,1/1/2013,1 addr

1,2/1/2013,1 addr

1,3/1/2013,1 addr

1,4/1/2013,1 addr

1,5/1/2013,1 addr

1,6/1/2013,1 addr

1,7/1/2013,1 addr

1,8/1/2013,1 addr

1,9/1/2013,1 addr

1,10/1/2013,1 addr

1,11/1/2013,1 addr

1,12/1/2013,1 addr

1,1/1/2014,1 addr

1,2/1/2014,1 addr

1,3/1/2014,1 addr

2,9/1/2013,2 addr

2,10/1/2013,2 addr

2,11/1/2013,2 addr

2,12/1/2013,2 addr

2,1/1/2014,2 addr

2,2/1/2014,2 addr

2,3/1/2014,2 addr

3,1/1/2012,3 addr

3,2/1/2012,3 addr

3,3/1/2012,3 addr

3,4/1/2012,3 addr

3,5/1/2012,3 addr

3,6/1/2012,3 addr

3,7/1/2012,3 addr

3,8/1/2012,3 addr

3,9/1/2012,3 addr

3,10/1/2012,3 addr

3,11/1/2012,3 addr

3,12/1/2012,3 addr

3,1/1/2013,3 addr

3,2/1/2013,3 addr

3,3/1/2013,3 addr

3,4/1/2013,3 addr

3,5/1/2013,3 addr

3,6/1/2013,3 addr

3,7/1/2013,10 addr

3,8/1/2013,10 addr

3,9/1/2013,10 addr

3,10/1/2013,10 addr

3,11/1/2013,10 addr

3,12/1/2013,10 addr

3,1/1/2014,10 addr

3,2/1/2014,10 addr

3,3/1/2014,10 addr

;

run;

proc sort data=have out=start(where=(year(bill_date)=2013));by account bill_date;

data prep;

format _ldate mmddyy10.;

set start;

by account;

_ldate=lag(bill_date);

_ladd = lag(address);

_date_diff = intck('month',_ldate,bill_date);

_count + 1;

if first.account or _date_diff > 1 then _count = 1;

if not first.account then do;

    if address ne _ladd then _count = 1;

end;

run;

data want(drop=_:) check;

do until(last.account);

set prep;

by account;

if _count = 12 then _flag = 1;

end;

do until(last.account);

set prep;

by account;

output check;

if _flag = 1 then output want;

end;

run;

CaseySmith
SAS Employee

Do you wish to use code or point-and-click Query Builder?

Here is steps using Query Builder (point-and-click) with your sample data...

Create data set from your data:

data bills;
format Bill_Date MONYY7.;
informat Bill_Date MONYY7.;
infile datalines;
input Account Bill_Date Address;
datalines;
1 Nov-2012 1 addr
1 Dec-2012 1 addr
...
3 Feb-2014 10 addr
3 Mar-2014 10 addr
run;

Run the Query Builder with very similar to the steps I supplied earlier...

-Select all three variables

-Select N function in the Summary column for the Bill_Date variable

(Notice: It automatically groups by account and address.)

Should look similar to:

select.png

On the Filter tab, specify two filters:

-Bill_Date between 1/1/2013 and 12/31/2013

-N_of_Bill_Date >= 12

It should look like:

filter.png

Run the Query Builder and the result should return the accounts that have 12 consecutive bills at the same address (one record in this case):

result.png

Now you simply need to join this table with your original table to get the bills for that account:

-start a new Query Builder on the original table (named BILLS in my case)

-add the table that is the result of the first query (QUERY_FOR_BILLS)

(The Query Builder will automatically detect a join, but confirm joining by Account by pressing the Join Tables button.)

newquery2.png

Run the query and you should end up with this final result (the 17 records from account 1):

finalresult.png

And your process flow should look something like this:

processflow.png

Casey


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

wil1212
Calcite | Level 5

Sorry for the confusion. I'm new to SAS and haven't been to training yet. Still learning the software. I'm not familiar with the code either...training soon!

But Casey, that was the help I needed. I knew how to filter the data for 2013, but I did not think to go back and join the tables to get the additional records. Thanks SO much.

Steelers_In_DC
Barite | Level 11

I realized I didn't account for the same location, here's an updated code:

data have;

infile cards dsd;

informat date mmddyy10.;

format date mmddyy10.;

input id $ date location $;

cards;

1,1/31/2013,m

1,2/28/2013,m

1,3/31/2013,m

1,4/30/2013,m

1,5/31/2013,m

1,6/30/2013,m

1,7/31/2013,m

1,8/31/2013,m

1,9/30/2013,m

1,10/31/2013,m

1,11/30/2013,m

1,12/31/2013,m

2,1/31/2013,m

2,2/28/2013,m

2,3/31/2013,m

2,4/30/2013,m

2,5/31/2013,m

2,6/30/2013,m

2,10/31/2013,m

2,11/30/2013,m

2,12/31/2013,m

3,1/31/2013,n

3,2/28/2013,n

3,3/31/2013,n

3,4/30/2013,n

3,5/31/2013,n

3,6/30/2013,X

3,7/31/2013,n

3,8/31/2013,n

3,9/30/2013,n

3,10/31/2013,n

3,11/30/2013,n

3,12/31/2013,n

;

run;

proc sql;

create table prep1 as

select *,max(location) as max_location

from have

group by id

having location = max_location

order by id,date;

data prep;

format _ldate mmddyy10.;

set prep1;

by id;

_ldate=lag(date);

_date_diff = intck('month',_ldate,date);

_count +1;

if first.id or _date_diff > 1 then _count = 1;

run;

data want(drop=_:) check;

do until(last.id);

set prep;

by id;

if _count = 12 then _flag = 1;

end;

do until(last.id);

set prep;

by id;

output check;

if _flag = 1 then output want;

end;

run;

Ksharp
Super User

Is there some missing MONTH ?  and Every month have the same day ?


data have;
infile cards dsd;
informat bill_date mmddyy10.;
format bill_date mmddyy10.;
input Account $ Bill_Date Address $;
cards;
1,11/1/2012,1 addr
1,12/1/2012,1 addr
1,1/1/2013,1 addr
1,2/1/2013,1 addr
1,3/1/2013,1 addr
1,4/1/2013,1 addr
1,5/1/2013,1 addr
1,6/1/2013,1 addr
1,7/1/2013,1 addr
1,8/1/2013,1 addr
1,9/1/2013,1 addr
1,10/1/2013,1 addr
1,11/1/2013,1 addr
1,12/1/2013,1 addr
1,1/1/2014,1 addr
1,2/1/2014,1 addr
1,3/1/2014,1 addr
2,9/1/2013,2 addr
2,10/1/2013,2 addr
2,11/1/2013,2 addr
2,12/1/2013,2 addr
2,1/1/2014,2 addr
2,2/1/2014,2 addr
2,3/1/2014,2 addr
3,1/1/2012,3 addr
3,2/1/2012,3 addr
3,3/1/2012,3 addr
3,4/1/2012,3 addr
3,5/1/2012,3 addr
3,6/1/2012,3 addr
3,7/1/2012,3 addr
3,8/1/2012,3 addr
3,9/1/2012,3 addr
3,10/1/2012,3 addr
3,11/1/2012,3 addr
3,12/1/2012,3 addr
3,1/1/2013,3 addr
3,2/1/2013,3 addr
3,3/1/2013,3 addr
3,4/1/2013,3 addr
3,5/1/2013,3 addr
3,6/1/2013,3 addr
3,7/1/2013,10 addr
3,8/1/2013,10 addr
3,9/1/2013,10 addr
3,10/1/2013,10 addr
3,11/1/2013,10 addr
3,12/1/2013,10 addr
3,1/1/2014,10 addr
3,2/1/2014,10 addr
3,3/1/2014,10 addr
;
run;
proc sql;
create table want as
 select *
  from have
   group by Account
    having count(*) ge 12 and count(distinct Address)=1 and intck('month',min(Bill_Date),max(Bill_Date)) ge 11
     order by Bill_Date;
quit;

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2365 views
  • 0 likes
  • 4 in conversation