Hello,
I would like assistance with setting up a filter in EG 6.1. I have thousands of records with billing information for 3 years. I would like to filter the data to remove all accounts that did not have 12 consecutive bills in a year at the same location. My problem is I'm not sure how to filter the data based off a condition. The condition: if account had 12 consecutive bills in 2014 at the same location, list all billing records for all 3 years for that account. The billing date is formatted as a date. Any help would be greatly appreciated.
This will get you want you want:
data have;
infile cards dsd;
informat date mmddyy10.;
format date mmddyy10.;
input id $ date;
cards;
1,1/31/2013
1,2/28/2013
1,3/31/2013
1,4/30/2013
1,5/31/2013
1,6/30/2013
1,7/31/2013
1,8/31/2013
1,9/30/2013
1,10/31/2013
1,11/30/2013
1,12/31/2013
2,1/31/2013
2,2/28/2013
2,3/31/2013
2,4/30/2013
2,5/31/2013
2,6/30/2013
2,10/31/2013
2,11/30/2013
2,12/31/2013
3,1/31/2013
3,2/28/2013
3,3/31/2013
3,4/30/2013
3,5/31/2013
3,6/30/2013
3,7/31/2013
3,8/31/2013
3,9/30/2013
3,10/31/2013
3,11/30/2013
3,12/31/2013
;
run;
proc sort data=have;by id date;
data prep;
format _ldate mmddyy10.;
set have;
by id;
_ldate=lag(date);
_date_diff = intck('month',_ldate,date);
_count +1;
if first.id or _date_diff > 1 then _count = 1;
run;
data want(drop=_:) check;
do until(last.id);
set prep;
by id;
if _count = 12 then _flag = 1;
end;
do until(last.id);
set prep;
by id;
output check;
if _flag = 1 then output want;
end;
run;
Here is how I accomplished what I believe you are asking using the point-and-click Query Builder...
First, I created some simple (hopefully close to representative) data:
Then, I ran the Query Builder...
-selected the billdate and account variables
-summarized the billdate variable with N (to get the count of the number of bills; notice it is automatically grouped by account)
Looked like this:
Then, on the Filter tab, I defined two filters...
-where billdate is between 1/1/2014 and 12/31/2014 (since only interested in 2014 bills)
-where N_of_billdate > 1 (you would use > 12, but for my sample data only looking for more than one bill)
Looked like this:
Then I ran the Query Builder and got the accounts with more than one bill in 2014:
Of course, you'd have to tweak it for your data.
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
Account | Bill_Date | Address |
1 | Nov-2012 | 1 addr |
1 | Dec-2012 | 1 addr |
1 | Jan-2013 | 1 addr |
1 | Feb-2013 | 1 addr |
1 | Mar-2013 | 1 addr |
1 | Apr-2013 | 1 addr |
1 | May-2013 | 1 addr |
1 | Jun-2013 | 1 addr |
1 | Jul-2013 | 1 addr |
1 | Aug-2013 | 1 addr |
1 | Sep-2013 | 1 addr |
1 | Oct-2013 | 1 addr |
1 | Nov-2013 | 1 addr |
1 | Dec-2013 | 1 addr |
1 | Jan-2014 | 1 addr |
1 | Feb-2014 | 1 addr |
1 | Mar-2014 | 1 addr |
2 | Sep-2013 | 2 addr |
2 | Oct-2013 | 2 addr |
2 | Nov-2013 | 2 addr |
2 | Dec-2013 | 2 addr |
2 | Jan-2014 | 2 addr |
2 | Feb-2014 | 2 addr |
2 | Mar-2014 | 2 addr |
3 | Jan-2012 | 3 addr |
3 | Feb-2012 | 3 addr |
3 | Mar-2012 | 3 addr |
3 | Apr-2012 | 3 addr |
3 | May-2012 | 3 addr |
3 | Jun-2012 | 3 addr |
3 | Jul-2012 | 3 addr |
3 | Aug-2012 | 3 addr |
3 | Sep-2012 | 3 addr |
3 | Oct-2012 | 3 addr |
3 | Nov-2012 | 3 addr |
3 | Dec-2012 | 3 addr |
3 | Jan-2013 | 3 addr |
3 | Feb-2013 | 3 addr |
3 | Mar-2013 | 3 addr |
3 | Apr-2013 | 3 addr |
3 | May-2013 | 3 addr |
3 | Jun-2013 | 3 addr |
3 | Jul-2013 | 10 addr |
3 | Aug-2013 | 10 addr |
3 | Sep-2013 | 10 addr |
3 | Oct-2013 | 10 addr |
3 | Nov-2013 | 10 addr |
3 | Dec-2013 | 10 addr |
3 | Jan-2014 | 10 addr |
3 | Feb-2014 | 10 addr |
3 | Mar-2014 | 10 addr |
account 3 was in the same address for 12 consecutive months, you are contradicting your original question. If you are only concerned with 2013 then you should filter the population before you do anything else:
data have;
infile cards dsd;
informat bill_date mmddyy10.;
format bill_date mmddyy10.;
input Account $ Bill_Date Address $;
cards;
1,11/1/2012,1 addr
1,12/1/2012,1 addr
1,1/1/2013,1 addr
1,2/1/2013,1 addr
1,3/1/2013,1 addr
1,4/1/2013,1 addr
1,5/1/2013,1 addr
1,6/1/2013,1 addr
1,7/1/2013,1 addr
1,8/1/2013,1 addr
1,9/1/2013,1 addr
1,10/1/2013,1 addr
1,11/1/2013,1 addr
1,12/1/2013,1 addr
1,1/1/2014,1 addr
1,2/1/2014,1 addr
1,3/1/2014,1 addr
2,9/1/2013,2 addr
2,10/1/2013,2 addr
2,11/1/2013,2 addr
2,12/1/2013,2 addr
2,1/1/2014,2 addr
2,2/1/2014,2 addr
2,3/1/2014,2 addr
3,1/1/2012,3 addr
3,2/1/2012,3 addr
3,3/1/2012,3 addr
3,4/1/2012,3 addr
3,5/1/2012,3 addr
3,6/1/2012,3 addr
3,7/1/2012,3 addr
3,8/1/2012,3 addr
3,9/1/2012,3 addr
3,10/1/2012,3 addr
3,11/1/2012,3 addr
3,12/1/2012,3 addr
3,1/1/2013,3 addr
3,2/1/2013,3 addr
3,3/1/2013,3 addr
3,4/1/2013,3 addr
3,5/1/2013,3 addr
3,6/1/2013,3 addr
3,7/1/2013,10 addr
3,8/1/2013,10 addr
3,9/1/2013,10 addr
3,10/1/2013,10 addr
3,11/1/2013,10 addr
3,12/1/2013,10 addr
3,1/1/2014,10 addr
3,2/1/2014,10 addr
3,3/1/2014,10 addr
;
run;
proc sort data=have out=start(where=(year(bill_date)=2013));by account bill_date;
data prep;
format _ldate mmddyy10.;
set start;
by account;
_ldate=lag(bill_date);
_ladd = lag(address);
_date_diff = intck('month',_ldate,bill_date);
_count + 1;
if first.account or _date_diff > 1 then _count = 1;
if not first.account then do;
if address ne _ladd then _count = 1;
end;
run;
data want(drop=_:) check;
do until(last.account);
set prep;
by account;
if _count = 12 then _flag = 1;
end;
do until(last.account);
set prep;
by account;
output check;
if _flag = 1 then output want;
end;
run;
Do you wish to use code or point-and-click Query Builder?
Here is steps using Query Builder (point-and-click) with your sample data...
Create data set from your data:
Run the Query Builder with very similar to the steps I supplied earlier...
-Select all three variables
-Select N function in the Summary column for the Bill_Date variable
(Notice: It automatically groups by account and address.)
Should look similar to:
On the Filter tab, specify two filters:
-Bill_Date between 1/1/2013 and 12/31/2013
-N_of_Bill_Date >= 12
It should look like:
Run the Query Builder and the result should return the accounts that have 12 consecutive bills at the same address (one record in this case):
Now you simply need to join this table with your original table to get the bills for that account:
-start a new Query Builder on the original table (named BILLS in my case)
-add the table that is the result of the first query (QUERY_FOR_BILLS)
(The Query Builder will automatically detect a join, but confirm joining by Account by pressing the Join Tables button.)
Run the query and you should end up with this final result (the 17 records from account 1):
And your process flow should look something like this:
Casey
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
Sorry for the confusion. I'm new to SAS and haven't been to training yet. Still learning the software. I'm not familiar with the code either...training soon!
But Casey, that was the help I needed. I knew how to filter the data for 2013, but I did not think to go back and join the tables to get the additional records. Thanks SO much.
I realized I didn't account for the same location, here's an updated code:
data have;
infile cards dsd;
informat date mmddyy10.;
format date mmddyy10.;
input id $ date location $;
cards;
1,1/31/2013,m
1,2/28/2013,m
1,3/31/2013,m
1,4/30/2013,m
1,5/31/2013,m
1,6/30/2013,m
1,7/31/2013,m
1,8/31/2013,m
1,9/30/2013,m
1,10/31/2013,m
1,11/30/2013,m
1,12/31/2013,m
2,1/31/2013,m
2,2/28/2013,m
2,3/31/2013,m
2,4/30/2013,m
2,5/31/2013,m
2,6/30/2013,m
2,10/31/2013,m
2,11/30/2013,m
2,12/31/2013,m
3,1/31/2013,n
3,2/28/2013,n
3,3/31/2013,n
3,4/30/2013,n
3,5/31/2013,n
3,6/30/2013,X
3,7/31/2013,n
3,8/31/2013,n
3,9/30/2013,n
3,10/31/2013,n
3,11/30/2013,n
3,12/31/2013,n
;
run;
proc sql;
create table prep1 as
select *,max(location) as max_location
from have
group by id
having location = max_location
order by id,date;
data prep;
format _ldate mmddyy10.;
set prep1;
by id;
_ldate=lag(date);
_date_diff = intck('month',_ldate,date);
_count +1;
if first.id or _date_diff > 1 then _count = 1;
run;
data want(drop=_:) check;
do until(last.id);
set prep;
by id;
if _count = 12 then _flag = 1;
end;
do until(last.id);
set prep;
by id;
output check;
if _flag = 1 then output want;
end;
run;
Is there some missing MONTH ? and Every month have the same day ?
data have; infile cards dsd; informat bill_date mmddyy10.; format bill_date mmddyy10.; input Account $ Bill_Date Address $; cards; 1,11/1/2012,1 addr 1,12/1/2012,1 addr 1,1/1/2013,1 addr 1,2/1/2013,1 addr 1,3/1/2013,1 addr 1,4/1/2013,1 addr 1,5/1/2013,1 addr 1,6/1/2013,1 addr 1,7/1/2013,1 addr 1,8/1/2013,1 addr 1,9/1/2013,1 addr 1,10/1/2013,1 addr 1,11/1/2013,1 addr 1,12/1/2013,1 addr 1,1/1/2014,1 addr 1,2/1/2014,1 addr 1,3/1/2014,1 addr 2,9/1/2013,2 addr 2,10/1/2013,2 addr 2,11/1/2013,2 addr 2,12/1/2013,2 addr 2,1/1/2014,2 addr 2,2/1/2014,2 addr 2,3/1/2014,2 addr 3,1/1/2012,3 addr 3,2/1/2012,3 addr 3,3/1/2012,3 addr 3,4/1/2012,3 addr 3,5/1/2012,3 addr 3,6/1/2012,3 addr 3,7/1/2012,3 addr 3,8/1/2012,3 addr 3,9/1/2012,3 addr 3,10/1/2012,3 addr 3,11/1/2012,3 addr 3,12/1/2012,3 addr 3,1/1/2013,3 addr 3,2/1/2013,3 addr 3,3/1/2013,3 addr 3,4/1/2013,3 addr 3,5/1/2013,3 addr 3,6/1/2013,3 addr 3,7/1/2013,10 addr 3,8/1/2013,10 addr 3,9/1/2013,10 addr 3,10/1/2013,10 addr 3,11/1/2013,10 addr 3,12/1/2013,10 addr 3,1/1/2014,10 addr 3,2/1/2014,10 addr 3,3/1/2014,10 addr ; run; proc sql; create table want as select * from have group by Account having count(*) ge 12 and count(distinct Address)=1 and intck('month',min(Bill_Date),max(Bill_Date)) ge 11 order by Bill_Date; quit;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.