GA everyone,
I have an existing customer status data set that is only updated when a status change change occurs. I want to do status counts by month, year and segment. So the challenge is creating a temp table for missing values for the month date periods that are not in the data table in order to enable a count of status by month and year.
Important assumption:
- Customer_Status is the same when it is not reported
I should be able to count status for the Status_Date that are missing for Company02
Dates Missing Assumed Status
05/01/2005 - 05/31/2009 Active
07/01/2009 - 09/31/2009 CYRisk
11/01/2009 - 07/31/2013 Active
09/01/2013 - Today() CYRisk
Sample Data
data sample_data;
input @1 Account_id 2.
@3 Account_name $10.
@13Segment $14.
@28 Customer_Status 2.
@30Customer_Status_Name $7.
+1 Status_Date mmddyy10.;
format Status_Date mmddyy10.;
cards;
02 Company01 HighNewtWorth 04 CPLost 12/05/2006
03 Company02 MidSizeMarket 01 Active 04/25/2005
03 Company02 MidSizeMarket 08 CYRisk 06/25/2009
03 Company02 MidSizeMarket 01 Active 10/05/2009
03 Company02 MidSizeMarket 08 CYRisk 08/09/2013
04 Company03 HighNewtWorth 01 Active 01/26/2005
04 Company03 HighNewtWorth 04 CPLost 12/02/2016
05 Company04 HighNewtWorth 01 Active 09/13/2004
06 Company05 HighNewtWorth 01 Active 10/01/2004
06 Company05 HighNewtWorth 02 Inactv 08/18/2011
07 Company06 HighNewtWorth 01 Active 09/06/2006
08 Company07 HighNewtWorth 01 Active 08/01/2005
09 Company08 HighNewtWorth 01 Active 11/01/2005
09 Company08 HighNewtWorth 02 Inactv 11/23/2010
09 Company08 HighNewtWorth 06 Termin 05/07/2011
10 Company09 HighNewtWorth 01 Active 07/31/2007
10 Company09 HighNewtWorth 11 Infreq 10/27/2011
10 Company09 HighNewtWorth 01 Active 11/02/2012
10 Company09 HighNewtWorth 11 Infreq 11/14/2018
11 Company10 HighNewtWorth 01 Active 01/04/2005
11 Company10 HighNewtWorth 04 CPLost 06/05/2015
12 Company11 HighNewtWorth 01 Active 10/26/2004
13 Company12 HighNewtWorth 01 Active 01/14/2005
13 Company12 HighNewtWorth 02 Inactv 07/21/2011
14 Company13 HighNewtWorth 01 Active 01/04/2005
14 Company13 HighNewtWorth 02 Inactv 08/02/2012
15 Company14 HighNewtWorth 01 Active 09/01/2005
15 Company14 HighNewtWorth 02 Inactv 09/19/2008
16 Company15 HighNewtWorth 01 Active 09/28/2006
17 Company15 HighNewtWorth 06 Termin 05/19/2005
18 Company16 HighNewtWorth 01 Active 02/11/2005
19 Company17 HighNewtWorth 01 Active 10/26/2004
20 Company18 HighNewtWorth 01 Active 08/24/2005
;
run;
quit;
Thanks
You would better calculate the status duration (number of days) and use that as a FREQ variable in summary procedures. Get the duration by lookahead:
data want;
set sample_data; by account_id;
if not last then
set sample_data(firstobs=2 keep=status_date rename=status_date=next_date) end=last;
if last.account_id then status_duration = 1; /* Default duration */
else status_duration = next_date - status_date;
drop next_date;
run;
Thank you for the quick turnaround, this works as intended but given how I need to use the data, I really need to retain the values and assumed status for each identified missing date that is created as I will be doing multiple date range analysis.
In data set
03 Company02 MidSizeMarket 01 Active 04/25/2005
03 Company02 MidSizeMarket 08 CYRisk 06/25/2009
03 Company02 MidSizeMarket 01 Active 10/05/2009
03 Company02 MidSizeMarket 08 CYRisk 08/09/2013
Required output
-keeps the original value and produces the missing data, just listing a few records that would be created between observations for Company02
03 Company02 MidSizeMarket 01 Active 04/25/2005 - original
03 Company02 MidSizeMarket 01 Active 05/01/2005 - created
03 Company02 MidSizeMarket 01 Active 06/01/2005 - created
03 Company02 MidSizeMarket 01 Active 07/01/2005 - created
until we reach month b4 next status date 05/01/2009 - created
03 Company02 MidSizeMarket 08 CYRisk 06/25/2009 - original
03 Company02 MidSizeMarket 08 CYRisk 07/01/2009 - created
03 Company02 MidSizeMarket 08 CYRisk 08/01/2009 - created
03 Company02 MidSizeMarket 08 CYRisk 09/01/2009 - created
03 Company02 MidSizeMarket 01 Active 10/05/2009 - original
03 Company02 MidSizeMarket 01 Active 11/01/2009 - created
03 Company02 MidSizeMarket 01 Active 12/01/2009 - created
03 Company02 MidSizeMarket 01 Active 12/01/2009 - created
03 Company02 MidSizeMarket 01 Active 01/01/2010 - created
until we reach month b4 next status date 07/01/2013 - created
03 Company02 MidSizeMarket 08 CYRisk 08/09/2013 - original
03 Company02 MidSizeMarket 08 CYRisk 09/09/2013 - created
---------------------until we reach most recent mm/01/yyyy
I'm standardizing the output to day 01 for new date, hope this makes sense
This will build you a table that I think you can use to generate whatever transactions you need to complete your list of accounts and months.
Tom
/* Get the Accounts */
proc sql noprint;
create table Check_Accounts as
select distinct Account_id as Check_Account from Sample_Data;
quit;
/* Get the earliest and latest date */
proc sql noprint;
select min(Status_Date), max(Status_Date) into :Min_Date, :Max_Date
from Sample_Data;
/* Get a list of all the needed dates */
data Check_Months(drop=_:);
_Number_Of_Months = intck('month', &Min_Date., &Max_Date.);
do _i = 0 to _Number_Of_Months;
Check_Month = intnx("month", &Min_Date., _i, 'beginning');
format Check_Month date.;
output;
end;
run;
/* Get a cross-combination of accounts and dates */
proc sql noprint;
create table Check_Table as select Check_Account, Check_Month, 'Missing' as Check_Status from Check_Accounts cross join Check_Months;
quit;
/* Get the Account and Month */
data Existing_Company_Month;
set Sample_Data;
Month_Active = intnx("month", Status_Date, 0, 'beginning');
format Month_Active date.;
keep Account_id Month_Active;
run;
/* Sort, and then keep the first (get unique) */
proc sort data=Existing_Company_Month;
by Account_id Month_Active;
run;
data Existing_Company_Month;
set Existing_Company_Month;
by Account_id Month_Active;
Current_Status = "Active";
if first.Month_Active then
output;
run;
/* Join the two tables to figure out which months we have and which ones we need to add data for */
proc sql noprint;
create table Combined_Tables as select m.*, e.Current_Status
from Check_Table m left outer join Existing_Company_Month e
on(m.Check_Account = e.Account_id and m.Check_Month = e.Month_Active)
order by m.Check_Account, m.Check_Month;
quit;
OK. You asked for it:
data want;
set sample_data; by account_id;
if not last then
set sample_data(firstobs=2 keep=status_date rename=status_date=next_date) end=last;
if last.account_id then output;
else do status_date = status_date to next_date - 1; output; end;
drop next_date;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.