Hello all,
I have a portfolio table that include general info about client companies by month. I am trying to sum employees grouped by product, within new, cancelled, and retained business. The table looks similar to the below dataset:
| Date | Company | product | Employees | 
| 201712 | Alpha | X | 60 | 
| 201712 | Alpha | Y | 60 | 
| 201712 | Bravo | X | 100 | 
| 201801 | Alpha | X | 62 | 
| 201801 | Alpha | Y | 62 | 
| 201801 | Bravo | X | 90 | 
| 201801 | Charlie | X | 200 | 
| 201802 | Alpha | X | 65 | 
| 201802 | Alpha | Y | 65 | 
| 201802 | Charlie | Y | 205 | 
| 201802 | Delta | X | 110 | 
My end goal is to create reports based on the sum of employees by product, for retained, new, and cancelled business. For example, compared to December of 2017, January of 2018 has one new company (Charlie) with 200 employees in product X. Company Alpha gained 2 employees in both product X and Y. And company Bravo lost 10 employees with product X.
Is there a way to identify new, cancelled, and retained business by product compared to the starting month? If anyone is aware of any documentation on what the best practices are to do this analysis, id be more than willing to read any suggestions.
Thank you for your help!
Something like below could prepare your data for reporting.
data have;
  infile datalines dlm=' ' truncover;
  input Date:yymmn6. Company $ product $ Employees;
  format date date9.;
  datalines;
201712 Alpha X 60
201712 Alpha Y 60
201712 Bravo X 100
201801 Alpha X 62
201801 Alpha Y 62
201801 Bravo X 90
201801 Charlie X 200
201802 Alpha X 65
201802 Alpha Y 65
201802 Charlie Y 205
201802 Delta X 110
;
run;
proc sql noprint;
  select put(max(date),date9. -l) into :max_date
  from have;
quit;
proc sort data=have out=inter;
  by Company Date;
run;
proc format;
  value compStatus(default=9)
  1='Retained'
  2='New'
  3='Cancelled'
  ;
quit;
data want(drop=_:);
  merge 
    inter(in=exist2017 keep=Company date rename=(date=_date2017) where=(_date2017='01Dec2017'd)) 
    inter(in=exist2018 keep=Company date rename=(date=_date2018) where=(_date2018="&max_date"d)) 
    inter
    ;
  by Company;
  format status_flg compStatus.;
  if exist2017 and exist2018 then status_flg=1;
  else if not exist2017 and exist2018 then status_flg=2;
  else if exist2017 and not exist2018 then status_flg=3;
run;
So what is your definition of a new, cancelled or retained business based on the data you have provided?
it is all comparing to the end of 2017 book of business. So new would be considered any company that is not in December of 17 but is in any other month. Cancelled is any company that was in December of 17 but not in any month since. Retained would be any company that is in December of 2017 and the most current month.
Dec 2017 doesn't appear in your posted data. Can you provide a more complete set of data to test on that includes your reference month? It is best if you can provide it in a DATA step using a CARDS or DATALINES statement.
Please provide sample data which cover all your conditions. What you've posted so far doesn't have a single record for 201712.
Please post such data in the form of a SAS data step and ideally also show us the desired result. It's not clear to me where you would flag a company as "cancelled" as for this company there isn't a record in the source to flag.
Here the data step creating the sample data you've posted so far for you to amend.
data have;
  infile datalines dlm=' ' truncover;
  input Date:yymmn6. Company $ product $ Employees;
  format date date9.;
  datalines;
201701 Alpha X 60
201701 Alpha Y 60
201701 Bravo X 100
201801 Alpha X 62
201801 Alpha Y 62
201801 Bravo X 90
201801 Charlie X 200
201802 Alpha X 65
201802 Alpha Y 65
201802 Charlie Y 205
201802 Delta X 110
;
run;
Hi Patrick,
I apologize for the confusion-slight typo above. I meant to have the starting month in the dataset be 201712. I have corrected this mistake! Under the assumption we are comparing everything to the first month in the dataset (201712), I am trying to make 3 summary reports:
-New business (Any company that was not listed in Dec17 but is listed in the most recent month)
-Cancelled business (Any company that is listed in Dec17 but is no longer listed in the most recent month)
-Retained business (Any company that is listed in both Dec17 and the most recent month)
I am looking to end up with something similar to the report created with the below code, but if possible, I would like it for new, cancelled, and retained business that match the criteria above. I am trying to see which products we are selling, losing, and retaining over the course of a year.
data have;
  infile datalines dlm=' ' truncover;
  input Date:yymmn6. Company $ product $ Employees;
  format date date9.;
  datalines;
201712 Alpha X 60
201712 Alpha Y 60
201712 Bravo X 100
201801 Alpha X 62
201801 Alpha Y 62
201801 Bravo X 90
201801 Charlie X 200
201802 Alpha X 65
201802 Alpha Y 65
201802 Charlie Y 205
201802 Delta X 110
;
run;
proc sort data=have;
by product;
run;
proc report data=have;
column Date Employees;
define date / group;
rbreak after / summarize;
by product;
run;
Does that make more sense? If not, please let me know and I can try to better explain. I appreciate your help with this!
Thank you
Something like below could prepare your data for reporting.
data have;
  infile datalines dlm=' ' truncover;
  input Date:yymmn6. Company $ product $ Employees;
  format date date9.;
  datalines;
201712 Alpha X 60
201712 Alpha Y 60
201712 Bravo X 100
201801 Alpha X 62
201801 Alpha Y 62
201801 Bravo X 90
201801 Charlie X 200
201802 Alpha X 65
201802 Alpha Y 65
201802 Charlie Y 205
201802 Delta X 110
;
run;
proc sql noprint;
  select put(max(date),date9. -l) into :max_date
  from have;
quit;
proc sort data=have out=inter;
  by Company Date;
run;
proc format;
  value compStatus(default=9)
  1='Retained'
  2='New'
  3='Cancelled'
  ;
quit;
data want(drop=_:);
  merge 
    inter(in=exist2017 keep=Company date rename=(date=_date2017) where=(_date2017='01Dec2017'd)) 
    inter(in=exist2018 keep=Company date rename=(date=_date2018) where=(_date2018="&max_date"d)) 
    inter
    ;
  by Company;
  format status_flg compStatus.;
  if exist2017 and exist2018 then status_flg=1;
  else if not exist2017 and exist2018 then status_flg=2;
  else if exist2017 and not exist2018 then status_flg=3;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
