BookmarkSubscribeRSS Feed
Peter_Thiem
Calcite | Level 5

Hi, I'm a newbie here, please be kind with my first post.

 

I need help to identify which customer has 1 or more product switch and I have attached a sample data as well as 2 output tables I need to produce. Some of the data from the spreadsheet attachment is copied below.

 

Input Data
IDPROD_CODEProd_First_use_mthProd_Last_use_mth
2ABC061-Mar-091-May-11
2ABC081-Jun-111-Apr-17
32ABC071-Jan-091-Jan-12
32ABC081-Mar-121-Apr-17
48BMX021-Apr-091-May-09
48ABC061-Jan-091-Apr-17
90ABC061-Jan-091-Jul-12
90ABC071-Aug-121-Apr-17
90DCA021-Mar-091-Mar-13
90QAM011-Jan-091-Aug-16
90QAM021-Dec-091-Apr-17

 

What I'm after for the first output is finding customers who switch from using one product to another within plus or minus 4 months like in the case of customer 48. The input data contain the customer ID and the product first and last use month-year

 

Output Data 1
IDFIRST PROD SWITCH FROMFIRST PROD SWITCH FROM DATEFIRST PROD SWITCH TOFIRST PROD SWITCH TO DATE
2ABC061-May-11ABC081-Jun-11
32ABC071-Jan-12ABC081-Mar-12
48BMX021-May-09ABC061-Jan-09
90QAM011-Jan-09ABC061-Feb-09
90ABC061-Jul-12ABC071-Aug-12


The second output is pivoting the data to identify all customers who have >= 2 products spend within the last 6 months (DEC16-MAY17). Some customers do have more than 4 products but I just put down 4 here for illustration. 

 

Output Data 2
IDCURR PROD #1CURR PROD 1ST SPEND #1CURR PROD LAST SPEND #1CURR PROD  #2CURR PROD 1ST SPEND #2CURR PROD LAST SPEND #2CURR PROD #3CURR PROD 1ST SPEND #3CURR PROD LAST SPEND #3CURR PROD  #4CURR PROD 1ST SPEND #4CURR PROD LAST SPEND #4
90QAM021-Dec-091-Apr-17ABC071-Aug-121-Apr-17      

 

Hope I have been clear in my description, let me know if I may have missed anything.

 

Thank you,

 

Peter

9 REPLIES 9
art297
Opal | Level 21

Does your window have to be 4 months or would something like 120 days suffice? Both are doable, but the latter would be easier to code.

 

Also, is it possible for a switch to occur while the first product is still active (e.g., product 1 has a start date of 25Feb17 and is still active today, and product 2 has a start date of 13May17 and is still active today)?

 

Art, CEO, AnalystFinder.com

 

art297
Opal | Level 21

I have another question for you. In your example have/want data you have the following two records for ID 48:

48	BMX02	01-Apr-09	01-May-09
48	ABC06	01-Jan-09	01-Apr-17

I would think that ABC06 would be the first product (started on 1Jan09) and the switch would be to BMX02 (started on 1Apr09), but in your want data you show the switch the other way around.

 

If it should be the way I described it, then the following might (I haven't fully tested it yet) do what you want. The only thing you might have to change is the overlap period. I used 120 days, but it would be just as easy to use the intnx function to make it 4 months:

/*Remove those IDs that have no switches*/
proc sql noprint;
  create table need as
    select *,count(distinct PROD_CODE) as prod_count
      from have
        group by ID
          having calculated prod_count gt 1
  ;
quit;

/*Correct for missing data and expand file*/
data need;
  set need;
  format active date9.;
  if missing(Prod_Last_use_mth) then Prod_Last_use_mth=today();
  do active=Prod_First_use_mth-120 to Prod_Last_use_mth+120;
    output;
  end;
run;

/*Delete overlaping records for same product*/
proc sort data=need nodupkey;
  by id active PROD_CODE;
run;

/*Identify and output first switch*/
data want(keep=id first:);
  format id 8.;
  format first_prod_switch_from $8.;
  format first_prod_switch_from_dt date9.;
  format first_prod_switch_to $8.;
  format first_prod_switch_to_dt date9.;
  set need;
  by id active PROD_CODE;
  retain skip;
  format first_prod_switch_from_dt first_prod_switch_to_dt date9.;
  if first.id then skip=0;
  first_prod_switch_from=lag(PROD_CODE);
  first_prod_switch_from_dt=lag(Prod_Last_use_mth);
  if skip eq 0 and not first.active then do;
    first_prod_switch_to=PROD_CODE;
    first_prod_switch_to_dt=Prod_First_use_mth;
    output;
    skip=1;
  end;
run;

Art, CEO, AnalystFinder.com

 

 

Peter_Thiem
Calcite | Level 5

First of all, thank you Art for your response to this problem, to answer your questions regarding customer 48 yes you are right, I have not sort the data and 120 days you suggested is fine too.

 

However when I attempted to apply the correct for missing data and expand file data step, the data set have blown out to an unimaginable size which even SAS cannot open the table for me to see what is happening with the intermediate step, and the final data step only have 1 switch per customer and not the multiple product switches for customers who have many products holding over the years and have been actively switching e.g. customer 90, I hope I have enough examples in the spreadsheet attachment to show what I mean. 

 

Best Regards,

 

Peter

art297
Opal | Level 21

Your original request was simply that you wanted to identify customers that had one or more switches. Are you now saying that you want to identify all switches?

 

If the process creates too big of a file, the easiest solution would be do run the process multiple times, using a where condition to limit the number of ids included in each run. How many IDs do you have in your real data?

 

Art, CEO, AnalystFinder.com

 

Peter_Thiem
Calcite | Level 5

Hi Art, the aim is to find how many time a customer have switches multiple product, sorry I thought I have made this clear in my subject line but I realised maybe it's the output data #1 headers that cause the confusion so yes a customer can have many switches as the data suggested. I have copied the input data and output data in whole below. My input data is already over 15Gb in size and the intermediate step blow out to more than 360Gb which I cannot open. But I will follow your suggestion breaking up this data by ID ranges.

Ksharp
Super User

Assuming I understand what you mean.

 

data have;
infile cards expandtabs truncover;
input ID	PROD_CODE $	(Prod_First_use_mth	Prod_Last_use_mth) (: date9.);
format Prod_First_use_mth	Prod_Last_use_mth date9.;
cards;
2	ABC06	1-Mar-09	1-May-11
2	ABC08	1-Jun-11	1-Apr-17
32	ABC07	1-Jan-09	1-Jan-12
32	ABC08	1-Mar-12	1-Apr-17
48	BMX02	1-Apr-09	1-May-09
48	ABC06	1-Jan-09	1-Apr-17
90	ABC06	1-Jan-09	1-Jul-12
90	ABC07	1-Aug-12	1-Apr-17
90	DCA02	1-Mar-09	1-Mar-13
90	QAM01	1-Jan-09	1-Aug-16
90	QAM02	1-Dec-09	1-Apr-17
;
run;
proc sql;
create table want as
 select a.ID,a.PROD_CODE,a.Prod_Last_use_mth,b.PROD_CODE as to,b.Prod_First_use_mth as to_date
  from have as a,have as b
   where a.id=b.id and a.PROD_CODE ne b.PROD_CODE 
    and b.Prod_First_use_mth between intnx('month',a.Prod_Last_use_mth,-4) and
	    intnx('month',a.Prod_Last_use_mth,4) ;
quit;
Peter_Thiem
Calcite | Level 5

Hello Ksharp,

 

I have done a few checks and this SQL code seems to provide all the right switchers and it runs quite quickly too.

 

Many thanks for taking your time to solve this problem Smiley Very Happy

 

Best Regards,

 

Peter

art297
Opal | Level 21

@Peter_Thiem: Glad to hear that @Ksharp's proposed code does what you want. If actual order isn't relevant than, clearly, a SQL approach (like he suggested) would be preferable.

FWIW, I designed the code I proposed based on your initial problem statement: "I need help to identify which customer has 1 or more product switch"

And you hadn't mentioned that your "real" data was 15gb in size.

One thing you may want to think about, though, are your decision rules for identifying switches. Take a simple example:

data have;
  infile cards dlm=',';
  input ID PROD_CODE $ (Prod_First_use_mth Prod_Last_use_mth) (: date9.);
format Prod_First_use_mth	Prod_Last_use_mth date9.;
cards;
2,ABC01,1-Mar-09,2-Mar-09
2,ABC02,2-Mar-09,3-Mar-09
2,ABC03,3-Mar-09,4-Mar-09
2,ABC04,4-Mar-09,5-Mar-09
2,ABC05,5-Mar-09,6-Mar-09
2,ABC06,6-Mar-09,7-Mar-09
2,ABC07,7-Mar-09,8-Mar-09
2,ABC08,1-Jun-11,1-Apr-17
;
run;

Based on that data set, how many switches do you want the code to identify? Your current decision rules, and @Ksharp's code, identify 42 switches. If that's what you want, you're all set. However, if order has to be considered, more thought and processing has to go into the solution.

 

Art, CEO, AnalystFinder.com

 

 

Peter_Thiem
Calcite | Level 5

Hi Art, sorry for the late reply, order is not considered important in my case.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1023 views
  • 0 likes
  • 3 in conversation