BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
twildone
Pyrite | Level 9

Hi....I am trying to determine whether or not a client is a new customer or repeat customer based on whether the customer had purchased at least one (1) product on their current purchsed from a previous purchase. The data set looks like this:

Client_ID Product_Number Date

37543 00653 20130403

37543 00345 20130403

37543 00765 20130403

37543 00653 20130812

37543 00345 20130812

37543 00765 20130812

37543 00831 20130812

37543 00765 20131222

37543 00949 20131222

And I was hoping to have dataset as:

Client_ID Date Type

37543 20130403 New

37543 20130812 Repeat

37543 20131222 Repeat

Any suggestions would bee greatly appreciated. Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
MikeZdeb
Rhodochrosite | Level 12

Hi.  Try this (there is probably a hash approach with a little less code, but I understand this).  I added a second client with slightly different purchases to show how the code behaves when you get a new client (I think it works) ...

data x;

input client product date : yymmdd.;

format date mmddyy10.;

datalines;

37543 00653 20130403

37543 00345 20130403

37543 00765 20130403

37543 00653 20130812

37543 00345 20130812

37543 00765 20130812

37543 00831 20130812

37543 00765 20131222

37543 00949 20131222

97543 00653 20130403

97543 00345 20130812

97543 00765 20130812

97543 00831 20130812

97543 00765 20131222

97543 00949 20131222

;

data y (keep=client date type);

length type $6;

* keep track of products bought within a client;

array bought(30) _temporary_ (30*0);

do until (last.client);

* assume a NEW type;

   type = 'new';

do until (last.date);

  set x;

  by client date;

* check if product within each date is in the array (yes/change type to repeat, no/add to array);

  if product in bought then type = 'repeat';

  else do; j+1; bought(j) = product; end;

end;

output;

end;

* clear array and counter for each new client;

do i = 1 to 30; bought(i) = 0; end;

j=0;

run;

DATA SET Y ...

Obs    type      client          date

1     new        37543    04/03/2013

2     repeat     37543    08/12/2013

3     repeat     37543    12/22/2013

4     new        97543    04/03/2013

5     new        97543    08/12/2013

6     repeat     97543    12/22/2013

View solution in original post

15 REPLIES 15
ndp
Quartz | Level 8 ndp
Quartz | Level 8

proc sort data=... nodupkey;

     by Client_ID Date;

run;

data ...;

     set ...;

     by Client_ID Date;

    length type $6;

     if first.client_id then type="New";

     else type="Repeat";

run;


RW9
Diamond | Level 26 RW9
Diamond | Level 26

You may want to just add an output so the final dataset only has one record per group:

data ...;

     set ...;

     by Client_ID Date;

    length type $6;

     if first.client_id then type="New";

     else type="Repeat";

     if last.client_id then output;

run;

twildone
Pyrite | Level 9

Hi...Thanks for the quick response. I should have mentioned that it is possible that the client may not have purchased a product that they had previous purchased and in this case they would be considered "New" even though they made purchases before. That is why I need to check to see if the client had purchased any of the items in their previous purchases. This is the part that I am having difficulty with. Thanks

ndp
Quartz | Level 8 ndp
Quartz | Level 8

I am not sure i follow your algorithm please provide appropriate example.

MikeZdeb
Rhodochrosite | Level 12

Hi.  Try this (there is probably a hash approach with a little less code, but I understand this).  I added a second client with slightly different purchases to show how the code behaves when you get a new client (I think it works) ...

data x;

input client product date : yymmdd.;

format date mmddyy10.;

datalines;

37543 00653 20130403

37543 00345 20130403

37543 00765 20130403

37543 00653 20130812

37543 00345 20130812

37543 00765 20130812

37543 00831 20130812

37543 00765 20131222

37543 00949 20131222

97543 00653 20130403

97543 00345 20130812

97543 00765 20130812

97543 00831 20130812

97543 00765 20131222

97543 00949 20131222

;

data y (keep=client date type);

length type $6;

* keep track of products bought within a client;

array bought(30) _temporary_ (30*0);

do until (last.client);

* assume a NEW type;

   type = 'new';

do until (last.date);

  set x;

  by client date;

* check if product within each date is in the array (yes/change type to repeat, no/add to array);

  if product in bought then type = 'repeat';

  else do; j+1; bought(j) = product; end;

end;

output;

end;

* clear array and counter for each new client;

do i = 1 to 30; bought(i) = 0; end;

j=0;

run;

DATA SET Y ...

Obs    type      client          date

1     new        37543    04/03/2013

2     repeat     37543    08/12/2013

3     repeat     37543    12/22/2013

4     new        97543    04/03/2013

5     new        97543    08/12/2013

6     repeat     97543    12/22/2013

twildone
Pyrite | Level 9

Thanks MikeZdeb....it works perfect...exactly what I needed....thanks to everyone who made suggestions as well.

MikeZdeb
Rhodochrosite | Level 12

Hi ... another thought that eliminates the array (since you can concatenate numeric variables) ...

data y (keep=client date type);

length type $6 bought $100;

do until (last.client);

  type = 'new';

do until (last.date);

  set x;

  by client date;

* CAT(PRODUCT) gets rid of warning message in the LOG about data type conversion;

  if findw(bought,cat(product)) then type = 'repeat';

  else bought = catx('/',bought,product);

end;

output;

end;

call missing (bought);

run;

twildone
Pyrite | Level 9

Hi MikeZdeb...thanks for taking the time to respond to my request. I am getting an eror message (

 

ERROR: Array subscript out of range at line 187 column 4.)....From the log, it seems there is a problem with " bought(j) = product" statement. Any suggestion why this is happening?

MikeZdeb
Rhodochrosite | Level 12

Hi.   I wrote the code assuming 30 products.  Easy to change, find ...

array bought(30) _temporary_ (30*0);

change to ...

array bought(100) _temporary_ (30*0);

then change ...

do i = 1 to 30; bought(i) = 0; end;

to

do i = 1 to 100; bought(i) = 0; end;

If you get that message again, make the array (and loop) larger.

twildone
Pyrite | Level 9

Hi MikeZdeb.....Thanks for your suggestions. I made the changes as you had suggested and your suggestion with the array and loop worked perfectly. The suggestion without the array after making the corrections as suggested, seemed to give slightly different results when I compared the results from both approaches.

MikeZdeb
Rhodochrosite | Level 12

Hi.  Send me a bit of data that produced the different results and I'll figure it out (msz03@albany.edu).

MikeZdeb
Rhodochrosite | Level 12

ps  If you use the other solution I posted that has no array, you could change this line ...

length type $6 bought $100;

to ...

length type $6 bought $1000;

to accommodate more product numbers in the variable BOUGHT.

Ksharp
Super User

Code: Program

data x;
input client product date : yymmdd.;
format date mmddyy10.;
datalines;
37543 00653 20130403
37543 00345 20130403
37543 00765 20130403
37543 00653 20130812
37543 00345 20130812
37543 00765 20130812
37543 00831 20130812
37543 00765 20131222
37543 00949 20131222
97543 00653 20130403
97543 00345 20130812
97543 00765 20130812
97543 00831 20130812
97543 00765 20131222
97543 00949 20131222
;
run;
data want;
if _n_ eq 1 then do;
if 0 then set x;
declare hash ha();
ha.definekey('product');
ha.definedone();
end;
set x;
by client date ;
retain type 'new ';
if first.client then ha.clear();
if first.date then type='new';
if ha.check()=0 then type='repeat';
ha.replace();
if last.date;
drop product ;
run;
Ksharp
Super User

A problem is if there are missing value in PRODUCT, what you gotta do ?

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 2155 views
  • 6 likes
  • 5 in conversation