Hi....I am trying to determine whether or not a client is a new customer or repeat customer based on whether the customer had purchased at least one (1) product on their current purchsed from a previous purchase. The data set looks like this:
Client_ID Product_Number Date
37543 00653 20130403
37543 00345 20130403
37543 00765 20130403
37543 00653 20130812
37543 00345 20130812
37543 00765 20130812
37543 00831 20130812
37543 00765 20131222
37543 00949 20131222
And I was hoping to have dataset as:
Client_ID Date Type
37543 20130403 New
37543 20130812 Repeat
37543 20131222 Repeat
Any suggestions would bee greatly appreciated. Thanks
Hi. Try this (there is probably a hash approach with a little less code, but I understand this). I added a second client with slightly different purchases to show how the code behaves when you get a new client (I think it works) ...
data x;
input client product date : yymmdd.;
format date mmddyy10.;
datalines;
37543 00653 20130403
37543 00345 20130403
37543 00765 20130403
37543 00653 20130812
37543 00345 20130812
37543 00765 20130812
37543 00831 20130812
37543 00765 20131222
37543 00949 20131222
97543 00653 20130403
97543 00345 20130812
97543 00765 20130812
97543 00831 20130812
97543 00765 20131222
97543 00949 20131222
;
data y (keep=client date type);
length type $6;
* keep track of products bought within a client;
array bought(30) _temporary_ (30*0);
do until (last.client);
* assume a NEW type;
type = 'new';
do until (last.date);
set x;
by client date;
* check if product within each date is in the array (yes/change type to repeat, no/add to array);
if product in bought then type = 'repeat';
else do; j+1; bought(j) = product; end;
end;
output;
end;
* clear array and counter for each new client;
do i = 1 to 30; bought(i) = 0; end;
j=0;
run;
DATA SET Y ...
Obs type client date
1 new 37543 04/03/2013
2 repeat 37543 08/12/2013
3 repeat 37543 12/22/2013
4 new 97543 04/03/2013
5 new 97543 08/12/2013
6 repeat 97543 12/22/2013
proc sort data=... nodupkey;
by Client_ID Date;
run;
data ...;
set ...;
by Client_ID Date;
length type $6;
if first.client_id then type="New";
else type="Repeat";
run;
You may want to just add an output so the final dataset only has one record per group:
data ...;
set ...;
by Client_ID Date;
length type $6;
if first.client_id then type="New";
else type="Repeat";
if last.client_id then output;
run;
Hi...Thanks for the quick response. I should have mentioned that it is possible that the client may not have purchased a product that they had previous purchased and in this case they would be considered "New" even though they made purchases before. That is why I need to check to see if the client had purchased any of the items in their previous purchases. This is the part that I am having difficulty with. Thanks
I am not sure i follow your algorithm please provide appropriate example.
Hi. Try this (there is probably a hash approach with a little less code, but I understand this). I added a second client with slightly different purchases to show how the code behaves when you get a new client (I think it works) ...
data x;
input client product date : yymmdd.;
format date mmddyy10.;
datalines;
37543 00653 20130403
37543 00345 20130403
37543 00765 20130403
37543 00653 20130812
37543 00345 20130812
37543 00765 20130812
37543 00831 20130812
37543 00765 20131222
37543 00949 20131222
97543 00653 20130403
97543 00345 20130812
97543 00765 20130812
97543 00831 20130812
97543 00765 20131222
97543 00949 20131222
;
data y (keep=client date type);
length type $6;
* keep track of products bought within a client;
array bought(30) _temporary_ (30*0);
do until (last.client);
* assume a NEW type;
type = 'new';
do until (last.date);
set x;
by client date;
* check if product within each date is in the array (yes/change type to repeat, no/add to array);
if product in bought then type = 'repeat';
else do; j+1; bought(j) = product; end;
end;
output;
end;
* clear array and counter for each new client;
do i = 1 to 30; bought(i) = 0; end;
j=0;
run;
DATA SET Y ...
Obs type client date
1 new 37543 04/03/2013
2 repeat 37543 08/12/2013
3 repeat 37543 12/22/2013
4 new 97543 04/03/2013
5 new 97543 08/12/2013
6 repeat 97543 12/22/2013
Thanks MikeZdeb....it works perfect...exactly what I needed....thanks to everyone who made suggestions as well.
Hi ... another thought that eliminates the array (since you can concatenate numeric variables) ...
data y (keep=client date type);
length type $6 bought $100;
do until (last.client);
type = 'new';
do until (last.date);
set x;
by client date;
* CAT(PRODUCT) gets rid of warning message in the LOG about data type conversion;
if findw(bought,cat(product)) then type = 'repeat';
else bought = catx('/',bought,product);
end;
output;
end;
call missing (bought);
run;
Hi MikeZdeb...thanks for taking the time to respond to my request. I am getting an eror message (
ERROR: Array subscript out of range at line 187 column 4.)....From the log, it seems there is a problem with " bought(j) = product" statement. Any suggestion why this is happening?
Hi. I wrote the code assuming 30 products. Easy to change, find ...
array bought(30) _temporary_ (30*0);
change to ...
array bought(100) _temporary_ (30*0);
then change ...
do i = 1 to 30; bought(i) = 0; end;
to
do i = 1 to 100; bought(i) = 0; end;
If you get that message again, make the array (and loop) larger.
Hi MikeZdeb.....Thanks for your suggestions. I made the changes as you had suggested and your suggestion with the array and loop worked perfectly. The suggestion without the array after making the corrections as suggested, seemed to give slightly different results when I compared the results from both approaches.
Hi. Send me a bit of data that produced the different results and I'll figure it out (msz03@albany.edu).
ps If you use the other solution I posted that has no array, you could change this line ...
length type $6 bought $100;
to ...
length type $6 bought $1000;
to accommodate more product numbers in the variable BOUGHT.
data x;
input client product date : yymmdd.;
format date mmddyy10.;
datalines;
37543 00653 20130403
37543 00345 20130403
37543 00765 20130403
37543 00653 20130812
37543 00345 20130812
37543 00765 20130812
37543 00831 20130812
37543 00765 20131222
37543 00949 20131222
97543 00653 20130403
97543 00345 20130812
97543 00765 20130812
97543 00831 20130812
97543 00765 20131222
97543 00949 20131222
;
run;
data want;
if _n_ eq 1 then do;
if 0 then set x;
declare hash ha();
ha.definekey('product');
ha.definedone();
end;
set x;
by client date ;
retain type 'new ';
if first.client then ha.clear();
if first.date then type='new';
if ha.check()=0 then type='repeat';
ha.replace();
if last.date;
drop product ;
run;
A problem is if there are missing value in PRODUCT, what you gotta do ?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.