BookmarkSubscribeRSS Feed
spg
Obsidian | Level 7 spg
Obsidian | Level 7
I have a dataset with purchase history (dates of purchase) of a product for different consumers. The dates of purchase (in the DD/MM/YY format) are sorted by consumers. I want to calculate the average time between two buys for each consumer. Can anybody please help? The product category is not important here, just the time between two purchases.

The data looks like this:

Date Consumer Product
1/1/08 1 24
1/1/08 2 23
6/2/08 2 24
2/3/07 3 22
7/3/08 3 24
4/4/06 4 25
2/4/07 5 23
5/9/07 5 23

Thank you.
9 REPLIES 9
RickM
Fluorite | Level 6
You might want to look into sorting by date and consumer and then in a data step use first. and a retain statement to figure out if you are looking at the first instance of a consumer or not and take the difference between consecutive times for each consumer.
barheat
Fluorite | Level 6
A follow up to the previous suggestion. Sort the data by consumer and purchase date.

proc sort data=purchases;
by consumer purchase_date;
run;

Use the lag function to count the days between purchases

data purchases;
set purchases;
if consumer=lag1(consumer) then days=purchase_date-lag1(purchase_date);
else days=.;
run;

Now run a proc means/summary.

proc summary data=purchases nway;
var days;
class consumer;
output out=purchase_analysis(drop=_type_) mean=days;
run;

In the output table _freq_ will be the number of purchases - 1.
Cynthia_sas
SAS Super FREQ
Hi:
Two solutions that do not use the LAG function are shown below. The first approach assumes that there are ONLY 2 purchases. The second solution uses an ARRAY (after the TRANSPOSE) to create a DIFFn variable for every DATEn variable (except the first one) -- this would work for more than 2 purchases.

cynthia
[pre]
** If only 2 purchases, then simple PROC TRANSPOSE and
DATA step with subtraction;
data purch;
infile datalines;
input date : mmddyy8. consumer product;
return;
datalines;
1/1/08 1 24
1/1/08 2 23
6/2/08 2 24
2/3/07 3 22
7/3/08 3 24
4/4/06 4 25
2/4/07 5 23
5/9/07 5 23
;
run;

proc transpose data=purch out=purchout(drop=_name_) prefix=date;
by consumer;
var date;
run;

data calcdiff;
set purchout;
daysbetween = date2 - date1;
run;

proc print data=calcdiff;
title 'Calculate Difference in Dates when ONLY 2 purchases';
run;

** Approach 2: May have more purchases and want to know duration;
** between date2 and date1, date3 and date2, date4 and date3, etc;
ods listing;
data purch_more;
infile datalines;
input date : mmddyy8. consumer product;
return;
datalines;
1/1/08 1 24
1/1/08 2 23
6/2/08 2 24
1/1/09 2 25
2/3/07 3 22
7/3/08 3 24
4/4/06 4 25
2/4/07 5 23
5/9/07 5 23
6/15/07 5 24
12/22/07 5 25
;
run;

proc transpose data=purch_more out=moreout(drop=_name_) prefix=date;
by consumer;
var date;
run;

data morediff;
set moreout;
** define array bigger than needed;
array dt date1-date10;
array df diff1-diff10;

** find out number of purchases;
numpurch = n(of date1-date10);

** set diff1 = 0 (because date1 is first purchase);
diff1 = 0;

** use numpurch in do loop to calculate diff2-diff10;
** but loop will only go to NUMPURCH value for each obs;
do i = 2 to numpurch;
df(i) = dt(i) - dt(i-1);
end;
run;

proc print data=morediff;
var consumer date1 diff1 date2 diff2 date3 diff3 date4 diff4;
format date1-date4 mmddyy8.;
run;

[/pre]
spg
Obsidian | Level 7 spg
Obsidian | Level 7
Thanks RickM!
Thanks barheat!
Thanks Cynthia!
I learnt three ways to approach the problem. Tried all of them...(but am yet to master the first. and retain method).
RickM
Fluorite | Level 6
using first. and retain would be similar to using lag1. Something like

proc sort data=purchases;
by consumer date;
run;

data purchases;
set purchases;
retain olddate;
if first.customer then days=.;
else days=date-olddate;
olddate=date;
run;
Ksharp
Super User
You have not By variable.
So first.customer doesn't work.
RickM
Fluorite | Level 6
Right, I forgot about that part. Thats what happens when you dont test your code.
none1
Calcite | Level 5

Hello,

I have a date format which looks as follows:

04.10.2011 01:33:39

Now suppost I have two observations. one at the above mentioned data and time and the second after 30 seconds:

i.e.

04.10.2011 01:34:09

What formula should I use the calcuate the differnece between the two measurement dates  ?

I also tried, data differnce, it doesn't seem to work for this date format ?

Kind Regards

Ksharp
Super User

DIF( )

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1357 views
  • 0 likes
  • 6 in conversation