- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi there,
I have a dataset that is structured the following way:
Customer_ID | Date_offended |
A | 19JUL2011:00:00:00.000 |
A | 29AUG2013:00:00:00.000 |
A | 29MAR2015:00:00:00.000 |
B | 05MAR2010:00:00:00.000 |
B | 14AUG2010:00:00:00.000 |
C | 06JUL2013:00:00:00.000 |
C | 02JUN2015:00:00:00.000 |
D | 21AUG2009:00:00:00.000 |
D | 02MAY2013:00:00:00.000 |
I want to be able to calculate the difference (in number of days or months) between the dates of offence for these customers. How do I go about it?
I appreciate your help in advance!
Krisztina
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming that your date_offended variable is a SAS datetime valued variable and your data is grouped by customer and sorted within customer by the date_offended:
data want;
set have;
by customer_id notsorted;
Intervaldays = intck('dtday',lag(Date_offended),Date_offended);
Intervalmonths= intck('dtmonth',lag(Date_offended),Date_offended);
if first.customer_id then do;
Intervaldays=.;
Intervalmonths=.;
end;
run;
Lag returns the value of the previous record. You will get a note in the log because the first record doesn't have a "previous" and the value is missing.
The Intck functions returns boundary interval counts so things like March 31 to April 1 still return 1 month. If you need more complex rules then details will need to be provided. The 'dtday' and 'dtmonth' say to look at the values of the varaibles in the function as datetime values and return the number of day or month boundaries.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming that your date_offended variable is a SAS datetime valued variable and your data is grouped by customer and sorted within customer by the date_offended:
data want;
set have;
by customer_id notsorted;
Intervaldays = intck('dtday',lag(Date_offended),Date_offended);
Intervalmonths= intck('dtmonth',lag(Date_offended),Date_offended);
if first.customer_id then do;
Intervaldays=.;
Intervalmonths=.;
end;
run;
Lag returns the value of the previous record. You will get a note in the log because the first record doesn't have a "previous" and the value is missing.
The Intck functions returns boundary interval counts so things like March 31 to April 1 still return 1 month. If you need more complex rules then details will need to be provided. The 'dtday' and 'dtmonth' say to look at the values of the varaibles in the function as datetime values and return the number of day or month boundaries.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks a lot! It worked. I was doing it in such an inefficient way. Creating separate columns for each differring date. This advice saved a lot of time and I will for sure use it another time. Thanks again!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For number of days:
data want;
set have;
by Customer_ID;
n_days = datepart(date_offended) - datepart(lag(date_offended));
if first.Customer_ID then n_days = .;
run;
Given that you have the number of days, you can convert that to months using whatever formula you would like. But if you have a different calculation in mind, SAS can probably do it. You just have to specify what your rules are.
There's a simpler formula for number of days that may work, but I'm unable to test it now:
n_days = dif(datepart(date_offended));
But however you do it, your DATE_OFFENDED variable is a datetime value, not a date value. So the DATEPART function will be needed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You didn't post the output yet ?
data have;
infile cards expandtabs;
input Customer_ID $ Date_offended anydtdtm32.;
format Date_offended datetime.;
cards;
A 19JUL2011:00:00:00.000
A 29AUG2013:00:00:00.000
A 29MAR2015:00:00:00.000
B 05MAR2010:00:00:00.000
B 14AUG2010:00:00:00.000
C 06JUL2013:00:00:00.000
C 02JUN2015:00:00:00.000
D 21AUG2009:00:00:00.000
D 02MAY2013:00:00:00.000
;
run;
data want;
set have;
by Customer_ID ;
lag=lag(Date_offended);
dif_date=intck('dtday',lag,Date_offended,'c');
dif_month=intck('dtmonth',lag,Date_offended,'c');
dif_year=intck('dtyear',lag,Date_offended,'c');
if first.Customer_ID then call missing(of dif_:);
drop lag;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you! I think the first reply i got was the simplest and easiest way for what i will need my data for. But I appreciate your help. Thanks again.