02-20-2012 10:24 AM
i have a large data set that has details of when a client first made a deposit and the last date of deposit. For instance Clent A has first_date_deposit as 15/07/2003 and last_date_deposit as 24/02/2010.(this is the date format in the dataset) I want to create a new variable that will display the total number of months that has elapsed between the first date of deposit and last date of deposit.
02-20-2012 10:32 AM
Question: When you say total number of months, do you mean the actual calender months in between or only the months that have deposits?
For the former, all you need to apply intck() function on two dates of yours, marking 'month' as the unit of intervals; for the later, you need to count the events happening within the same customer ID, which can be done using numerous ways, such as data step, proc SQL, freq, mean etc.
02-20-2012 10:37 AM
You can use the interval functions to determine the number of month boundaries that have to be crossed to get from one date to another, but wouldn't it be more accurate to just use the number of days?
That can be computed by just subtracting the two dates.
02-21-2012 04:42 AM
If you want start date and end date has the same day when passing a month. not like intnx() only concentrate on first day every month.
data _null_; infile datalines dlm=' '; input start : ddmmyy12. end : ddmmyy12.; number=intck('month',start-day(start)+1,end-day(start)+1); put number= ; datalines; 15/07/2003 14/08/2010 ; run;
02-21-2012 09:37 AM
Thanks for all the suggestions. I am not familiar with proc codes.
But can i use the column label with the INTCK statement instead of date such as
no_of_month=INTCK('month', first_date_dep, last_date_dep); so counting the number of months between the first date of deposit and last date of deposit
02-21-2012 10:31 AM
It is why I was suggesting using days elapsed rather than months. If you have to use months, then I would suggest ensuring that you use the 'c' alignment argument. It is easiest to show with an example:
informat date1 date2 date9.;
format date1 date2 date9.;
input date1 date2;
Running the above code will produce a file containing:
Obs date1 date2 months1 months2 days
1 01FEB2010 28FEB2010 0 0 27
2 03FEB2010 02MAR2010 1 0 27
3 03FEB2010 03MAR2010 1 1 28
As you can see, without the 'c' argument, the intck function only counts the number of month boundaries that have to be crossed to get to one date from another date.
With it, it still only counts the number of boundaries that have to be crossed, but at least using the day of the month as the boundary.
Days, on the other hand, I think gives a more accurate picture.