Hi,
I have a question. I need to followup an object until 3 year period with a condition that the object has not been sold i last 5 year.
Example:
Object date value
1 01-01-2003 --
1 01-01-2004 --
1 01-02-2009 1
1 01-01-2010 2
1 01-01-2016 --
1 01-01-2017 --
what i need is first to check if there is a difference between the two dates within the same object is more than 5 years, if it is true then followup the same object upto 3 year. ie the difference between the third and second is more than five years, so it will be our index date and it shall be followed upto 3 years.
@ifti_ch2002 wrote:...
1) f=1 if the difference between the current date and lag_date (previous) is greater than 5 years
2) f=1 will be the index date
3)f+1 if the difference between the following date and index date is less than 3 years
To check the first thing you need to get the change in DATE between observations. LAG() will help.
For the second one I am going jump to the conclusion that you meant to say that when there is a jump of over 5 years you want to set F to 1 AND also remember what DATE was at that time.
For the third one I am going to jump to another conclusion that you want to keep incrementing F as long as the difference from the saved index date is <= 3 years. AND that when it is not then you set F to missing.
So let's setup your sample data with your expected values as WANT (instead of F).
data have;
  infile datalines truncover;
  input IDnumber date  :mmddyy10. want;
  format date yymmdd10.;
datalines;
1 4-7-2003 .
1 4-7-2009 1
1 4-7-2010 2
1 4-7-2011 3
1 4-7-2012 4
1 4-7-2013 .
1 4-7-2014 .
1 4-7-2015 .
1 4-7-2016 .
1 4-7-2018 .
2 4-7-2005 .
2 4-7-2011 1
2 4-7-2012 2
2 4-7-2013 3
2 4-7-2014 4
2 4-7-2015 .
2 4-7-2016 .
2 4-7-2017 .
2 4-7-2018 .
2 4-7-2019 .
2 4-7-2020 .
;So it is pretty straight forward to code once the rules are clear. Remember to call the LAG() function on every observation so it sees every value. But don't use the resuliting difference when you are starting a new group.
data want;
  set have;
  by idnumber date ;
  yrdif = intck('year',lag(date),date,'cont');
  if first.idnumber then call missing(yrdif,index_date,f);
  if f then do;
    if intck('year',index_date,date,'cont') <= 3 then f+1;
    else f=.;
  end;
  else if yrdif >= 5 then do;
    index_date=date;
    f=1;
  end;
  format index_date yymmdd10.;
  retain index_date f;
run;Results:
Obs IDnumber date want yrdif index_date f 1 1 2003-04-07 . . . . 2 1 2009-04-07 1 6 2009-04-07 1 3 1 2010-04-07 2 1 2009-04-07 2 4 1 2011-04-07 3 1 2009-04-07 3 5 1 2012-04-07 4 1 2009-04-07 4 6 1 2013-04-07 . 1 2009-04-07 . 7 1 2014-04-07 . 1 2009-04-07 . 8 1 2015-04-07 . 1 2009-04-07 . 9 1 2016-04-07 . 1 2009-04-07 . 10 1 2018-04-07 . 2 2009-04-07 . 11 2 2005-04-07 . . . . 12 2 2011-04-07 1 6 2011-04-07 1 13 2 2012-04-07 2 1 2011-04-07 2 14 2 2013-04-07 3 1 2011-04-07 3 15 2 2014-04-07 4 1 2011-04-07 4 16 2 2015-04-07 . 1 2011-04-07 . 17 2 2016-04-07 . 1 2011-04-07 . 18 2 2017-04-07 . 1 2011-04-07 . 19 2 2018-04-07 . 1 2011-04-07 . 20 2 2019-04-07 . 1 2011-04-07 . 21 2 2020-04-07 . 1 2011-04-07 .
for Base SAS, a Do Whitlock loop can be written to iterate through this, giving respect to a by statement.
So many resources -- first link from Google:
https://support.sas.com/resources/papers/proceedings12/052-2012.pdf
data test; infile datalines missover; input IDnumber @3 date mmddyy10. @12 f; datalines; 1 4-7-2003 1 4-7-2009 1 1 4-7-2010 2 1 4-7-2011 3 1 4-7-2012 4 1 4-7-2013 1 4-7-2014 1 4-7-2015 1 4-7-2016 1 4-7-2018 2 4-7-2005 2 4-7-2011 1 2 4-7-2012 2 2 4-7-2013 3 2 4-7-2014 4 2 4-7-2015 2 4-7-2016 2 4-7-2017 2 4-7-2018 2 4-7-2019 2 4-7-2020 ; run; data test1; set test; format date lag_date mmddyy10.; run;
1) f=1 if the difference between the current date and lag_date (previous) is greater than 5 years
2) f=1 will be the index date
3)f+1 if the difference between the following date and index date is less than 3 years
@PhilC: I think that the variable "f" is the requested result and not present in the data.
@ifti_ch2002 : Please specify clearly what you have and want you want, it is quite uncommon to present one dataset containing both at the same time. And you wrote "3)f+1 if the difference between the following date and index date is less than 3 years" - with "following date" the date of the observation is meant in which f is set to f+1? And "less than" seems to be wrong, because the difference between index-Date and current date in the fifth observation is three and not less than three.
Some ideas:
@andreas_lds sorry for that. I should be more specifically define what i have and what i need.
f column is what i required.
Following date i mean where the value of f+1 will be.
And i mean less than or euql to 3 years
@ifti_ch2002 wrote:...
1) f=1 if the difference between the current date and lag_date (previous) is greater than 5 years
2) f=1 will be the index date
3)f+1 if the difference between the following date and index date is less than 3 years
To check the first thing you need to get the change in DATE between observations. LAG() will help.
For the second one I am going jump to the conclusion that you meant to say that when there is a jump of over 5 years you want to set F to 1 AND also remember what DATE was at that time.
For the third one I am going to jump to another conclusion that you want to keep incrementing F as long as the difference from the saved index date is <= 3 years. AND that when it is not then you set F to missing.
So let's setup your sample data with your expected values as WANT (instead of F).
data have;
  infile datalines truncover;
  input IDnumber date  :mmddyy10. want;
  format date yymmdd10.;
datalines;
1 4-7-2003 .
1 4-7-2009 1
1 4-7-2010 2
1 4-7-2011 3
1 4-7-2012 4
1 4-7-2013 .
1 4-7-2014 .
1 4-7-2015 .
1 4-7-2016 .
1 4-7-2018 .
2 4-7-2005 .
2 4-7-2011 1
2 4-7-2012 2
2 4-7-2013 3
2 4-7-2014 4
2 4-7-2015 .
2 4-7-2016 .
2 4-7-2017 .
2 4-7-2018 .
2 4-7-2019 .
2 4-7-2020 .
;So it is pretty straight forward to code once the rules are clear. Remember to call the LAG() function on every observation so it sees every value. But don't use the resuliting difference when you are starting a new group.
data want;
  set have;
  by idnumber date ;
  yrdif = intck('year',lag(date),date,'cont');
  if first.idnumber then call missing(yrdif,index_date,f);
  if f then do;
    if intck('year',index_date,date,'cont') <= 3 then f+1;
    else f=.;
  end;
  else if yrdif >= 5 then do;
    index_date=date;
    f=1;
  end;
  format index_date yymmdd10.;
  retain index_date f;
run;Results:
Obs IDnumber date want yrdif index_date f 1 1 2003-04-07 . . . . 2 1 2009-04-07 1 6 2009-04-07 1 3 1 2010-04-07 2 1 2009-04-07 2 4 1 2011-04-07 3 1 2009-04-07 3 5 1 2012-04-07 4 1 2009-04-07 4 6 1 2013-04-07 . 1 2009-04-07 . 7 1 2014-04-07 . 1 2009-04-07 . 8 1 2015-04-07 . 1 2009-04-07 . 9 1 2016-04-07 . 1 2009-04-07 . 10 1 2018-04-07 . 2 2009-04-07 . 11 2 2005-04-07 . . . . 12 2 2011-04-07 1 6 2011-04-07 1 13 2 2012-04-07 2 1 2011-04-07 2 14 2 2013-04-07 3 1 2011-04-07 3 15 2 2014-04-07 4 1 2011-04-07 4 16 2 2015-04-07 . 1 2011-04-07 . 17 2 2016-04-07 . 1 2011-04-07 . 18 2 2017-04-07 . 1 2011-04-07 . 19 2 2018-04-07 . 1 2011-04-07 . 20 2 2019-04-07 . 1 2011-04-07 . 21 2 2020-04-07 . 1 2011-04-07 .
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
