BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Walternate
Obsidian | Level 7

Hi all,

I have a dataset that is unique at the person-level. Each person has an ID number, between 1 and 16 start dates, and between 1 and 16 end dates. The dates are paired (ie, StartDt1 goes with EndDt1, etc).

Basically the data look like this:

ID     Start1     Start2     Start3...........End1        End2          End3..................etc

1     5/5/2009   6/7/2009  8/5/2009      5/10/2010 6/12/2011    10/12/2013

2     1/1/1995   2/3/2005                     4/5/2009   5/3/2007

3     2/3/2007   8/9/2009                     1/3/2009   12/12/2012

Basically, I want to know who left the program in 2009 (ie, one of their end dates has the year 2009). The problem is, if they have a start date that is after the 2009 end date but still in 2009, they do not count as leaving. So, ID 2 would count as leaving in 2009, but ID3 would not even though he also has a 2009 end date, because Start2 is later in the same year.

I assume that since I have so many dates and the comparisons wouldn't always be to adjacent dates, some sort of array would be appropriate, but I'm not quite sure how to go about it.

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

OK, then it would require arrays.  Here's a good way:

data want;

   set have;

   array start {16};

   array end {16};

   left_in_2009='N';

   do _n_=1 to 16;

      if year(end{_n_}) = 2009 then do;

          last_left = max(last_left, end{_n_});

          left_in_2009='Y';

          found_at = _n_;

      end;

   end;

   if left_in_2009='Y' then do _n_=1 to 16;

      if (last_left <= start{_n_}  < '01jan2010'd)

      and (found_at ne _n_)  then left_in_2009='N';

   end;

run;

There's a little bit of processing that might be extra careful, taking into consideration whether a person started and ended on the same date.  When you're confident that the code is working properly, you might want to drop last_left and found_at.
    

View solution in original post

6 REPLIES 6
Astounding
PROC Star

If an ID left in 2009 and started again in 2010, would that count as leaving in 2009?  The amount of programming would change dramatically depending on your answer.

Walternate
Obsidian | Level 7

If they left in 2009 and started again in 2010, it would count as leaving in 2009.

Astounding
PROC Star

OK, then it would require arrays.  Here's a good way:

data want;

   set have;

   array start {16};

   array end {16};

   left_in_2009='N';

   do _n_=1 to 16;

      if year(end{_n_}) = 2009 then do;

          last_left = max(last_left, end{_n_});

          left_in_2009='Y';

          found_at = _n_;

      end;

   end;

   if left_in_2009='Y' then do _n_=1 to 16;

      if (last_left <= start{_n_}  < '01jan2010'd)

      and (found_at ne _n_)  then left_in_2009='N';

   end;

run;

There's a little bit of processing that might be extra careful, taking into consideration whether a person started and ended on the same date.  When you're confident that the code is working properly, you might want to drop last_left and found_at.
    

jwillis
Quartz | Level 8

Walternate,

I do not have a dataset to play with so I cannot test my logic.  The code and assumptions do need to be tested.

/** assuming that all dates are valid; that there are no duplicate rows;

    that dates are in low to high order from left to right in each row;

    that every date variable has a value; and that there is no bad data **/

data finddates;

  set dates;

  array strt {16} start1-start16;

  array end  {16} end1-end16;

  array yrs  {16} strtyr1-strtyr16 (2099*16);

  array yre  {16} endyr1-endyr16   (2199*16);

  format cntrs cntre 8.;

  cntrs = 0; cntre = 0;

do i=1 to 16;

    endyr = 0;

    yrs(i) = year(strt(i));

    yre(i) = year(end(i));

    if yrs(i) = 2009 then cntrs = cntrs + 1;

    if yre(i) = 2009 then endyr = 1;

     if cntrs > 0 then endyr = 0;

       cntre = cntre + endyr;

end;

run;

Walternate
Obsidian | Level 7

This code works, but am I correct in saying that it does not take into account the relative positions of the 2009 dates--that is, if you have an end date that is August 2009, your start date would have to be in 2009 but also after August in order to count as non-terminating.

pradeepalankar
Obsidian | Level 7

simpleast approach for your query would be

if max year from startdate <2009 and end date lies in 2009 then its a leaving id.

data test;

input ID     (Start1-start3 End1-End3)(:ddmmyy10.);

format Start1-start3 End1-End3 ddmmyy10.;

cards;

1     5/5/2009   6/7/2009  8/5/2009      5/10/2010 6/12/2011    10/12/2013

2     1/1/1995   2/3/2005  .      4/5/2009   5/3/2007 .

3     2/3/2007   8/9/2009  8/5/2009      1/3/2009   12/12/2012 10/12/2013

;

data test;

set test;

if ((max(year(start1),year(start2),year(start3))<2009)

and (year(end1)=2009 or year(end2)=2009 or year(end3)=2009)) then flag=1;

run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1624 views
  • 6 likes
  • 4 in conversation