Hi all,
I have a dataset that is unique at the person-level. Each person has an ID number, between 1 and 16 start dates, and between 1 and 16 end dates. The dates are paired (ie, StartDt1 goes with EndDt1, etc).
Basically the data look like this:
ID Start1 Start2 Start3...........End1 End2 End3..................etc
1 5/5/2009 6/7/2009 8/5/2009 5/10/2010 6/12/2011 10/12/2013
2 1/1/1995 2/3/2005 4/5/2009 5/3/2007
3 2/3/2007 8/9/2009 1/3/2009 12/12/2012
Basically, I want to know who left the program in 2009 (ie, one of their end dates has the year 2009). The problem is, if they have a start date that is after the 2009 end date but still in 2009, they do not count as leaving. So, ID 2 would count as leaving in 2009, but ID3 would not even though he also has a 2009 end date, because Start2 is later in the same year.
I assume that since I have so many dates and the comparisons wouldn't always be to adjacent dates, some sort of array would be appropriate, but I'm not quite sure how to go about it.
Thanks!
OK, then it would require arrays. Here's a good way:
data want;
set have;
array start {16};
array end {16};
left_in_2009='N';
do _n_=1 to 16;
if year(end{_n_}) = 2009 then do;
last_left = max(last_left, end{_n_});
left_in_2009='Y';
found_at = _n_;
end;
end;
if left_in_2009='Y' then do _n_=1 to 16;
if (last_left <= start{_n_} < '01jan2010'd)
and (found_at ne _n_) then left_in_2009='N';
end;
run;
There's a little bit of processing that might be extra careful, taking into consideration whether a person started and ended on the same date. When you're confident that the code is working properly, you might want to drop last_left and found_at.
If an ID left in 2009 and started again in 2010, would that count as leaving in 2009? The amount of programming would change dramatically depending on your answer.
If they left in 2009 and started again in 2010, it would count as leaving in 2009.
OK, then it would require arrays. Here's a good way:
data want;
set have;
array start {16};
array end {16};
left_in_2009='N';
do _n_=1 to 16;
if year(end{_n_}) = 2009 then do;
last_left = max(last_left, end{_n_});
left_in_2009='Y';
found_at = _n_;
end;
end;
if left_in_2009='Y' then do _n_=1 to 16;
if (last_left <= start{_n_} < '01jan2010'd)
and (found_at ne _n_) then left_in_2009='N';
end;
run;
There's a little bit of processing that might be extra careful, taking into consideration whether a person started and ended on the same date. When you're confident that the code is working properly, you might want to drop last_left and found_at.
Walternate,
I do not have a dataset to play with so I cannot test my logic. The code and assumptions do need to be tested.
/** assuming that all dates are valid; that there are no duplicate rows;
that dates are in low to high order from left to right in each row;
that every date variable has a value; and that there is no bad data **/
data finddates;
set dates;
array strt {16} start1-start16;
array end {16} end1-end16;
array yrs {16} strtyr1-strtyr16 (2099*16);
array yre {16} endyr1-endyr16 (2199*16);
format cntrs cntre 8.;
cntrs = 0; cntre = 0;
do i=1 to 16;
endyr = 0;
yrs(i) = year(strt(i));
yre(i) = year(end(i));
if yrs(i) = 2009 then cntrs = cntrs + 1;
if yre(i) = 2009 then endyr = 1;
if cntrs > 0 then endyr = 0;
cntre = cntre + endyr;
end;
run;
This code works, but am I correct in saying that it does not take into account the relative positions of the 2009 dates--that is, if you have an end date that is August 2009, your start date would have to be in 2009 but also after August in order to count as non-terminating.
simpleast approach for your query would be
if max year from startdate <2009 and end date lies in 2009 then its a leaving id.
data test;
input ID (Start1-start3 End1-End3)(:ddmmyy10.);
format Start1-start3 End1-End3 ddmmyy10.;
cards;
1 5/5/2009 6/7/2009 8/5/2009 5/10/2010 6/12/2011 10/12/2013
2 1/1/1995 2/3/2005 . 4/5/2009 5/3/2007 .
3 2/3/2007 8/9/2009 8/5/2009 1/3/2009 12/12/2012 10/12/2013
;
data test;
set test;
if ((max(year(start1),year(start2),year(start3))<2009)
and (year(end1)=2009 or year(end2)=2009 or year(end3)=2009)) then flag=1;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.