Hi everyone,
I have a question regarding changing short to long data frame and splitting dates.
Here is the initial data frame:
Patient | Initials | age | Entry | Exit | Event1 | Event2 | Weight |
K89R | AB | 34 | 27-Jul-10 | 17-Aug-13 | 1-Mar-08 | 2-May-11 | 44 |
S33T | ES | 55 | 23-Jul-09 | 12-Mar-12 | 17-Sep-10 | . | 76 |
What you see here, is that there is an entry and exit dates, with dates for the events 1 and 2, there is also a missing date for event 2 for the second patient because the event didn't happen. Also note that the event1 for the first patient happened before entry date.
What I want to get is the following file:
Patient | Initials | age | Entry | Exit | Event1 | Event2 | Weight |
K89R | AB | 34 | 27/07/2010 | 31/12/2010 | 1 | 0 | 44 |
K89R | AB | 34 | 1/01/2011 | 31/12/2011 | 1 | 1 | 44 |
K89R | AB | 34 | 1/01/2012 | 31/12/2012 | 1 | 1 | 44 |
K89R | AB | 34 | 1/01/2013 | 17/08/2013 | 1 | 1 | 44 |
S33T | ES | 55 | 23/07/2009 | 31/12/2009 | 0 | 0 | 76 |
S33T | ES | 55 | 1/01/2010 | 31/12/2010 | 1 | 0 | 76 |
S33T | ES | 55 | 1/01/2011 | 31/12/2011 | 1 | 0 | 76 |
S33T | ES | 55 | 1/01/2012 | 12/03/2012 | 1 | 0 | 76 |
What you notice here is that the entry to exit date period is split into individual rows per patient, each representing a year. The event columns are now coded as 0 (meaning the event has not yet happened) or 1 (the event happened) which is then carried over to the years after because the event has already happened.
The age increases in every row per patient as time progresses
The patient ID and initial remain the same as well as the weight.
Could anyone please help with this, thank you
Solved with a do until and using the intnx() function:
data have;
input patient :$4. entry :yymmdd10. exit :yymmdd10. event1 :yymmdd10. event2 :yymmdd10.;
format entry exit event1 event2 yymmddd10.;
datalines;
K98R 2010-07-27 2013-08-17 2008-03-01 2011-05-02
S33T 2009-07-23 2012-03-12 2010-09-17 .
;
data want;
set have (rename=(exit=_exit event1=_event1 event2=_event2));
format exit yymmddd10.;
exit = min(intnx('year',entry,0,'e'),_exit);
do until (exit = _exit);
if _event1 ne . and _event1 <= exit then event1 = 1; else event1 = 0;
if _event2 ne . and _event2 <= exit then event2 = 1; else event2 = 0;
output;
entry = intnx('year',entry,1,'b');
exit = min(intnx('year',exit,1,'e'),_exit);
end;
output;
keep patient entry exit event1 event2;
run;
proc print data=want noobs;
run;
Result:
patient entry exit event1 event2 K98R 2010-07-27 2010-12-31 1 0 K98R 2011-01-01 2011-12-31 1 1 K98R 2012-01-01 2012-12-31 1 1 K98R 2013-01-01 2013-08-17 1 1 S33T 2009-07-23 2009-12-31 0 0 S33T 2010-01-01 2010-12-31 1 0 S33T 2011-01-01 2011-12-31 1 0 S33T 2012-01-01 2012-03-12 1 0
Solved with a do until and using the intnx() function:
data have;
input patient :$4. entry :yymmdd10. exit :yymmdd10. event1 :yymmdd10. event2 :yymmdd10.;
format entry exit event1 event2 yymmddd10.;
datalines;
K98R 2010-07-27 2013-08-17 2008-03-01 2011-05-02
S33T 2009-07-23 2012-03-12 2010-09-17 .
;
data want;
set have (rename=(exit=_exit event1=_event1 event2=_event2));
format exit yymmddd10.;
exit = min(intnx('year',entry,0,'e'),_exit);
do until (exit = _exit);
if _event1 ne . and _event1 <= exit then event1 = 1; else event1 = 0;
if _event2 ne . and _event2 <= exit then event2 = 1; else event2 = 0;
output;
entry = intnx('year',entry,1,'b');
exit = min(intnx('year',exit,1,'e'),_exit);
end;
output;
keep patient entry exit event1 event2;
run;
proc print data=want noobs;
run;
Result:
patient entry exit event1 event2 K98R 2010-07-27 2010-12-31 1 0 K98R 2011-01-01 2011-12-31 1 1 K98R 2012-01-01 2012-12-31 1 1 K98R 2013-01-01 2013-08-17 1 1 S33T 2009-07-23 2009-12-31 0 0 S33T 2010-01-01 2010-12-31 1 0 S33T 2011-01-01 2011-12-31 1 0 S33T 2012-01-01 2012-03-12 1 0
data have;
input patient :$4. entry :yymmdd10. exit :yymmdd10. event1 :yymmdd10. event2 :yymmdd10.;
format entry exit event1 event2 yymmdd10.;
datalines;
K98R 2010-07-27 2013-08-17 2008-03-01 2011-05-02
S33T 2009-07-23 2012-03-12 2010-09-17 .
;
data temp;
set have;
id+1;
do date=entry to exit;
year=year(date);
flag1=(date>=event1 and not missing(event1));
flag2=(date>=event2 and not missing(event2));
output;
end;
run;
proc sql;
create table want as
select id,patient,min(date) as entry format=yymmdd10.,
max(date) as exit format=yymmdd10.,
max(flag1) as event1,
max(flag2) as event2
from temp
group by id,patient,year;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.