Hi,
I am working in SAS 9.3, doing a study where I look at how subjects move between two different environments (e.g. hospital and home) over time. All dates that correspond to an admission to the hospital are indicated by time=0, and dates that correspond to released from the hospital are indicated by time=1 (so the value of the time variable alternates between 0 and 1 for each row).
There is also a variable "period" that represents the order of these events for each subject.
If someone moves from the hospital to the home and then back to the hospital, they will have time=0, 1, and then 0 again, and period= 1, 2, then 3.
Below is an example of what the data currently looks like:
Obs ParticipantID Time Period MvmtDate
1 00001 0 1 01/31/2007
2 00001 1 2 05/01/2008
3 00002 0 1 06/01/2012
4 00002 1 2 05/24/2013
5 00003 1 1 02/16/2007
6 00003 0 2 05/06/2010
7 00003 1 3 08/19/2010
8 00004 0 1 06/15/2007
9 00004 1 2 06/21/2008
10 00005 0 1 04/27/2007
11 00005 1 2 11/07/2008
12 00006 0 1 03/15/2007
13 00006 1 2 06/20/2007
14 00007 0 1 08/24/2015
15 00008 0 1 04/17/2007
16 00008 1 2 04/19/2007
17 00009 0 1 05/10/2007
18 00009 1 2 08/03/2007
19 00010 0 1 07/31/2007
20 00010 1 2 08/28/2007
21 00010 0 3 02/20/2008
22 00010 1 4 03/05/2008
23 00011 0 1 02/05/2008
24 00011 1 2 03/12/2010
My research study period of interest is 01/01/2007 - 12/31/2014. I would like to use the movement dates to create time periods with a start date and end date. Taking the first subject, for example, with the following data:
Time Period MvmtDate
0 1 01/31/2007
1 2 05/01/2008
I would like to transform this subject's data into the following (new/manipulated data/variables are bolded):
Time Period PeriodStartDate PeriodEndDate (MvmtDate - 1 day)
1 0 01/01/2007 (study start date) 01/30/2007
0 1 01/31/2007 04/30/2008
1 2 05/01/2008 12/31/2014 (study end date, in this case date of censoring)
Do you have any suggestions? I started off by making a separate database to create the first time period (starting 01/01/2007) which I planned to merge in. Then I thought I could create two separate databases to get start and end dates (e.g. 01/31/2007 and 04/30/2008) and merge together into the appropriate rows by offsetting the period variable. But then I realized that creating the last time period (ending with 12/31/2014) might be difficult, and my code is starting to get very complicated. Perhaps there is a way to do this using PROC SQL? Any suggestions would be greatly appreciated. Thank you!
This is my take on this problem, run it to see if you like
data have;
length PID $5;
input Obs PID release period date :mmddyy10.;
format date yymmdd10.;
drop obs period;
datalines;
1 00001 0 1 01/31/2007
2 00001 1 2 05/01/2008
3 00002 0 1 06/01/2012
4 00002 1 2 05/24/2013
5 00003 1 1 02/16/2007
6 00003 0 2 05/06/2010
7 00003 1 3 08/19/2010
8 00004 0 1 06/15/2007
9 00004 1 2 06/21/2008
10 00005 0 1 04/27/2007
11 00005 1 2 11/07/2008
12 00006 0 1 03/15/2007
13 00006 1 2 06/20/2007
14 00007 0 1 08/24/2015
15 00008 0 1 04/17/2007
16 00008 1 2 04/19/2007
17 00009 0 1 05/10/2007
18 00009 1 2 08/03/2007
19 00010 0 1 07/31/2007
20 00010 1 2 08/28/2007
21 00010 0 3 02/20/2008
22 00010 1 4 03/05/2008
23 00011 0 1 02/05/2008
24 00011 1 2 03/12/2010
;
%let studyStart='01JAN2007'd;
%let studyEnd='31DEC2014'd;
data want;
length environment $8;
format startDate endDate yymmdd10.;
period = 0;
endDate = intnx("DAY", &studyStart, -1);
do until (last.PID);
    set have; by PID date; /* Also check date order */
    where date > &studyStart and date < &studyEnd;
    if release 
        then environment = "HOSPITAL";
        else environment = "HOME";
    startDate = intnx("DAY", endDate, 1);
    endDate = intnx("DAY", date, -1);
    period + 1;
    output;
    if last.PID then do;
        startDate = intnx("DAY", endDate, 1);
        endDate = &studyEnd;
        if release 
            then environment = "HOME";
            else environment = "HOSPITAL";
        period + 1;
        output;
        end;
    end;
keep PID period environment startDate endDate;
run;
proc print data=want;
var period environment startDate endDate;
by PID; id PID;
run;
This isn't fully tested and I'd be surprised if it worked for all cases, so make sure you test it thoroughly. I think it's also very inefficient but hopefully it gets you started. Perhaps someone can post some more succinct code!
I renamed time to status, because it made more sense to me logically and was confusing me 😉
Hope it helps!
data have;
informat obs 8. ParticipantID $8. Time Period 8.;
informat mvmtDate mmddyy10.;
format mvmtDate date9.;
input Obs ParticipantID $ Time Period MvmtDate;
cards;
1 00001 0 1 01/31/2007
2 00001 1 2 05/01/2008
3 00002 0 1 06/01/2012
4 00002 1 2 05/24/2013
5 00003 1 1 02/16/2007
6 00003 0 2 05/06/2010
7 00003 1 3 08/19/2010
8 00004 0 1 06/15/2007
9 00004 1 2 06/21/2008
10 00005 0 1 04/27/2007
11 00005 1 2 11/07/2008
12 00006 0 1 03/15/2007
13 00006 1 2 06/20/2007
14 00007 0 1 08/24/2015
15 00008 0 1 04/17/2007
16 00008 1 2 04/19/2007
17 00009 0 1 05/10/2007
18 00009 1 2 08/03/2007
19 00010 0 1 07/31/2007
20 00010 1 2 08/28/2007
21 00010 0 3 02/20/2008
22 00010 1 4 03/05/2008
23 00011 0 1 02/05/2008
24 00011 1 2 03/12/2010
;
run;
data have;
set have;
by participantID;
if first.participantID then do;
	start=mvmtDate;
	time_start=time;
	time=ifn(time_start=1, 0, 1);
	mvmtDate='01Jan2007'd;
	seq=1;
	output;
	seq+1;
	time=time_start;
	mvmtDate=start;
	output;
end;
else do;
	seq+1;
	output;
end;
drop start time_start;
rename time=status;
run;
proc sql;
create table want as 
select h1.ParticipantID, h1.status, h1.period, 
	h1.mvmtDate as date_start format=date9., case when not missing(h2.mvmtDate) then h2.mvmtDate-1 
							else '31Dec2014'd end as date_end format=date9.
from have as h1
left join have as h2
on h1.participantID=h2.ParticipantID
and h1.seq+1=h2.seq
order by h1.participantID, date_start, date_end;
quit;
data want;
set want;
if date_end>'31Dec2014'd then date_end='31Dec2014'd;
if date_start>'31Dec2014'd then delete;
run;
I love this question . You need to test it on your own. I may not consider about all the scenarios .
data have;
informat obs 8. ParticipantID $8. Time Period 8.;
informat mvmtDate mmddyy10.;
format mvmtDate date9.;
input Obs ParticipantID $ Time Period MvmtDate;
cards;
1 00001 0 1 01/31/2007
2 00001 1 2 05/01/2008
3 00002 0 1 06/01/2012
4 00002 1 2 05/24/2013
5 00003 1 1 02/16/2007
6 00003 0 2 05/06/2010
7 00003 1 3 08/19/2010
8 00004 0 1 06/15/2007
9 00004 1 2 06/21/2008
10 00005 0 1 04/27/2007
11 00005 1 2 11/07/2008
12 00006 0 1 03/15/2007
13 00006 1 2 06/20/2007
14 00007 0 1 08/24/2015
15 00008 0 1 04/17/2007
16 00008 1 2 04/19/2007
17 00009 0 1 05/10/2007
18 00009 1 2 08/03/2007
19 00010 0 1 07/31/2007
20 00010 1 2 08/28/2007
21 00010 0 3 02/20/2008
22 00010 1 4 03/05/2008
23 00011 0 1 02/05/2008
24 00011 1 2 03/12/2010
;
run;
data want;
array t{99999} _temporary_;
array p{99999} _temporary_;
array d{99999} _temporary_;
do i=1 by 1 until(last.ParticipantID);
 set have(where=(MvmtDate between '01jan2007'd and '31dec2014'd));
 by ParticipantID ;
 t{i}=Time ;
 p{i}=Period ;
 d{i}=MvmtDate ;
end;
 
 
do j=1 to i;
 if j=1 then do; 
  if d{1} gt '01jan2007'd then do;
   Time=not t{1};
   Period=p{1}-1;
   PeriodStartDate='01jan2007'd;
   PeriodEndDate=d{1}-1;
   output;
   Time=t{1};
   Period=p{1};
   PeriodStartDate=d{1};
   PeriodEndDate=d{2}-1;
   output;
  end;
  else do;
   Time=t{1};
   Period=p{1};
   PeriodStartDate='01jan2007'd;
   PeriodEndDate=d{2}-1;
   output;
  end;  
 end; 
 
 else if j=i then do;
   Time=t{j};
   Period=p{j};
   PeriodStartDate=d{j};
   PeriodEndDate='31dec2014'd;
   output;
 end;
 
 else do;
   Time=t{j};
   Period=p{j};
   PeriodStartDate=d{j};
   PeriodEndDate=d{j+1}-1;
   output;
 end;
 
end;
format PeriodStartDate PeriodEndDate date9.;
keep ParticipantID  Time Period PeriodStartDate PeriodEndDate;
run;
 This is my take on this problem, run it to see if you like
data have;
length PID $5;
input Obs PID release period date :mmddyy10.;
format date yymmdd10.;
drop obs period;
datalines;
1 00001 0 1 01/31/2007
2 00001 1 2 05/01/2008
3 00002 0 1 06/01/2012
4 00002 1 2 05/24/2013
5 00003 1 1 02/16/2007
6 00003 0 2 05/06/2010
7 00003 1 3 08/19/2010
8 00004 0 1 06/15/2007
9 00004 1 2 06/21/2008
10 00005 0 1 04/27/2007
11 00005 1 2 11/07/2008
12 00006 0 1 03/15/2007
13 00006 1 2 06/20/2007
14 00007 0 1 08/24/2015
15 00008 0 1 04/17/2007
16 00008 1 2 04/19/2007
17 00009 0 1 05/10/2007
18 00009 1 2 08/03/2007
19 00010 0 1 07/31/2007
20 00010 1 2 08/28/2007
21 00010 0 3 02/20/2008
22 00010 1 4 03/05/2008
23 00011 0 1 02/05/2008
24 00011 1 2 03/12/2010
;
%let studyStart='01JAN2007'd;
%let studyEnd='31DEC2014'd;
data want;
length environment $8;
format startDate endDate yymmdd10.;
period = 0;
endDate = intnx("DAY", &studyStart, -1);
do until (last.PID);
    set have; by PID date; /* Also check date order */
    where date > &studyStart and date < &studyEnd;
    if release 
        then environment = "HOSPITAL";
        else environment = "HOME";
    startDate = intnx("DAY", endDate, 1);
    endDate = intnx("DAY", date, -1);
    period + 1;
    output;
    if last.PID then do;
        startDate = intnx("DAY", endDate, 1);
        endDate = &studyEnd;
        if release 
            then environment = "HOME";
            else environment = "HOSPITAL";
        period + 1;
        output;
        end;
    end;
keep PID period environment startDate endDate;
run;
proc print data=want;
var period environment startDate endDate;
by PID; id PID;
run;
I'm still checking the results, but so far it looks like this works perfectly! Thank you so so very much!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
