I have a dataset where I have multiple rows per id, covariates for each id, year of prescription (yearrx), and study start (ss) and study end (se). I want to calculate the total person time donated be each member. What I eventually want to have is 2 rows per id, so that in the first row, I will sum the person time from the study start (ss) until the end of that year, and a second row that will sum the person time one day after the end of the study start year, until the end of the study.
For example patient 2, entered the study In may,2,2001, had 2 prescription types in 2019, and left the study at June,30,2019. I want to create for this patient 2 rows, the first row with the time between SS and end of ss(12,31,year(SS)), and a second row calculating the time between (01,01,year(SS)+1), and the end of the study (SE). I previously tried to calculate the person time in each row but it didn't work, any thoughts? This codes creates columns, but I want two rows per patient at the end.
data x; set try2;
d1= intck('day', ss, mdy(12,31, year(ss)));
d2= intck('day', mdy(1,1,(year(ss)+1)), se);
run;
id | cov1 | cov2 | RxYear | type | NRx | SS | SE |
1 | x | z | 2012 | 2 | 4 | 06-Aug-12 | 30-Jun-19 |
1 | x | z | 2013 | 2 | 8 | 06-Aug-12 | 30-Jun-19 |
1 | x | z | 2014 | 2 | 8 | 06-Aug-12 | 30-Jun-19 |
1 | x | z | 2015 | 2 | 6 | 06-Aug-12 | 30-Jun-19 |
1 | x | z | 2016 | 2 | 5 | 06-Aug-12 | 30-Jun-19 |
2 | y | v | 2019 | 1 | 1 | 02-May-01 | 30-Jun-19 |
2 | y | v | 2019 | 2 | 1 | 02-May-01 | 30-Jun-19 |
I don't understand why you have the other variable and the multiple observations.
So let's ignore them.
data have;
input id cov1 $ cov2 $ RxYear type NRx SS :date. SE :date. ;
format ss se date9.;
cards;
1 x z 2012 2 4 06-Aug-12 30-Jun-19
1 x z 2013 2 8 06-Aug-12 30-Jun-19
1 x z 2014 2 8 06-Aug-12 30-Jun-19
1 x z 2015 2 6 06-Aug-12 30-Jun-19
1 x z 2016 2 5 06-Aug-12 30-Jun-19
2 y v 2019 1 1 02-May-01 30-Jun-19
2 y v 2019 2 1 02-May-01 30-Jun-19
;
proc sort data=have(keep=id ss se) out=subjects nodupkey;
by id ;
run;
data want;
set subjects;
if year(se)=year(ss) then do;
year1=1;
rest=1;
duration=se-ss;
output;
end;
else do;
year1=1;
rest=0;
duration = mdy(1,1,year(ss)+1)-ss;
output;
year1=0;
rest=1;
duration = se - mdy(1,1,year(ss)+1);
output;
end;
run;
proc print;
run;
Results
Obs id SS SE year1 rest duration 1 1 06AUG2012 30JUN2019 1 0 148 2 1 06AUG2012 30JUN2019 0 1 2371 3 2 02MAY2001 30JUN2019 1 0 244 4 2 02MAY2001 30JUN2019 0 1 6389
data x;
set try2;
d1= intck('day', ss, mdy(12,31, year(ss)));
output;
d1= intck('day', mdy(1,1,(year(ss)+1)), se);
output;
run;
SHow us the desired output.
Which is it?
If you do not want to change the number of observations use something like your original code where you have one variable for the first year and another for the "rest". Otherwise if you want to output separate observations for the first year and the rest then by definition you are going to have multiple observations.
I don't understand why you have the other variable and the multiple observations.
So let's ignore them.
data have;
input id cov1 $ cov2 $ RxYear type NRx SS :date. SE :date. ;
format ss se date9.;
cards;
1 x z 2012 2 4 06-Aug-12 30-Jun-19
1 x z 2013 2 8 06-Aug-12 30-Jun-19
1 x z 2014 2 8 06-Aug-12 30-Jun-19
1 x z 2015 2 6 06-Aug-12 30-Jun-19
1 x z 2016 2 5 06-Aug-12 30-Jun-19
2 y v 2019 1 1 02-May-01 30-Jun-19
2 y v 2019 2 1 02-May-01 30-Jun-19
;
proc sort data=have(keep=id ss se) out=subjects nodupkey;
by id ;
run;
data want;
set subjects;
if year(se)=year(ss) then do;
year1=1;
rest=1;
duration=se-ss;
output;
end;
else do;
year1=1;
rest=0;
duration = mdy(1,1,year(ss)+1)-ss;
output;
year1=0;
rest=1;
duration = se - mdy(1,1,year(ss)+1);
output;
end;
run;
proc print;
run;
Results
Obs id SS SE year1 rest duration 1 1 06AUG2012 30JUN2019 1 0 148 2 1 06AUG2012 30JUN2019 0 1 2371 3 2 02MAY2001 30JUN2019 1 0 244 4 2 02MAY2001 30JUN2019 0 1 6389
Perhaps you meant to generate a separate observation per year?
data want;
set subjects;
do year=year(ss) to year(se);
duration = min(se,mdy(12,31,year)) - max(ss,mdy(1,1,year)) + 1;
output;
end;
run;
Results
Obs id SS SE year duration 1 1 06AUG2012 30JUN2019 2012 148 2 1 06AUG2012 30JUN2019 2013 365 3 1 06AUG2012 30JUN2019 2014 365 4 1 06AUG2012 30JUN2019 2015 365 5 1 06AUG2012 30JUN2019 2016 366 6 1 06AUG2012 30JUN2019 2017 365 7 1 06AUG2012 30JUN2019 2018 365 8 1 06AUG2012 30JUN2019 2019 181 9 2 02MAY2001 30JUN2019 2001 244 10 2 02MAY2001 30JUN2019 2002 365 11 2 02MAY2001 30JUN2019 2003 365 12 2 02MAY2001 30JUN2019 2004 366 13 2 02MAY2001 30JUN2019 2005 365 14 2 02MAY2001 30JUN2019 2006 365 15 2 02MAY2001 30JUN2019 2007 365 16 2 02MAY2001 30JUN2019 2008 366 17 2 02MAY2001 30JUN2019 2009 365 18 2 02MAY2001 30JUN2019 2010 365 19 2 02MAY2001 30JUN2019 2011 365 20 2 02MAY2001 30JUN2019 2012 366 21 2 02MAY2001 30JUN2019 2013 365 22 2 02MAY2001 30JUN2019 2014 365 23 2 02MAY2001 30JUN2019 2015 365 24 2 02MAY2001 30JUN2019 2016 366 25 2 02MAY2001 30JUN2019 2017 365 26 2 02MAY2001 30JUN2019 2018 365 27 2 02MAY2001 30JUN2019 2019 181
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.