**Note: Modified and edited today 8/17/2018:**
Hi,
I am using SAS 9.4. I have a dataset where each ID can be repeated. So there are multiple records per ID. Each record has a start and an end date. I am trying to create a categorical variable based off those
You know if someone's enrolled if their status is "active."
Here's a sample data set:
data have;
input ID status (start end) (:mmddyy10.);
format start end yymmdd10.;
cards;
1A active 01012011 02012011
1A revoked 02012011 05012013
1A active 05012013 06032016
1B active 05032016 07012016
1B revoked 07012016 .
2A active 03012016 05012018
2A deactivated 05012018 06012018
2A active 06012018 .
3A active 12092003 09062016
3A deactivated 09062016 09272016
3A active 09272016 .
4A active 01032015 02012015
4A deactivated 02012015 05042015
4A active 05042015 03032017
4A deactivated 03032017 .
4B active 03032015 .
;
data have2;
set have;
format start mmddyy10.;
format end mmddyy10.;
run;
So if someone is revoked or deactivated, that means they are not active. Now:
I should mention that a missing in the end date means that they're ongoing. So for 4b, they were active on 3//2015, and have been active since then, since there is a "." for the last end date.
data want;
retain start;
format start end laststartdate mmddyy10.;
set have;
by ID;
if first.ID then startA = start;
if last.ID then do;
endA = end;
laststartdate = start;
output ;
end;
run;
data want2;
format cat2016 cat16f.;
set want;
if startA <'01JAN2016'd and endA >= '01JAN2016'd and endA <= '31DEC2016'd then cat2016 = 1;
else if (startA >= '01JAN2016'd and startA =< '31DEC2016'd then cat2016 = 2;
else if (startA < '01JAN2016'd and laststartdate > '31DEC2016'd) or (laststartdate >= startA and startA < '01JAN2016'd and end = .) then cat2016 = 4;
/*CREATE CONTINUOUS VARIABLE -N UMBER OF DAYS ACTIVE IN 2016*/
/*assume active if end = .*/
if endA = . and startA < '01JAN2016'd then daysin2016 = 365;
/*for those who ended sometime in 2016; start can be any time*/
if endA >='01JAN2016'd and endA <'31DEC2016'd then daysin2016 = intck('day', '01JAN2016'd, endA);
/*For those who start and end in 2016*/
if startA >='01JAN2016'd and startA < '31DEC2016'd and endA >='01JAN2016'd and endA <'31DEC2016'd then daysin2016 = intck('day', startA, endA);
if startA >= '01JAN2016'd and startA < '31DEC2016'd then daysin2016 = intck('day', startA, '31DEC2016'd );
run;
data finalwant;
input ID status (start end) (:mmddyy10.) CAT2016 daysin2016;
format start end yymmdd10.;
cards;
1A active 01012011 02012011 1 154
1A revoked 02012011 05012013 1 154
1A active 05012013 06032016 1 154
1B active 05032016 07012016 1 59
1B revoked 07012016 . 1 59
2A active 03012016 05012018 2 305
2A deactivated 05012018 06012018 2 305
2A active 06012018 . 2 305
3A active 12092003 09062016 3 344
3A deactivated 09062016 09272016 3 344
3A active 09272016 . 3 344
4A active 01032015 02012015 4 365
4A deactivated 02012015 05042015 4 365
4A active 05042015 03032017 4 365
4A deactivated 03032017 . 4 365
4B active 03032015 . 4 365
;
I modified the example data to create an enrolment gap on 02MAY2016 for ID=3. The following program will also cover the case where some enrolment periods would overlap. It is assumed that the data is sorted by ID.
data have;
input ID (start end) (:mmddyy10.);
format start end yymmdd10.;
cards;
1 01012011 02012011
1 02012011 05012013
1 05012013 06032016
2 03012016 05012018
2 05012018 06012018
2 06012018 .
3 01012015 02012016
3 02012016 05012016 <- modified to create a 1 day gap
3 05032016 07012017
4 01032015 02012015
4 02012015 02042017
5 03032015 .
;
/* Convert period start and end dates to numbers for array indexing */
%let dstart=%sysfunc(mdy(01,01,2016));
%let dend=%sysfunc(mdy(12,31,2016));
data want;
array d {&dstart:&dend} _temporary_;
call missing(of d{*});
do until(last.ID);
set have; by ID;
do dt = max(start, &dstart) to min(end, &dend);
d{dt} = 1;
end;
end;
if cmiss(of d{*}) = 0 then cat = 4;
else if missing(d{&dend}) then cat = 1;
else if missing(d{&dstart}) then cat = 2;
else cat = 3;
do until(last.ID);
set have; by ID;
output;
end;
drop dt;
run;
proc print data=want noobs; run;
Hello! Thank you for your help!!! This is so informative and interesting and I'm learning so much. I have an update though, and I'll update my original question as well.
What if the data look like this, where we determine active and not active based on the status variable?
data have;
input ID status (start end) (:mmddyy10.);
format start end yymmdd10.;
cards;
1A active 01012011 02012011
1A revoked 02012011 05012013
1A active 05012013 06032016
1B active 05032016 07012016
1B revoked 07012016 .
2A active 03012016 05012018
2A deactivated 05012018 06012018
2A active 06012018 .
3A active 12092003 09062016
3A deactivated 09062016 09272016
3A active 09272016 .
4A active 01032015 02012015
4A deactivated 02012015 05042015
4A active 05042015 03032017
4A deactivated 03032017 .
4B active 03032015 .
;
So if someone is revoked or deactivated, that means they are not active. Now:
I should mention that a missing in the end date means that they're ongoing. So for 4b, they were active on 3//2015, and have been active since then, since there is a "." for the last end date.
Is it difficult to modify your code to this? Thank you very much, apologies. I misunderstood the task 😞
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.