BookmarkSubscribeRSS Feed
ginak
Quartz | Level 8

**Note: Modified and edited today 8/17/2018:**

 

Hi,

 

I am using SAS 9.4. I have a dataset where each ID can be repeated. So there are multiple records per ID. Each record has a start and an end date. I am trying to create a categorical variable based off those

  1. whose enrollment ended sometime in 2016
  2. whose enrollment began sometime in 2016
  3. that were enrolled at the start and end of 2016 but had 2016 enrollment gaps
  4. that were enrolled in the entire calendar year 2016

You know if someone's enrolled if their status is "active."

 

Here's a sample data set:

data have; 
input ID status (start end) (:mmddyy10.);
format start end yymmdd10.;
cards;
1A active 01012011 02012011
1A revoked 02012011 05012013
1A active 05012013 06032016
1B active 05032016 07012016
1B revoked 07012016 .
2A active 03012016 05012018
2A deactivated 05012018 06012018
2A active 06012018 .
3A active 12092003 09062016
3A deactivated 09062016 09272016
3A active 09272016 .
4A active 01032015 02012015
4A deactivated 02012015 05042015
4A active 05042015 03032017
4A deactivated 03032017 .
4B active 03032015 .
;
data have2;
	set have;
	format start mmddyy10.;
	format end mmddyy10.;
run;
As you can see, the ID corresponds with the categorical variable in my sample data set.
 

So if someone is revoked or deactivated, that means they are not active. Now:

  • ID 1A and 1B are category 1, "Suppliers whose enrollment ended sometime in 2016" (1A's last active end date is 06/03/2016, and 1B's last active end date is 07/01/2016.
  • 2A falls under "Suppliers whose enrollment began sometime in 2016" 03/01/2016,
  • 3A falls under "Suppliers that were enrolled at the start and end of 2016 but had 2016 enrollment gaps" as you can see on that they became deactivated from 9/6/2016-9/27/2016, and then active again on 9/27/2016.
  • 4A and 4B fall under "enrolled the entire calendar year of 2016"

I should mention that a missing in the end date means that they're ongoing. So for 4b, they were active on 3//2015, and have been active since then, since there is a "." for the last end date.

 

 
I  have this sql code:
proc sql;
create table want as
select id,  min(start) as start, max(end) as end
from have
group by id;
quit;
 
but the problem is, this doesn't help me find category 3. Even so, I'm not sure how to completely categorize them into 4 from this.
 
Then I tried this after sorting by id start end:
 
data want;
set have;
retain startA endA  gapflag started2016 ended2016 startedbefore16;
format startA mmddyy10.;
format endA mmddyy10.;
by id start end;
if first.idthen do;
startA= start;
endA= end;
if start  >= 20454  and start <= 20819 then started2016 =1;
if end >= 20454  and end <= 20819 then ended2016 = 1; 
if start < 20454 then startedbefore16 = 1;
end;
 
else do;
if endA >= 20454 and end <= 20819 and start- endA > 1 then gapflag = 1; 
end;
run;
 
But this didn't quite work either.
 
So I tried this: 

data want;
	retain start;
	format start end laststartdate mmddyy10.;
	set have;
	by ID;
	if first.ID then startA = start;
	if last.ID then do;
		endA = end;
		laststartdate = start;
		output ;
	end;
run;

data want2;
format cat2016 cat16f.;
set want;
if startA <'01JAN2016'd and endA >= '01JAN2016'd and endA <= '31DEC2016'd then cat2016 = 1;

else if (startA >= '01JAN2016'd and startA =< '31DEC2016'd then cat2016 = 2;

else if (startA < '01JAN2016'd and laststartdate > '31DEC2016'd) or (laststartdate >= startA and startA < '01JAN2016'd and end = .) then cat2016 = 4;

/*CREATE CONTINUOUS VARIABLE -N UMBER OF DAYS ACTIVE IN 2016*/
/*assume active if end = .*/
if endA = . and startA < '01JAN2016'd then daysin2016 = 365;
/*for those who ended sometime in 2016; start can be any time*/
if endA >='01JAN2016'd and endA <'31DEC2016'd then daysin2016 = intck('day', '01JAN2016'd, endA);
/*For those who start and end in 2016*/
if startA >='01JAN2016'd and startA < '31DEC2016'd and endA >='01JAN2016'd and endA <'31DEC2016'd then daysin2016 = intck('day', startA, endA);
if startA >= '01JAN2016'd and startA < '31DEC2016'd then daysin2016 = intck('day', startA, '31DEC2016'd );
run;

 
However, this was wrong as it didn't take into account what it means to actually not be active in 2016 (I didn't take the status variable "active, revoked, deactivated" into account). My goal is to get a dataset that looks like this:
 


data finalwant;
input ID status (start end) (:mmddyy10.) CAT2016 daysin2016;
format start end yymmdd10.;
cards;
1A active 01012011 02012011 1 154
1A revoked 02012011 05012013 1 154
1A active 05012013 06032016 1 154
1B active 05032016 07012016 1 59
1B revoked 07012016 . 1 59
2A active 03012016 05012018 2 305
2A deactivated 05012018 06012018 2 305
2A active 06012018 . 2 305
3A active 12092003 09062016 3 344
3A deactivated 09062016 09272016 3 344
3A active 09272016 . 3 344
4A active 01032015 02012015 4 365
4A deactivated 02012015 05042015 4 365
4A active 05042015 03032017 4 365
4A deactivated 03032017 . 4 365
4B active 03032015 . 4 365
;
Thanks!!!
Best,
Gina
2 REPLIES 2
PGStats
Opal | Level 21

I modified the example data to create an enrolment gap on 02MAY2016 for ID=3. The following program will also cover the case where some enrolment periods would overlap. It is assumed that the data is sorted by ID.

 

data have;	
input ID (start end) (:mmddyy10.);
format start end yymmdd10.;
cards;
1	01012011	02012011
1	02012011	05012013
1	05012013	06032016
2	03012016	05012018
2	05012018	06012018
2	06012018	.
3	01012015	02012016
3	02012016	05012016    <- modified to create a 1 day gap
3	05032016	07012017
4	01032015	02012015
4	02012015	02042017
5	03032015    .
;

/* Convert period start and end dates to numbers for array indexing */
%let dstart=%sysfunc(mdy(01,01,2016));
%let dend=%sysfunc(mdy(12,31,2016));

data want;
array d {&dstart:&dend} _temporary_;
call missing(of d{*});
do until(last.ID);
    set have; by ID;
    do dt = max(start, &dstart) to min(end, &dend);
        d{dt} = 1;
        end;
    end;

if cmiss(of d{*}) = 0 then cat = 4;
else if missing(d{&dend}) then cat = 1;
else if missing(d{&dstart}) then cat = 2;
else cat = 3;

do until(last.ID);
    set have; by ID;
    output;
    end;
drop dt;
run;

proc print data=want noobs; run;
PG
ginak
Quartz | Level 8

Hello! Thank you for your help!!! This is so informative and interesting and I'm learning so much. I have an update though, and I'll update my original question as well.

 

What if the data look like this, where we determine active and not active based on the status variable?

data have;	
input ID status (start end) (:mmddyy10.);
format start end yymmdd10.;
cards;
1A	active 	01012011	02012011
1A	revoked 02012011	05012013
1A	active 	05012013	06032016
1B	active	05032016	07012016
1B  revoked	07012016	.
2A	active	03012016	05012018
2A	deactivated	05012018	06012018
2A	active	06012018	.
3A	active	12092003	09062016
3A	deactivated	09062016	09272016
3A	active 	09272016	.
4A	active	01032015	02012015
4A	deactivated 02012015	05042015
4A	active	05042015	03032017
4A	deactivated	03032017	.
4B	active	03032015    .
;

So if someone is revoked or deactivated, that means they are not active. Now:

  • ID 1A and 1B are category 1, "Suppliers whose enrollment ended sometime in 2016" (1A's last active end date is 06/03/2016, and 1B's last active end date is 07/01/2016.
  • 2A falls under "Suppliers whose enrollment began sometime in 2016" 03/01/2016,
  • 3A falls under "Suppliers that were enrolled at the start and end of 2016 but had 2016 enrollment gaps" as you can see on that they became deactivated from 9/6/2016-9/27/2016, and then active again on 9/27/2016.
  • 4A and 4B fall under "enrolled the entire calendar year of 2016"

I should mention that a missing in the end date means that they're ongoing. So for 4b, they were active on 3//2015, and have been active since then, since there is a "." for the last end date.

 

 

Is it difficult to modify your code to this? Thank you very much, apologies. I misunderstood the task 😞 

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 1470 views
  • 0 likes
  • 2 in conversation