Solved: Re: New enrollment, retention, and disenrollment of clients

SAS_SB · Posted 02-01-2023 01:44 PM

Hello All, I am working on a dataset that has year and month of enrollment of clients into a program (from March 2018 to February 2022). The dataset does not have information on when the clients exited the program. I have been trying to figure out:

1. the number of clients newly enrolled from one year to the next (for e.g., present in 2019 but not in 2018)

2. the number of clients who exited the program in the next year (for e.g., present in 2018 but not in in 2019)

3. number of clients retained (for e.g., present in both 2018 and 2019).

I would appreciate any help! Thanks!

The data looks like:

data enrollment;
input id$ year month;
datalines;
1 2018 10
1 2018 11
1 2018 12
1 2019 1
1 2019 2
2 2019 10
2 2019 11
2 2019 12
3 2018 3
3 2018 4
3 2018 5
3 2019 1
;
run;

SASJedi · Posted 02-06-2023 08:46 AM

As the question evolved here, the name of your input data set keeps changing as do the input data set vales. It makes it hard to refined the answer from previous work. So I'm going to provide both sample data and commented code here to help you apply this to your actual data.

This code creates the input data set:

data prepost;
input id$ year month;
datalines;
1 2018 2
1 2018 3
1 2018 4
1 2019 2
1 2019 3
1 2019 4
2 2018 8
2 2018 9
2 2018 10
3 2018 3
3 2018 4
3 2018 5
3 2018 8
3 2018 9
3 2018 10
;

Which looks like this:

id	year	month
1	2018	2
1	2018	3
1	2018	4
1	2019	2
1	2019	3
1	2019	4
2	2018	8
2	2018	9
2	2018	10
3	2018	3
3	2018	4
3	2018	5
3	2018	8
3	2018	9
3	2018	10

This code does the summarization:

proc sql;
create table prepost_q1 as
select id
		  /* Each logical expression produces a 1 if true, 0 if false */
	   ,sum( month in (3,5))=0 /* Not enrolled first */ 
		 AND 
		 sum( month in (6,8))>0 /* Enrolled second */ as preq2_newenrol

	   ,sum( month in (3,5))>0 /* Enrolled first */ 
		 AND 
		 sum( month in (6,8))=0 /* Not enrolled second */ as preq2_exited

	   ,sum( month in (3,5))>0 /* Enrolled first */ 
		 AND 
		 sum( month in (6,8))>0 /* Enrolled second */ as preq2_retained
	from work.prepost
	where year=2018
	group by id
;
quit;

Producing this result:

id	preq2_newenrol	preq2_exited	preq2_retained
1	0	1	0
2	1	0	0
3	0	0	1

Check out my Jedi SAS Tricks for SAS Users

View solution in original post

PaigeMiller · Posted 02-01-2023 01:54 PM

@SAS_SB wrote:

1. the number of clients newly enrolled from one year to the next (for e.g., present in 2019 but not in 2018)

2. the number of clients who exited the program in the next year (for e.g., present in 2018 but not in in 2019)

3. number of clients retained (for e.g., present in both 2018 and 2019).

So based on your description, month is irrelevant?

Is this data representative of the real problem? In other words, the only two years we are discussing is 2018 and 2019, and so the only code we have to write is to check these two years? Or does the real problem contain more than two years? (If the latter, provide data that spans more than two years)

--
Paige Miller

SAS_SB · Posted 02-01-2023 04:45 PM

Thank you @PaigeMiller !

Months could be useful if we want to have a different/specific time period. For e.g., March 2018 to Feb 2019, March 2019 to Feb 2020, etc. In addition, clients might not be in the program for all months of the year.

The example provided was just a truncated dataset. I have other years as well. Please find the revised example dataset as follows. Please let me know if this example is sufficient.

Please note that I have put dots (.) to indicate continuation of the months enrolled within the given year not the missing values.

data work.enrollment;
input id$ year month;
datalines;
1 2018 10
1 2018 11
1 2018 12
1 2019 1
1 2019 2
2 2019 10
2 2019 11
2 2019 12
3 2018 3
3 2018 4
3 2018 5
3 2019 1
3 2019 12
3 2020 1
3 2020 2
4 2018 9
4 2018 10
4 2018 11
4 2018 12
4 2019 1
4 2019 2
5 2018 12
5 2019 1
5 2019 2
. .... .
5 2019 12
5 2020 1
6 2019 1
. .... .
6 2019 12
6 2020 1
. .... .
6 2020 12
6 2021 1
. .... .
6 2021 12
6 2022 1
6 2022 2
;
run;

PaigeMiller · Posted 02-01-2023 04:52 PM

So continuing on ... I'm still not sure I understand what you want.

If a client is enrolled in January, 2018 but not the rest of the year, and is enrolled in December, 2019 but not the rest of the year ... how is that treated. Is it new enrollment, is it disenrollment, is it retention?

Also, I don't like this:

Months could be useful...

Either we are supposed to be using months in the logic and code, or we are not.

--
Paige Miller

SAS_SB · Posted 02-01-2023 05:23 PM

Sorry for being unclear.

So I would like to have the number of individuals with new enrollment, retention, and disenrollment for the periods: from period 1 (March 2018 to Feb 2019) to period 2 (March 2019 to Feb 2020) and from period 2 to period 3 (March 2020 to Feb 2021). So, how many were retained in period 2 from period 1? How many were disenrolled when moving from period 1 to period 2? etc.

"If a client is enrolled in January, 2018 but not the rest of the year, and is enrolled in December, 2019 but not the rest of the year ... how is that treated. Is it new enrollment, is it disenrollment, is it retention?"

For my case it would be retention.

Yes we will use months in the code.

Thanks, again!

PaigeMiller · Posted 02-02-2023 06:37 AM

I think I understand now, but again I ask for a more representative data set. In particular, I cannot work with data that has dots in it like this:

Right now, there are still only two time periods represented in the part of the data I can use, IDs 1 through 4, those two time periods being March 2018 to Feb 2019 and March 2019 to Feb 2020.

--
Paige Miller

SAS_SB · Posted 02-03-2023 10:15 AM

Here is the revised dataset.

Many thanks in advance!

data work.enrollment;
input id$ year month;
datalines;
1 2018 10
1 2018 11
1 2018 12
1 2019 1
1 2019 2
2 2019 10
2 2019 11
2 2019 12
3 2018 3
3 2018 4
3 2018 5
3 2019 1
3 2019 12
3 2020 1
3 2020 2
4 2018 9
4 2018 10
4 2018 11
4 2018 12
4 2019 1
4 2019 2
5 2018 12
5 2019 1
5 2019 2
5 2019 3
5 2019 4
5 2019 5
5 2019 6
5 2019 7
5 2019 8
5 2019 9
5 2019 10
5 2019 11
5 2019 12
5 2020 1
5 2020 2
6 2019 1
6 2019 2
6 2019 3
6 2019 4
6 2019 5
6 2019 6
6 2019 7
6 2019 8
6 2019 9
6 2019 10
6 2019 11
6 2019 12
6 2020 1
6 2020 2
6 2020 3
6 2020 4
6 2020 5
6 2020 6
6 2020 7
6 2020 8
6 2020 9
6 2020 10
6 2020 11
6 2020 12
6 2021 1
6 2021 2
6 2021 3
6 2021 4
6 2021 5
6 2021 6
6 2021 7
6 2021 8
6 2021 19
6 2021 10
6 2021 11
6 2021 12
6 2022 1
6 2022 2
;
run;

PaigeMiller · Posted 02-03-2023 10:55 AM

Thanks. This is what I have, it works for this data

data enrollment1;
    set enrollment;
    yymm=mdy(month,1,year);
    last_month_of_time_period=intnx('year.3',yymm,0,'e');
    year_of_enrollment=year(last_month_of_time_period)-1;
    enrolled=1;
    format yymm last_month_of_time_period yymmn6.;
run;
data enrollment2;
    set enrollment1;
    by id year_of_enrollment;
    if last.year_of_enrollment;
run;
proc transpose data=enrollment2 out=enrollment3 prefix=yr_;
    by id;
    var enrolled; 
    id year_of_enrollment;
run;

data want;
    set enrollment3(drop=_name_);
    array yr yr_2018-yr_2020;
    array new newenrollment_2018-newenrollment_2020;
    array dis disenrollment_2018-disenrollment_2020;
    array ret retention_2018-retention_2020;
    do i=1 to dim(yr);
        ret(i)=0;
        dis(i)=0;
        new(i)=0;
        if i=1 and yr(i)=1 then new(i)=1;
        if i>1 then do;
            if yr(i)=1 and yr(i-1)=. then new(i)=1;
            else if yr(i)=. and yr(i-1)=1 then dis(i)=1;
            else if yr(i)=1 and yr(i-1)=1 then ret(i)=1;
        end;
    end;
    drop i;
run;
proc summary data=want;
    var new: dis: ret:;
    output out=percents mean=;
run;

--
Paige Miller

SAS_SB · Posted 02-03-2023 02:58 PM

Thank you @PaigeMiller !

I am a beginner learning SAS. So, it will take some time for me to understand the code. I am learning the different SAS statements and options. Meanwhile, if there is any simpler (for learners like me!), it would be super helpful.

Many thanks!

PaigeMiller · Posted 02-03-2023 03:03 PM

There may be simpler ways, but I have written it relatively step by step, which might make it easier to understand.

The first data set named ENROLLMENT1 computes the last month of the time period (which I think is always Feb) and then finds the corresponding year (so Feb 2019 is really part of the 2018 year)

The second data set named ENROLLMENT2 winnows the data set down to just one record per time period

The third data set ENROLLMENT3 is transpose of ENROLLMENT2, you should look at it to see what I have done. Now that the data is in this form, a series of IF statement in WANT computes wherther or not the record was new enrollment, disenrollment or retention for each year. And then finally PROC SUMMARY computes to percent of each, by year.

--
Paige Miller

SAS_SB · Posted 02-03-2023 03:19 PM

Thank you @PaigeMiller. The explanation is helpful.

One quick question, what would be the codes if the end month is other than February? I would like to have several quarters as time periods. For example, March to May 2018 as quarter 1 (q1), June to August 2018 as q2, September to November 2018 as q3, and so on. What is the number of unique clients newly enrolled, exited, and retained from q1 to q2, q2 to q3, and so on?

Thank you, again!

PaigeMiller · Posted 02-03-2023 03:39 PM

This would require somewhat of a re-write of the code, and I don't have the time right now to do this.

--
Paige Miller

SAS_SB · Posted 02-03-2023 03:44 PM

No worries! When you get time, please write down the code. I will be keeping an eye for that.

Thank you!

SAS_SB · Posted 02-01-2023 11:50 AM

Hello All,

I am working on a dataset that has year and month of enrollment of clients into a program (from March 2018 to February 2022). The dataset does not have information on when the clients exited the program. I have been trying to figure out:

1. the number of clients newly enrolled from one year to the next (for e.g., present in 2019 but not in 2018)

2. the number of clients who exited the program in the next year (for e.g., present in 2018 but not in in 2019)

3. number of clients retained (for e.g., present in both 2018 and 2019).

I would appreciate any help!

Thanks!

The data looks like:

data enrollment;
input id$ year month;
datalines;
1 2018 10
1 2018 11
1 2018 12
1 2019 1
1 2019 2
2 2019 10
2 2019 11
2 2019 12
3 2018 3
3 2018 4
3 2018 5
3 2019 1
;
run;

SASJedi · Posted 02-02-2023 10:38 AM

With SQL, a little boolean logic can create numeric flags for each ID:

proc sql;
select ID
	   , SUM(year=2018)=0 and SUM(year=2019)>0 as NewEnrollment
	   , SUM(year=2018)>0 and SUM(year=2019)=0 as Exited
       , SUM(year=2018)>0 and SUM(year=2019)>0 as Retained
	from enrollment
	group by ID
;
quit;

Producing a report like this:

id	NewEnrollment	Exited	Retained
1	0	1	0
2	1	0	0
3	0	0	1

Which can be used as an inline view to get the overall report:

proc sql;
select sum(NewEnrollment) as TotalNew
	   ,sum(Exited) as TotalExited
	   ,sum(Retained) as TotalRetained
	   from (select ID
					, SUM(year=2018)=0 and SUM(year=2019)>0 as NewEnrollment
					, SUM(year=2018)>0 and SUM(year=2019)=0 as Exited
					, SUM(year=2018)>0 and SUM(year=2019)>0 as Retained
					from enrollment
					group by ID)
;
quit;

and, violá!

TotalNew	TotalExited	TotalRetained
1	1	1

Check out my Jedi SAS Tricks for SAS Users

New enrollment, retention, and disenrollment of clients

Re: Clients retained, clients exited, and new clients added

Re: New enrollment, retention, and disenrollment of clients

Re: New enrollment, retention, and disenrollment of clients

Re: New enrollment, retention, and disenrollment of clients

Re: New enrollment, retention, and disenrollment of clients

Re: New enrollment, retention, and disenrollment of clients

Re: New enrollment, retention, and disenrollment of clients

Re: New enrollment, retention, and disenrollment of clients

Re: New enrollment, retention, and disenrollment of clients

Re: New enrollment, retention, and disenrollment of clients

Re: New enrollment, retention, and disenrollment of clients

Re: New enrollment, retention, and disenrollment of clients

Re: New enrollment, retention, and disenrollment of clients

Clients retained, clients exited, and new clients added

Re: Clients retained, clients exited, and new clients added

The 2025 SAS Hackathon has begun!

SAS Training: Just a Click Away