BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SAS_SB
Obsidian | Level 7

Hello All,   I am working on a dataset that has year and month of enrollment of clients into a program (from March 2018 to February 2022). The dataset does not have information on when the clients exited the program. I have been trying to figure out:

1. the number of clients newly enrolled from one year to the next (for e.g., present in 2019 but not in 2018)

2. the number of clients who exited the program in the next year (for e.g., present in 2018 but not in in 2019)

3. number of clients retained (for e.g., present in both 2018 and 2019).

 

I would appreciate any help! Thanks!

 

The data looks like: 

data enrollment;
input id$ year month;
datalines;
1 2018 10
1 2018 11
1 2018 12
1 2019 1
1 2019 2
2 2019 10
2 2019 11
2 2019 12
3 2018 3
3 2018 4
3 2018 5
3 2019 1
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
SASJedi
Ammonite | Level 13

As the question evolved here, the name of your input data set keeps changing as do the input data set vales. It makes it hard to refined the answer from previous work. So I'm going to provide both sample data and commented code here to help you apply this to your actual data. 

This code creates the input data set:

data prepost;
input id$ year month;
datalines;
1 2018 2
1 2018 3
1 2018 4
1 2019 2
1 2019 3
1 2019 4
2 2018 8
2 2018 9
2 2018 10
3 2018 3
3 2018 4
3 2018 5
3 2018 8
3 2018 9
3 2018 10
;

 Which looks like this:

id year month
1 2018 2
1 2018 3
1 2018 4
1 2019 2
1 2019 3
1 2019 4
2 2018 8
2 2018 9
2 2018 10
3 2018 3
3 2018 4
3 2018 5
3 2018 8
3 2018 9
3 2018 10

This code does the summarization:

proc sql;
create table prepost_q1 as
select id
		  /* Each logical expression produces a 1 if true, 0 if false */
	   ,sum( month in (3,5))=0 /* Not enrolled first */ 
		 AND 
		 sum( month in (6,8))>0 /* Enrolled second */ as preq2_newenrol

	   ,sum( month in (3,5))>0 /* Enrolled first */ 
		 AND 
		 sum( month in (6,8))=0 /* Not enrolled second */ as preq2_exited

	   ,sum( month in (3,5))>0 /* Enrolled first */ 
		 AND 
		 sum( month in (6,8))>0 /* Enrolled second */ as preq2_retained
	from work.prepost
	where year=2018
	group by id
;
quit; 

Producing this result:

id preq2_newenrol preq2_exited preq2_retained
1 0 1 0
2 1 0 0
3 0 0 1

 

 

 

 
Check out my Jedi SAS Tricks for SAS Users

View solution in original post

28 REPLIES 28
PaigeMiller
Diamond | Level 26

@SAS_SB wrote:

 

1. the number of clients newly enrolled from one year to the next (for e.g., present in 2019 but not in 2018)

2. the number of clients who exited the program in the next year (for e.g., present in 2018 but not in in 2019)

3. number of clients retained (for e.g., present in both 2018 and 2019).


So based on your description, month is irrelevant?

 

Is this data representative of the real problem? In other words, the only two years we are discussing is 2018 and 2019, and so the only code we have to write is to check these two years? Or does the real problem contain more than two years? (If the latter, provide data that spans more than two years)

 

 

--
Paige Miller
SAS_SB
Obsidian | Level 7

Thank you @PaigeMiller !

Months could be useful if we want to have a different/specific time period. For e.g., March 2018 to Feb 2019, March 2019 to Feb 2020, etc. In addition, clients might not be in the program for all months of the year. 

The example provided was just a truncated dataset. I have other years as well. Please find the revised example dataset as follows. Please let me know if this example is sufficient. 

 

Please note that I have put dots (.) to indicate continuation of the months enrolled within the given year not the missing values. 

 

data work.enrollment;
input id$ year month;
datalines;
1 2018 10
1 2018 11
1 2018 12
1 2019 1
1 2019 2
2 2019 10
2 2019 11
2 2019 12
3 2018 3
3 2018 4
3 2018 5
3 2019 1
3 2019 12
3 2020 1
3 2020 2
4 2018 9
4 2018 10
4 2018 11
4 2018 12
4 2019 1
4 2019 2
5 2018 12
5 2019 1
5 2019 2
. .... .
5 2019 12
5 2020 1
6 2019 1
. .... .
6 2019 12
6 2020 1
. .... .
6 2020 12
6 2021 1
. .... .
6 2021 12
6 2022 1
6 2022 2
;
run;

 

 

 

 

PaigeMiller
Diamond | Level 26

So continuing on ... I'm still not sure I understand what you want.

 

If a client is enrolled in January, 2018 but not the rest of the year, and is enrolled in December, 2019 but not the rest of the year ... how is that treated. Is it new enrollment, is it disenrollment, is it retention?

 

Also, I don't like this:

 

Months could be useful...

 

Either we are supposed to be using months in the logic and code, or we are not.

--
Paige Miller
SAS_SB
Obsidian | Level 7

Sorry for being unclear. 

So I would like to have the number of individuals with new enrollment, retention, and disenrollment for the periods: from period 1 (March 2018 to Feb 2019) to period 2 (March 2019 to Feb 2020) and from period 2 to period 3 (March 2020 to Feb 2021). So, how many were retained in period 2 from period 1? How many were disenrolled when moving from period 1 to period 2? etc.

 

"If a client is enrolled in January, 2018 but not the rest of the year, and is enrolled in December, 2019 but not the rest of the year ... how is that treated. Is it new enrollment, is it disenrollment, is it retention?" 

For my case it would be retention.

 

Yes we will use months in the code. 

 

Thanks, again!

 

PaigeMiller
Diamond | Level 26

I think I understand now, but again I ask for a more representative data set. In particular, I cannot work with data that has dots in it like this:

 

5 2020 1
6 2019 1
. .... .
6 2019 12
6 2020 1

 

Right now, there are still only two time periods represented in the part of the data I can use, IDs 1 through 4, those two time periods being March 2018 to Feb 2019 and March 2019 to Feb 2020.

--
Paige Miller
SAS_SB
Obsidian | Level 7

Here is the revised dataset.

 

Many thanks in advance!

data work.enrollment;
input id$ year month;
datalines;
1 2018 10
1 2018 11
1 2018 12
1 2019 1
1 2019 2
2 2019 10
2 2019 11
2 2019 12
3 2018 3
3 2018 4
3 2018 5
3 2019 1
3 2019 12
3 2020 1
3 2020 2
4 2018 9
4 2018 10
4 2018 11
4 2018 12
4 2019 1
4 2019 2
5 2018 12
5 2019 1
5 2019 2
5 2019 3
5 2019 4
5 2019 5
5 2019 6
5 2019 7
5 2019 8
5 2019 9
5 2019 10
5 2019 11
5 2019 12
5 2020 1
5 2020 2
6 2019 1
6 2019 2
6 2019 3
6 2019 4
6 2019 5
6 2019 6
6 2019 7
6 2019 8
6 2019 9
6 2019 10
6 2019 11
6 2019 12
6 2020 1
6 2020 2
6 2020 3
6 2020 4
6 2020 5
6 2020 6
6 2020 7
6 2020 8
6 2020 9
6 2020 10
6 2020 11
6 2020 12
6 2021 1
6 2021 2
6 2021 3
6 2021 4
6 2021 5
6 2021 6
6 2021 7
6 2021 8
6 2021 19
6 2021 10
6 2021 11
6 2021 12
6 2022 1
6 2022 2
;
run;
PaigeMiller
Diamond | Level 26

Thanks. This is what I have, it works for this data

 

data enrollment1;
    set enrollment;
    yymm=mdy(month,1,year);
    last_month_of_time_period=intnx('year.3',yymm,0,'e');
    year_of_enrollment=year(last_month_of_time_period)-1;
    enrolled=1;
    format yymm last_month_of_time_period yymmn6.;
run;
data enrollment2;
    set enrollment1;
    by id year_of_enrollment;
    if last.year_of_enrollment;
run;
proc transpose data=enrollment2 out=enrollment3 prefix=yr_;
    by id;
    var enrolled; 
    id year_of_enrollment;
run;

data want;
    set enrollment3(drop=_name_);
    array yr yr_2018-yr_2020;
    array new newenrollment_2018-newenrollment_2020;
    array dis disenrollment_2018-disenrollment_2020;
    array ret retention_2018-retention_2020;
    do i=1 to dim(yr);
        ret(i)=0;
        dis(i)=0;
        new(i)=0;
        if i=1 and yr(i)=1 then new(i)=1;
        if i>1 then do;
            if yr(i)=1 and yr(i-1)=. then new(i)=1;
            else if yr(i)=. and yr(i-1)=1 then dis(i)=1;
            else if yr(i)=1 and yr(i-1)=1 then ret(i)=1;
        end;
    end;
    drop i;
run;
proc summary data=want;
    var new: dis: ret:;
    output out=percents mean=;
run;
--
Paige Miller
SAS_SB
Obsidian | Level 7

Thank you @PaigeMiller ! 

 

I am a beginner learning SAS. So, it will take some time for me to understand the code. I am learning the different SAS statements and options. Meanwhile, if there is any simpler (for learners like me!), it would be super helpful.

 

Many thanks!  

PaigeMiller
Diamond | Level 26

There may be simpler ways, but I have written it relatively step by step, which might make it easier to understand.


The first data set named ENROLLMENT1 computes the last month of the time period (which I think is always Feb) and then finds the corresponding year (so Feb 2019 is really part of the 2018 year)

 

The second data set named ENROLLMENT2 winnows the data set down to just one record per time period

 

The third data set ENROLLMENT3 is transpose of ENROLLMENT2, you should look at it to see what I have done. Now that the data is in this form, a series of IF statement in WANT computes wherther or not the record was new enrollment, disenrollment or retention for each year. And then finally PROC SUMMARY computes to percent of each, by year.

--
Paige Miller
SAS_SB
Obsidian | Level 7

Thank you @PaigeMiller. The explanation is helpful.

 

One quick question, what would be the codes if the end month is other than February? I would like to have several quarters as time periods. For example, March to May 2018 as quarter 1 (q1), June to August 2018 as q2, September to November 2018 as q3, and so on. What is the number of unique clients newly enrolled, exited, and retained from q1 to q2, q2 to q3, and so on?

 

Thank you, again!

PaigeMiller
Diamond | Level 26

This would require somewhat of a re-write of the code, and I don't have the time right now to do this.

--
Paige Miller
SAS_SB
Obsidian | Level 7

No worries! When you get time, please write down the code. I will be keeping an eye for that.

 

Thank you!

SAS_SB
Obsidian | Level 7

Hello All,

 

I am working on a dataset that has year and month of enrollment of clients into a program (from March 2018 to February 2022). The dataset does not have information on when the clients exited the program. I have been trying to figure out:

1. the number of clients newly enrolled from one year to the next (for e.g., present in 2019 but not in 2018)

2. the number of clients who exited the program in the next year (for e.g., present in 2018 but not in in 2019)

3. number of clients retained (for e.g., present in both 2018 and 2019).

I would appreciate any help!

Thanks!

The data looks like:

data enrollment;
input id$ year month;
datalines;
1 2018 10
1 2018 11
1 2018 12
1 2019 1
1 2019 2
2 2019 10
2 2019 11
2 2019 12
3 2018 3
3 2018 4
3 2018 5
3 2019 1
;
run;  

 

SASJedi
Ammonite | Level 13

With SQL, a little boolean logic can create numeric flags for each ID:

proc sql;
select ID
	   , SUM(year=2018)=0 and SUM(year=2019)>0 as NewEnrollment
	   , SUM(year=2018)>0 and SUM(year=2019)=0 as Exited
       , SUM(year=2018)>0 and SUM(year=2019)>0 as Retained
from enrollment group by ID ; quit;

Producing a report like this:

id NewEnrollment Exited Retained
1 0 1 0
2 1 0 0
3 0 0 1

 

Which can be used as an inline view to get the overall report:

proc sql;
select sum(NewEnrollment) as TotalNew
	   ,sum(Exited) as TotalExited
	   ,sum(Retained) as TotalRetained
	   from (select ID
					, SUM(year=2018)=0 and SUM(year=2019)>0 as NewEnrollment
					, SUM(year=2018)>0 and SUM(year=2019)=0 as Exited
					, SUM(year=2018)>0 and SUM(year=2019)>0 as Retained
					from enrollment
					group by ID)
;
quit;

and, violá!

TotalNew TotalExited TotalRetained
1 1 1
Check out my Jedi SAS Tricks for SAS Users

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 28 replies
  • 5401 views
  • 6 likes
  • 4 in conversation