BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
TC_
Obsidian | Level 7 TC_
Obsidian | Level 7

I want to simulate student attrition. I have a bunch of student currently enrolled, assume starting the 2nd semester, 10% of those who are still enrolled drop out each semester. So with 100 students, the number of students dropping out is 10, 9, 8, 7...

I want to create a variable (2,3,4...) indicating which semester the students drop out. This is the dataset I start with

data records;
	do ID=1 to 100;
	output;
	end;
run;

And I want to get results like this below (no randomness at all)

TC__0-1729750499932.png

 

I also need to apply this program to other scenarios where the attrition rate and number of students may differ. This is what I am doing and it works. 

%let attr_rate=0.10;
data proj3; set proj2 nobs=_nobs_; *_nobs_=number of rows; if _N_<=round(_nobs_*&attr_rate) then last_sem=2; else if _N_<=round(_nobs_*&attr_rate)+round(_nobs_*(1-&attr_rate)*&attr_rate) then last_sem=3; else if _N_<=round(_nobs_*&attr_rate)+round(_nobs_*(1-&attr_rate)*&attr_rate)+round(_nobs_*(1-&attr_rate)**2*&attr_rate) then last_sem=4; run;

But as you can see, the code gets really long as I cover more semesters. And I have up to 13 semesters to cover.

Is there a better and/or smarter way to achieve this?

 

Thank you very much!

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @TC_,

 

I would use the cumulative number of dropped out students:

%let r=0.1;   /* attrition rate */
%let n=100;   /* initial number of students */
%let sem1=2;  /* first semester considered */
%let semL=13; /* last semester considered */

data want;
ID=1;
do sem=&sem1 to &semL;
  do ID=ID to round((1-(1-&r)**(sem-&sem1+1))*&n);
    output;
  end;
end;
sem=.;
do ID=ID to &n; /* these never drop out */
  output;
end;
run;

View solution in original post

13 REPLIES 13
Ksharp
Super User

Very interesting question.

If I understood what you mean.

 




data records;
	do ID=1 to 100;
	output;
	end;
run;

data temp;
drop=2;n=10;  /*initial value*/
do until(n=0);
do i=1 to n;
 output;
end;
n+(-1);
drop+1;
end;
drop i;
run;

data want;
 merge records temp;
run;

proc freq data=want;
table drop;
run;

Ksharp_0-1729759772931.png

 

TC_
Obsidian | Level 7 TC_
Obsidian | Level 7

Thank you for responding, @Ksharp.

I guess I should have highlighted how I wanted to use the program at the beginning. I need to use the program in other scenarios where the attrition rate might differ. The example of 100 students with 10% dropout was given as an example to make things easier to follow. I might have, say, 57 students with 5% attrition rate; or 165 students with 7% attrition rate. What you provided is more like a way to generate a sequence of numbers with a specific frequency. 

Ksharp
Super User

OK. You need to change the initial value. Here is an example:

 

/*165 students with 7% attrition rate*/

data records;
	do ID=1 to 165;  /*165 students */
	output;
	end;
run;

data temp;
drop=2;n=12;  /* 12=ceil(165*0.07) */
do until(n=0);
do i=1 to n;
 output;
end;
n+(-1);
drop+1;
end;
drop i;
run;

data want;
 merge records temp;
run;

proc freq data=want;
table drop;
run;
FreelanceReinh
Jade | Level 19

Hello @TC_,

 

I would use the cumulative number of dropped out students:

%let r=0.1;   /* attrition rate */
%let n=100;   /* initial number of students */
%let sem1=2;  /* first semester considered */
%let semL=13; /* last semester considered */

data want;
ID=1;
do sem=&sem1 to &semL;
  do ID=ID to round((1-(1-&r)**(sem-&sem1+1))*&n);
    output;
  end;
end;
sem=.;
do ID=ID to &n; /* these never drop out */
  output;
end;
run;
TC_
Obsidian | Level 7 TC_
Obsidian | Level 7

Thank you very much, @FreelanceReinh. This solution is neat!

A follow-up question. If I want to generate the IDs first in one Data step, and assign sem in another Data step, is there a way to do so with a DO Loop? In other words, if I already have a dataset of students' info and want to assign a dropout semester to them, what's the best way to do so with a DO Loop?

A workaround I have is that I can use your code to create a new dataset, randomly sort the students' info dataset with rng, and merge the two without a BY variable. 

Thank you!

FreelanceReinh
Jade | Level 19

@TC_ wrote:

Thank you very much, @FreelanceReinh. This solution is neat!

A follow-up question. If I want to generate the IDs first in one Data step, and assign sem in another Data step, is there a way to do so with a DO Loop? In other words, if I already have a dataset of students' info and want to assign a dropout semester to them, what's the best way to do so with a DO Loop?

A workaround I have is that I can use your code to create a new dataset, randomly sort the students' info dataset with rng, and merge the two without a BY variable. 

Thank you!


You're welcome. One possible solution is to adapt the code as follows:

/* Create sample data for demonstration */

data have;
id=5;
do _n_=1 to 100;
  id=mod(67*id, 101);
  other_info=ranuni(1);
  output;
end;
run;

/* Assign hypothetical dropout semester */

%let r=0.1;   /* attrition rate */
%let sem1=2;  /* first semester considered */
%let semL=13; /* last semester considered */

data want;
if 0 then set have nobs=n;
do sem=&sem1 to &semL;
  do _n_=_n_ to round((1-(1-&r)**(sem-&sem1+1))*n);
    link rw;
  end;
end;
sem=.;
do _n_=_n_ to n; /* these never drop out */
  link rw;
end;
rw:
  set have;
  output;
run;
TC_
Obsidian | Level 7 TC_
Obsidian | Level 7
Awesome, thank you!
Patrick
Opal | Level 21

I interpreted your question a bit differently than @Ksharp and @FreelanceReinh (...which is daring...). 

I believe the formula to calculate the survival rate for a student after n semesters with a drop out probability p of 0.1 per semesters would be: S(n) = (1-p)n

 

I couldn't figure out a way that doesn't require a loop for calculating the drop out semester. Soo.... here I go:

%let n_semesters=13;
%let drop_probability=0.1;
data have;
  do semester=1 to &n_semesters;
    do i=1 to 10000;
      student_id=cats(semester,'_',i);
      output;
    end;
  end;
  drop i;
run;

data want;
  set have;
  if _n_=1 then call streaminit(1230497);

  drop_out_flg=0;
  do i=1 to semester;
    if rand('uniform')<=&drop_probability then
      do;
        drop_out_flg=1;
        drop_out_semester=i;
        leave;
      end;
  end;
run;

%let sv_missing=%sysfunc(getoption(missing,keyword));
options missing=' ';
title 'drop/no drop per semester';
proc tabulate data=want noseps missing;
  keylabel n=' ';
  class drop_out_flg drop_out_semester;
  table drop_out_semester all='Total', drop_out_flg;
run;
title;
options &sv_missing;

/* as a check: calculate survival for a student over all remaining semesters */
%let sv_missing=%sysfunc(getoption(missing,keyword));
options missing=' ';
data survival;
  set have;
  if _n_=1 then call streaminit(1230497);

  drop_out_flg=0;
  if rand('uniform')>(1-&drop_probability)**semester then drop_out_flg=1;
run;

title 'survival over all semesters';
proc tabulate data=survival noseps missing;
  keylabel n=' ';
  class drop_out_flg;
  table drop_out_flg;
run;
title;

Patrick_1-1729774333575.png

 

 

TC_
Obsidian | Level 7 TC_
Obsidian | Level 7

Thank you, @Patrick. I wanted to use random assignment as well, but my supervisor explicitly said not to do so at this point. Anyway, I like your random assignment. At this point, this is not what I wanted, but it will surely be helpful to keep in mind for my future work.

Thank you!

Tom
Super User Tom
Super User

Should be able to use the NOBS= option of the SET statement to know how many to put into each group.

%let rate=0.10 ;

data want;
  retain obs 1 semester 2;
  do obs=obs to obs+int(&rate*(nobs-obs));
    set records nobs=nobs;
    output;
  end;
  semester+1;
run;

Results:

Tom_0-1729786223877.pngTom_1-1729786251182.pngTom_2-1729786434498.png

 

TC_
Obsidian | Level 7 TC_
Obsidian | Level 7

Thank you, @Tom! This solution also gets the job done and is quite concise. I really wish I could accept two solutions. 

mkeintz
PROC Star

 

Read the entire dataset, using the NOBS= parameter with the drop rate to generate the  target drop count for the initial LAST_SEM value.  Then as the target for each semester is reached, update the target through the final eligible LAST_SEM, after which LAST_SEM is set to missing.

 

data have;
  do id=1 to 100; output; end;
run;

%let rate=.1;
%let beg_drop=2;
%let end_drop=10;

data want;
  retain last_sem &beg_drop _target 0 ;

  set have nobs=nrecs;
  if _n_=1 then _target=round(&rate * nrecs);
  output;

  if _n_= _target then do;
    last_sem+1;
    _target + round(&rate * (nrecs-_n_));
    if last_sem>&end_drop then call missing(last_sem,_target);
  end;
run;

 

 

Of course, this assumes that the nobs of HAVE and the dropout rate will generate at least one dropout in the first eligible time period.  If that is a possible issue, then:

 


data want;
  retain last_sem &beg_drop _target 0 ;

  set have nobs=nrecs;
  if _n_=1 then do;
    _target=round(&rate * nrecs);
    if _target=0 then call missing(last_sem,_target);
  end;
  output;

  if _n_= _target then do;
    last_sem+1;
    _target + round(&rate * (nrecs-_n_));
    if last_sem>&end_drop then call missing(last_sem,_target);
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
TC_
Obsidian | Level 7 TC_
Obsidian | Level 7
Thank you for your response and the explanation!

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 1845 views
  • 0 likes
  • 6 in conversation