BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
TC_
Obsidian | Level 7 TC_
Obsidian | Level 7

I want to simulate student attrition. I have a bunch of student currently enrolled, assume starting the 2nd semester, 10% of those who are still enrolled drop out each semester. So with 100 students, the number of students dropping out is 10, 9, 8, 7...

I want to create a variable (2,3,4...) indicating which semester the students drop out. This is the dataset I start with

data records;
	do ID=1 to 100;
	output;
	end;
run;

And I want to get results like this below (no randomness at all)

TC__0-1729750499932.png

 

I also need to apply this program to other scenarios where the attrition rate and number of students may differ. This is what I am doing and it works. 

%let attr_rate=0.10;
data proj3; set proj2 nobs=_nobs_; *_nobs_=number of rows; if _N_<=round(_nobs_*&attr_rate) then last_sem=2; else if _N_<=round(_nobs_*&attr_rate)+round(_nobs_*(1-&attr_rate)*&attr_rate) then last_sem=3; else if _N_<=round(_nobs_*&attr_rate)+round(_nobs_*(1-&attr_rate)*&attr_rate)+round(_nobs_*(1-&attr_rate)**2*&attr_rate) then last_sem=4; run;

But as you can see, the code gets really long as I cover more semesters. And I have up to 13 semesters to cover.

Is there a better and/or smarter way to achieve this?

 

Thank you very much!

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @TC_,

 

I would use the cumulative number of dropped out students:

%let r=0.1;   /* attrition rate */
%let n=100;   /* initial number of students */
%let sem1=2;  /* first semester considered */
%let semL=13; /* last semester considered */

data want;
ID=1;
do sem=&sem1 to &semL;
  do ID=ID to round((1-(1-&r)**(sem-&sem1+1))*&n);
    output;
  end;
end;
sem=.;
do ID=ID to &n; /* these never drop out */
  output;
end;
run;

View solution in original post

13 REPLIES 13
Ksharp
Super User

Very interesting question.

If I understood what you mean.

 




data records;
	do ID=1 to 100;
	output;
	end;
run;

data temp;
drop=2;n=10;  /*initial value*/
do until(n=0);
do i=1 to n;
 output;
end;
n+(-1);
drop+1;
end;
drop i;
run;

data want;
 merge records temp;
run;

proc freq data=want;
table drop;
run;

Ksharp_0-1729759772931.png

 

TC_
Obsidian | Level 7 TC_
Obsidian | Level 7

Thank you for responding, @Ksharp.

I guess I should have highlighted how I wanted to use the program at the beginning. I need to use the program in other scenarios where the attrition rate might differ. The example of 100 students with 10% dropout was given as an example to make things easier to follow. I might have, say, 57 students with 5% attrition rate; or 165 students with 7% attrition rate. What you provided is more like a way to generate a sequence of numbers with a specific frequency. 

Ksharp
Super User

OK. You need to change the initial value. Here is an example:

 

/*165 students with 7% attrition rate*/

data records;
	do ID=1 to 165;  /*165 students */
	output;
	end;
run;

data temp;
drop=2;n=12;  /* 12=ceil(165*0.07) */
do until(n=0);
do i=1 to n;
 output;
end;
n+(-1);
drop+1;
end;
drop i;
run;

data want;
 merge records temp;
run;

proc freq data=want;
table drop;
run;
FreelanceReinh
Jade | Level 19

Hello @TC_,

 

I would use the cumulative number of dropped out students:

%let r=0.1;   /* attrition rate */
%let n=100;   /* initial number of students */
%let sem1=2;  /* first semester considered */
%let semL=13; /* last semester considered */

data want;
ID=1;
do sem=&sem1 to &semL;
  do ID=ID to round((1-(1-&r)**(sem-&sem1+1))*&n);
    output;
  end;
end;
sem=.;
do ID=ID to &n; /* these never drop out */
  output;
end;
run;
TC_
Obsidian | Level 7 TC_
Obsidian | Level 7

Thank you very much, @FreelanceReinh. This solution is neat!

A follow-up question. If I want to generate the IDs first in one Data step, and assign sem in another Data step, is there a way to do so with a DO Loop? In other words, if I already have a dataset of students' info and want to assign a dropout semester to them, what's the best way to do so with a DO Loop?

A workaround I have is that I can use your code to create a new dataset, randomly sort the students' info dataset with rng, and merge the two without a BY variable. 

Thank you!

FreelanceReinh
Jade | Level 19

@TC_ wrote:

Thank you very much, @FreelanceReinh. This solution is neat!

A follow-up question. If I want to generate the IDs first in one Data step, and assign sem in another Data step, is there a way to do so with a DO Loop? In other words, if I already have a dataset of students' info and want to assign a dropout semester to them, what's the best way to do so with a DO Loop?

A workaround I have is that I can use your code to create a new dataset, randomly sort the students' info dataset with rng, and merge the two without a BY variable. 

Thank you!


You're welcome. One possible solution is to adapt the code as follows:

/* Create sample data for demonstration */

data have;
id=5;
do _n_=1 to 100;
  id=mod(67*id, 101);
  other_info=ranuni(1);
  output;
end;
run;

/* Assign hypothetical dropout semester */

%let r=0.1;   /* attrition rate */
%let sem1=2;  /* first semester considered */
%let semL=13; /* last semester considered */

data want;
if 0 then set have nobs=n;
do sem=&sem1 to &semL;
  do _n_=_n_ to round((1-(1-&r)**(sem-&sem1+1))*n);
    link rw;
  end;
end;
sem=.;
do _n_=_n_ to n; /* these never drop out */
  link rw;
end;
rw:
  set have;
  output;
run;
TC_
Obsidian | Level 7 TC_
Obsidian | Level 7
Awesome, thank you!
Patrick
Opal | Level 21

I interpreted your question a bit differently than @Ksharp and @FreelanceReinh (...which is daring...). 

I believe the formula to calculate the survival rate for a student after n semesters with a drop out probability p of 0.1 per semesters would be: S(n) = (1-p)n

 

I couldn't figure out a way that doesn't require a loop for calculating the drop out semester. Soo.... here I go:

%let n_semesters=13;
%let drop_probability=0.1;
data have;
  do semester=1 to &n_semesters;
    do i=1 to 10000;
      student_id=cats(semester,'_',i);
      output;
    end;
  end;
  drop i;
run;

data want;
  set have;
  if _n_=1 then call streaminit(1230497);

  drop_out_flg=0;
  do i=1 to semester;
    if rand('uniform')<=&drop_probability then
      do;
        drop_out_flg=1;
        drop_out_semester=i;
        leave;
      end;
  end;
run;

%let sv_missing=%sysfunc(getoption(missing,keyword));
options missing=' ';
title 'drop/no drop per semester';
proc tabulate data=want noseps missing;
  keylabel n=' ';
  class drop_out_flg drop_out_semester;
  table drop_out_semester all='Total', drop_out_flg;
run;
title;
options &sv_missing;

/* as a check: calculate survival for a student over all remaining semesters */
%let sv_missing=%sysfunc(getoption(missing,keyword));
options missing=' ';
data survival;
  set have;
  if _n_=1 then call streaminit(1230497);

  drop_out_flg=0;
  if rand('uniform')>(1-&drop_probability)**semester then drop_out_flg=1;
run;

title 'survival over all semesters';
proc tabulate data=survival noseps missing;
  keylabel n=' ';
  class drop_out_flg;
  table drop_out_flg;
run;
title;

Patrick_1-1729774333575.png

 

 

TC_
Obsidian | Level 7 TC_
Obsidian | Level 7

Thank you, @Patrick. I wanted to use random assignment as well, but my supervisor explicitly said not to do so at this point. Anyway, I like your random assignment. At this point, this is not what I wanted, but it will surely be helpful to keep in mind for my future work.

Thank you!

Tom
Super User Tom
Super User

Should be able to use the NOBS= option of the SET statement to know how many to put into each group.

%let rate=0.10 ;

data want;
  retain obs 1 semester 2;
  do obs=obs to obs+int(&rate*(nobs-obs));
    set records nobs=nobs;
    output;
  end;
  semester+1;
run;

Results:

Tom_0-1729786223877.pngTom_1-1729786251182.pngTom_2-1729786434498.png

 

TC_
Obsidian | Level 7 TC_
Obsidian | Level 7

Thank you, @Tom! This solution also gets the job done and is quite concise. I really wish I could accept two solutions. 

mkeintz
PROC Star

 

Read the entire dataset, using the NOBS= parameter with the drop rate to generate the  target drop count for the initial LAST_SEM value.  Then as the target for each semester is reached, update the target through the final eligible LAST_SEM, after which LAST_SEM is set to missing.

 

data have;
  do id=1 to 100; output; end;
run;

%let rate=.1;
%let beg_drop=2;
%let end_drop=10;

data want;
  retain last_sem &beg_drop _target 0 ;

  set have nobs=nrecs;
  if _n_=1 then _target=round(&rate * nrecs);
  output;

  if _n_= _target then do;
    last_sem+1;
    _target + round(&rate * (nrecs-_n_));
    if last_sem>&end_drop then call missing(last_sem,_target);
  end;
run;

 

 

Of course, this assumes that the nobs of HAVE and the dropout rate will generate at least one dropout in the first eligible time period.  If that is a possible issue, then:

 


data want;
  retain last_sem &beg_drop _target 0 ;

  set have nobs=nrecs;
  if _n_=1 then do;
    _target=round(&rate * nrecs);
    if _target=0 then call missing(last_sem,_target);
  end;
  output;

  if _n_= _target then do;
    last_sem+1;
    _target + round(&rate * (nrecs-_n_));
    if last_sem>&end_drop then call missing(last_sem,_target);
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
TC_
Obsidian | Level 7 TC_
Obsidian | Level 7
Thank you for your response and the explanation!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 2724 views
  • 0 likes
  • 6 in conversation