NKormanik
Barite | Level 11

Suppose you are part of government grant research into recidivism -- keeping folks from returning to jail.

 

You are given access to oodles of data. Every which way on prisoners. Demographics. Jobs. Race. How soon people end up back in jail. Etc.

 

Your assignment is to come up with variables that appear to maximize the probability of an individual staying out of the slammer, versus coming back to the clinker. Say, within the coming year.

 

What SAS procedures are best suited for such an assignment?

 

Logistic Regression?

Other?

 

Thanks much. All thoughts greatly appreciated.

 

Nicholas Kormanik

 

6 REPLIES
StatDave
SAS Super FREQ

Yes, logistic regression is a possible approach with such data, assuming that the data contain all the variables you mention both for individuals who return to prison and for those who don't. The obvious procedure is PROC LOGISTIC, but there are others available depending on the exact nature of the model to fit. See this note.
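As a rough illustration of the PROC LOGISTIC route, here is a minimal sketch. The data set name (work.releases) and all variable names (returned_1yr, age_at_release, prior_convictions, employed) are hypothetical placeholders, not taken from the poster's data.

```sas
/* Minimal sketch only: work.releases and every variable name   */
/* below are hypothetical stand-ins for the real data.          */
proc logistic data=work.releases;
   /* employed is assumed to be a 0/1 categorical predictor */
   class employed (ref='0') / param=ref;
   /* event='1' models the probability of returning to jail */
   model returned_1yr(event='1') = age_at_release
                                   prior_convictions
                                   employed;
run;
```

The odds ratio estimates in the output give a first indication of which variables are associated with a higher or lower chance of return.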

sbxkoenk
SAS Super FREQ

Hello,

 

Beyond logistic regression you could look at many other models that support binary targets.

Maybe the broad family of decision trees (single decision trees, gradient boosting, forests, …) is worth looking at.
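For the single-tree case, a sketch with PROC HPSPLIT (SAS/STAT) might look like the following. The data set and variable names are again hypothetical, and the depth and validation fraction are arbitrary choices for illustration.

```sas
/* Sketch under assumptions: work.releases and the variable     */
/* names are hypothetical; maxdepth and the validation fraction */
/* are illustrative, not recommendations.                       */
proc hpsplit data=work.releases maxdepth=5;
   /* the binary target and categorical inputs go on CLASS */
   class returned_1yr employed;
   model returned_1yr(event='1') = age_at_release
                                   prior_convictions
                                   employed;
   /* hold out 30% of rows to guide pruning */
   partition fraction(validate=0.3 seed=123);
   prune costcomplexity;
run;
```

The variable importance table in the output is one way to see which inputs the tree relies on most.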

 

In failure prediction (manufacturing) I sometimes use techniques for the analysis of recurrent events data. SAS/STAT and SAS/QC have some beautiful algorithms for that, but I think they are not best suited here, unless you want to analyze recidivism (repeat offences) over the whole lifetime, since it can of course happen repeatedly to the same individual.

 

Good luck,

Koen

NKormanik
Barite | Level 11

Greatly appreciated leads, @StatDave and @sbxkoenk 

 

The outcome variable column is either a blank or a return date.  I've converted that to binary -- 0, 1.
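One way such a conversion could be done in a DATA step is sketched below. The names work.releases, return_date and returned are assumptions for illustration, and this is not necessarily how the conversion was actually done.

```sas
/* Sketch: derive a 0/1 outcome from a possibly blank return    */
/* date. Data set and variable names are hypothetical.          */
data work.releases_flag;
   set work.releases;
   /* missing() is 1 when return_date is blank, so NOT flips it: */
   /* returned = 1 if a return date exists, 0 otherwise          */
   returned = not missing(return_date);
run;
```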

 

Ideally one could use the information of how long the recidivist is able to stay out of jail.  But for now I'm simply looking at YES or NO.

 

 

Tom
Super User

Do you have follow-up data for non-recidivists? That is, do you know how long they have NOT gone back to jail?

If so then you have censored data.  So use survival analysis.
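A minimal sketch of that idea follows, assuming a release date, a possibly blank return date, and a single study cutoff date. All names and the cutoff date are hypothetical. Individuals with no return date are treated as censored at the cutoff.

```sas
/* Sketch under assumptions: release_date, return_date, the     */
/* covariates and the 31DEC2024 cutoff are all hypothetical.    */
data work.surv;
   set work.releases;
   event = not missing(return_date);       /* 1 = returned, 0 = censored */
   if event then days_out = return_date - release_date;
   else days_out = '31DEC2024'd - release_date;
run;

/* Kaplan-Meier estimate of time to return; event=0 is censored */
proc lifetest data=work.surv plots=survival;
   time days_out*event(0);
run;

/* Cox proportional hazards model relating covariates to return */
proc phreg data=work.surv;
   class employed (ref='0') / param=ref;
   model days_out*event(0) = age_at_release
                             prior_convictions
                             employed;
run;
```

Unlike the YES/NO coding, this uses how long each person has stayed out, including people released too recently to have a full year of follow-up.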

NKormanik
Barite | Level 11

Thanks @Tom 

 

Staying out of jail would be indicated by nothing in the returned column.  They've stayed out since when they were last in.

 

For those returning to jail, there is a return date in that column.

 

Tom
Super User


I wasn't really talking about how you currently have the data coded, but what information you have.  For example, do you have death data for these people?  They are no longer at risk of being sent to jail after they are dead.  If they have migrated to Australia, then they are very unlikely to be sent to jail in the US, and so on.  Do you have positive information that they are not in jail by a specific date?  What if they are in jail in a system that did not report it to your database?
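If such information were available, the end of each person's at-risk period could be taken as the earliest of death, emigration, or the study cutoff. The sketch below assumes hypothetical variables death_date and emigration_date (blank when not applicable) and an arbitrary cutoff date. It does not address the unreported-jail problem.

```sas
/* Sketch: censor follow-up at the earliest known end of risk.  */
/* death_date, emigration_date and the cutoff are hypothetical. */
data work.surv_censored;
   set work.releases;
   format last_followup date9.;
   /* MIN ignores missing arguments, so blank dates drop out */
   last_followup = min(death_date, emigration_date, '31DEC2024'd);
   /* count a return only if it falls within the at-risk period */
   event = (not missing(return_date)) and (return_date <= last_followup);
   if event then days_at_risk = return_date - release_date;
   else days_at_risk = last_followup - release_date;
run;
```

The resulting days_at_risk and event variables could then be passed to a survival procedure in place of a simple blank-versus-date coding.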
