Suppose you are part of government grant research into recidivism -- keeping folks from returning to jail.
You are given access to oodles of data. Every which way on prisoners. Demographics. Jobs. Race. How soon people end up back in jail. Etc.
Your assignment is to come up with variables that appear to maximize the probability of an individual staying out of the slammer, versus coming back to the clink. Say, within the coming year.
What SAS procedures are best suited for such an assignment?
Logistic Regression?
Other?
Thanks much. All thoughts greatly appreciated.
Nicholas Kormanik
Yes, logistic regression is a possible approach with such data, assuming that the data contains all the variables you mention for individuals who both return to and don't return to prison. The obvious procedure is PROC LOGISTIC, but there are others available depending on the exact nature of the model to fit. See this note.
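As a rough illustration of the PROC LOGISTIC route, here is a minimal sketch. The dataset work.releases and the variables returned_1yr (1 = back in jail within a year of release, 0 = not), age, race, job_status and prior_convictions are hypothetical names, not taken from the original data. It also assumes every person has at least one full year of follow-up, otherwise the 0/1 outcome is not well defined.

```sas
/* Sketch only: dataset and variable names are hypothetical. */
/* returned_1yr = 1 if the person returned to jail within    */
/* one year of release, 0 otherwise.                         */
proc logistic data=work.releases;
   /* Categorical predictors, reference-cell coding */
   class race job_status / param=ref;
   /* Model the probability of returning (event='1');        */
   /* stepwise selection is one way to screen candidate      */
   /* variables, with the usual caveats about that method.   */
   model returned_1yr(event='1') = age race job_status prior_convictions
         / selection=stepwise slentry=0.05 slstay=0.05;
run;
```

The odds ratio estimates in the output show which variables are associated with a higher or lower chance of returning. Stepwise selection is used here only as a screening device; the selected set should be checked against subject-matter knowledge.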
Hello,
Beyond logistic regression you could look at many other models that support binary targets.
Maybe the broad family of decision trees (single decision trees, gradient boosting, forests, …) is worth looking at.
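For a single decision tree, one possible starting point in SAS/STAT is PROC HPSPLIT. This is only a sketch: the dataset work.releases and the variables returned_1yr, age, race, job_status and prior_convictions are made-up names for illustration.

```sas
/* Sketch only: dataset and variable names are hypothetical. */
proc hpsplit data=work.releases seed=12345;
   /* The binary target and categorical inputs go on CLASS */
   class returned_1yr race job_status;
   model returned_1yr(event='1') = age race job_status prior_convictions;
   /* Grow the tree with the entropy criterion, then prune  */
   /* it back with cost-complexity pruning                  */
   grow entropy;
   prune costcomplexity;
run;
```

The variable importance table that the procedure prints can help rank the candidate predictors.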
In failure prediction (manufacturing) I sometimes use techniques for the analysis of recurrent events data. SAS/STAT and SAS/QC have some beautiful algorithms for that, but I think they are not best suited here, unless you want to analyze recidivism (repeat offences) over the whole lifetime, since it can of course happen repeatedly to the same individual.
Good luck,
Koen
Do you have follow-up data for non-recidivists (that is, do you know how long they have NOT gone back to jail)?
If so, then you have censored data, so use survival analysis.
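A minimal sketch of what that could look like with PROC PHREG (Cox regression). All names here are assumptions for illustration: work.releases, release_date, return_date, the predictors, and the study end date of 31DEC2023. A missing return_date is treated as "not returned as of the end of follow-up", which is itself an assumption about the data.

```sas
/* Sketch only: names and the follow-up end date are hypothetical. */
data work.surv;
   set work.releases;
   study_end = '31DEC2023'd;            /* assumed last follow-up date */
   if missing(return_date) then do;
      days_out = study_end - release_date;   /* censored at study end  */
      event    = 0;
   end;
   else do;
      days_out = return_date - release_date; /* time until return      */
      event    = 1;
   end;
   format study_end date9.;
run;

proc phreg data=work.surv;
   class race job_status / param=ref;
   /* event(0) tells PHREG that event=0 marks a censored time */
   model days_out*event(0) = age race job_status prior_convictions;
run;
```

Hazard ratios below 1 point to variables associated with staying out longer. Unlike the one-year yes/no outcome, this uses everyone's follow-up time, including people released less than a year ago.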
Thanks @Tom
Staying out of jail would be indicated by nothing in the returned column. They've stayed out since when they were last in.
For those returning to jail, there is a return date in that column.
@NKormanik wrote:
Thanks @Tom
Staying out of jail would be indicated by nothing in the returned column. They've stayed out since when they were last in.
For those returning to jail, there is a return date in that column.
I wasn't really talking about how you currently have the data coded, but what information you have. For example, do you have death data for these people? They are no longer at risk of being sent to jail after they are dead. If they have migrated to Australia, then they are very unlikely to be sent to jail in the US, and so on. Do you have positive information that they are not in jail by a specific date? What if they are in jail in a system that did not report it to your database?
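To make the point concrete, here is a sketch of how the censoring date might be derived if such information were available. The variables death_date and emigration_date, the dataset work.releases, and the study end date are all hypothetical; the original data may not contain anything like them.

```sas
/* Sketch only: death_date, emigration_date and study_end are assumed. */
data work.surv_cens;
   set work.releases;
   study_end = '31DEC2023'd;            /* assumed last follow-up date */
   /* MIN ignores missing values, so this is the earliest known date   */
   /* at which the person stopped being at risk of returning           */
   last_at_risk = min(study_end, death_date, emigration_date);
   if not missing(return_date) and return_date <= last_at_risk then do;
      days_out = return_date - release_date;
      event    = 1;                     /* observed return to jail     */
   end;
   else do;
      days_out = last_at_risk - release_date;
      event    = 0;                     /* censored                    */
   end;
   format study_end last_at_risk date9.;
run;
```

Jail stays in systems that do not report to the database cannot be fixed this way; those would show up as censored observations that are really events.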