I have a very large .csv dataset that is looking at times of being arrested and a disposition date. The main outcome i am looking for is whether the defendant was arrested again prior to their disposition date. To do this I have been assigned to create a new variable 're-arrest' that equals one if the defendant was arrested prior to trial and zero if otherwise. I have organized the data by person id and arrested date but I am trouble creating the new variable since i'm looking at multiple rows. I have pasted what my code and data output is looking like currently. If anyone can help me with how to approach creating this new variable.
p.s. i imported the dataset by using file -> import function. So i didn't code in the data by hand
proc sort data = uchi;
by person_id;
run;
proc print data = uchi (obs = 100);
run;
data case;
set uchi;
informat arrest_date yymmdd10. dispos_date yymmdd10.;
run;
proc print data = case (obs = 10);
format arrest_date yymmdd10.;
run;
data part1;
set case;
file uchi;
put @1 caseid @2 person_id @3 arrest_date @4 dispos_date @5 treat @@;
run;
proc print data = part1 (obs =10);
run;
57514 | 1 | 2012-01-04 | 2012-03-27 | 0 |
39970 | 1 | 2012-07-11 | 2012-10-20 | 1 |
88413 | 1 | 2013-04-04 | 2013-06-22 | 0 |
40216 | 5 | 2012-03-31 | 2013-03-25 | 0 |
92255 | 6 | 2012-12-09 | 2013-11-09 | 0 |
26516 | 7 | 2012-02-25 | 2012-03-26 | 0 |
2913 | 8 | 2012-10-06 | 2013-12-29 | 1 |
6304 | 8 | 2013-04-06 | 2013-07-07 | 0 |
82277 | 9 | 2012-01-12 | 2012-11-08 | 0 |
31881 | 9 | 2013-09-25 | 2013-12-29 | 1 |
The number of rearrests between arrest and dispos can be calculated as:
proc sql;
create table reArrest as
select
*,
( select
count(*) - 1 as nbRearrests
from part1
where person_id = a.person_id and
arrest_date between a.arrest_date and a.dispos_date) as nbRearrest
from part1 as a;
quit;
https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/
I think this is what you're trying to do, at least partly.
The number of rearrests between arrest and dispos can be calculated as:
proc sql;
create table reArrest as
select
*,
( select
count(*) - 1 as nbRearrests
from part1
where person_id = a.person_id and
arrest_date between a.arrest_date and a.dispos_date) as nbRearrest
from part1 as a;
quit;
Reminds me of a project where we looked at re-arrest before the trial date (which can be significantly prior to disposition date) and found people with multiple following arrests before the first trial date. We were only looking at driving while intoxicated arrests as well.
It is usually helpful to decide what the final result should look like for the output set before coding. Which usually follows a very clear statement of the problem. If the question is "any arrests prior to a disposition for a previous arrest" a different approach would be needed than if the question were "how many later arrests prior to the disposition for this arrest" or "How soon after this arrest was a following arrest that occurred prior to the disposition". Yes/no vs. count vs. interva, requiring slightly different overlaps.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.