Creating a new variable

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 16
Accepted Solution

Creating a new variable

[ Edited ]

I have a very large .csv dataset that is looking at times of being arrested and a disposition date. The main outcome i am looking for is whether the defendant was arrested again prior to their disposition date. To do this I have been assigned to create a new variable 're-arrest' that equals one if the defendant was arrested prior to trial and zero if otherwise. I have organized the data by person id and arrested date but I am trouble creating the new variable since i'm looking at multiple rows. I have pasted what my code and data output is looking like currently. If anyone can help me with how to approach creating this new variable.

 

p.s. i imported the dataset by using file -> import function. So i didn't code in the data by hand 

 

 

proc sort data = uchi;
by person_id;
run;

proc print data = uchi (obs = 100);
run;

data case;
set uchi;
informat arrest_date yymmdd10. dispos_date yymmdd10.;
run;

proc print data = case (obs = 10);
format arrest_date yymmdd10.;
run;

data part1;
set case;
file uchi;
put @1 caseid @2 person_id @3 arrest_date @4 dispos_date @5 treat @@;

run;

proc print data = part1 (obs =10);
run;

 

caseid person_id arrest_date dispos_date treat
5751412012-01-042012-03-270
3997012012-07-112012-10-201
8841312013-04-042013-06-220
4021652012-03-312013-03-250
9225562012-12-092013-11-090
2651672012-02-252012-03-260
291382012-10-062013-12-291
630482013-04-062013-07-070
8227792012-01-122012-11-080
3188192013-09-252013-12-291

Accepted Solutions
Solution
2 weeks ago
Esteemed Advisor
Posts: 5,129

Re: Creating a new variable

The number of rearrests between arrest and dispos can be calculated as:

 


proc sql;
create table reArrest as
select 
    *, 
    (   select 
            count(*) - 1 as nbRearrests 
        from part1 
        where person_id = a.person_id and 
              arrest_date between a.arrest_date and a.dispos_date) as nbRearrest
from part1 as a;
quit;

PG

View solution in original post


All Replies
Super User
Posts: 21,546

Re: Creating a new variable

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

 

I think this is what you're trying to do, at least partly. 

Solution
2 weeks ago
Esteemed Advisor
Posts: 5,129

Re: Creating a new variable

The number of rearrests between arrest and dispos can be calculated as:

 


proc sql;
create table reArrest as
select 
    *, 
    (   select 
            count(*) - 1 as nbRearrests 
        from part1 
        where person_id = a.person_id and 
              arrest_date between a.arrest_date and a.dispos_date) as nbRearrest
from part1 as a;
quit;

PG
Super User
Posts: 12,148

Re: Creating a new variable

Reminds me of a project where we looked at re-arrest before the trial date (which can be significantly prior to disposition date) and found people with multiple following arrests before the first trial date. We were only looking at driving while intoxicated arrests as well.

 

It is usually helpful to decide what the final result should look like for the output set before coding. Which usually follows a very clear statement of the problem. If the question is "any arrests prior to a disposition for a previous arrest" a different approach would be needed than if the question were "how many later arrests prior to the disposition for this arrest" or "How soon after this arrest was a following arrest that occurred prior to the disposition". Yes/no vs. count vs. interva, requiring slightly different overlaps.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 147 views
  • 0 likes
  • 4 in conversation