Transposing a category value to an indicator column

Reply
Occasional Contributor
Posts: 5

Transposing a category value to an indicator column

[ Edited ]

I'm working on a dataset where each row is a criminal offense. In some cases multiple offenses comprise one incident, so I am interested in looking at which offenses commonly occur together. There are 1.37M rows and 82 unique offenses. I want to create a column for each offense where the value will be the number of times that offense occured in each incident. Attached is a mock-up of what I would like to accomplish. I've been working on this project in Base and Enterprise Guide and I have access to any other SAS software that could help. I've considered multiple ways, such as running a macro in PROC SQL or somehow calling the column name (i.e. offense name) and comparing it the value of the offense. Any help is appreciated. Thanks in advance. 

Attachment
Super User
Posts: 1,225

Re: Transposing a category value to an indicator column

Suppose you have few OFFENSE types and

assuming file is sorted BY ID_num Incident_ID Incident_seq date

then your code may be:

 

data want;

  set have;

    by ID_num Incident_ID Incident_seq date;

         retain counters ctr1-ctr5;

         array ctrx ctr1-ctr5;

         if first.date then 

            do i=1 to 5; ctrx(i)=0; end;

     

        select (upcase(offense));

             when ('BAD CHECK') ctr1+1;

             when ('ID THEFT') ctr2+1;

             .....

             otherwise put '>>> Warning: New Offense found ' offense=;

       end;

run;

 

In case you have a lot and unknown number of OFFENSE codes then

it will be more comlicated.

 

1) proc SQL;

            create table offense_codes as

                      select distinct offense from have;

    quit;

    %let max_of = &sqlobs;

 

2) Build the  WHEN lines as a subprogram to be included:

           data _NULL_;

             set offense_codes;

             array ctrx {&max_of} ctr1-ctr&max_of. ;

             if _N_ = 1 then do;

                path = pathname('work');

                rc = filename('sub',path || 'offenes.sas');

                file sub;

             end;

             put 'WHEN ' offense  ' CTR' || left(_N_) || '+1;' ;

        run;

 

3) As before with a little cahnge:

         

data want;

  set have;

    by ID_num Incident_ID Incident_seq date;

         retain counters ctr1-ctr5;

         array ctrx ctr1-ctr5;

         if first.date then 

            do i=1 to 5; ctrx(i)=0; end;

     

        select (offense);

             %include sub;

             otherwise put '>>> Warning: New Offense found ' offense=;

       end;

run;

 

I hav not run it so please check first on small amount of observation

using options obs=20;

Then don't forget to change back   to options obs=max;

 

 

New Contributor
Posts: 4

Re: Transposing a category value to an indicator column

This works.  It will be necessary to have the select statement correspond to the offense list generated in the proc freq .

 

  /* 0: Input data */

data one ;

  length ID_Num Incident_ID $2 Incident_Seq $1 Date 3 Offense $35 ;

  drop v ;

  infile cards length = l ;

  input @ ;

   v = l -20 ;

  input @1 ID_Num $2. @4 Incident_ID $2. @7 Incident_Seq $1. @9 Date mmddyy11. @20 Offense $varying. v ;

cards;

56 67 1 1/1/2001 Bad check

56 67 1 1/1/2001 Bad check

56 68 2 4/5/2002 ID theft

57 71 1 5/6/2000 DUI-Misdemeanor

57 71 1 5/6/2000 Vehicular Manslaughter

57 71 1 5/6/2000 Speeding

57 79 2 10/7/2003 Parole Violation

57 79 2 10/7/2003 Parole Violation

57 80 3 11/23/2005 Assault

;

run ; /* */

 

  /* 1: Generate marginals: Incident ID * Offense */

proc freq data = one noprint ;

  tables Incident_ID * offense / out = freqs(drop=percent);

run ;

 

  /* 2: create the blank dataset */

  /* 2A - generate variable labels */

proc transpose data = freqs out=freqs_x_001(drop=_name_) ;

  var offense ;

data freqs_x_001_w1 ;

  set freqs_x_001 ;

  array col(*) _character_ ;

  do i = 1 to dim(col) ;

    var_label = vname(col(i)) ||"='" || strip(col(i)) || "'" ;

    output ;

  end ;

proc transpose data = freqs_x_001_w1 out = v_label(drop=_name_) ;

  var var_label ;

data _null_ ;

  set v_label ;

  array col(*) _character_ ;

  call symputx("v_label", catx(" ",of col(*)) ) ;

  call symputx("var_cnt", dim(col) ) ;

run ;

  /* 2B - Generate dataset */

data w_1 ;

length ID_Num Incident_ID $2 Incident_Seq $1 Date 5 COL1 - COL&var_cnt. 3 ;

label &v_label ;

if 0 then output ;

run ;

  /* 2C: Cleanup */

%symdel v_label ;

proc delete data = freqs_x_001 freqs_x_001_w1 v_label ;

run ; /* */

 

  /* 3: Populate the dataset */

/  * 3A: Generate base dataset based on Incident ID */

proc sort data = one out=one_s_001(drop=offense) nodupkey ;

  by incident_ID ;

data w_2 ;

  merge w_1 one_s_001 ;  

run ;

  /* 3B: Populate base dataset */

data two ;

  merge w_2 freqs ;

  by incident_id ;

  drop offense count i ;

  array col(&var_cnt.) ;

  if first.incident_id then do i = 1 to dim(col) ;

    col(i) =0 ;

  end ;

  select(offense) ;

    when("Bad check") COL1 = count ;

    when("ID theft") COL2 = count ;

    when("DUI-Misdemeanor") COL3 = count ;

    when("Vehicular Manslaughter") COL4 = count ;

    when("Speeding") COL5 = count ;

    when("Parole Violation") COL6 = count ;

    when("Assault") COL7 = count ;

    otherwise ;

  end ;

  if last.incident_id then output ;

  format date mmddyys10. ;

run ;

  /* 3C: Cleanup */

%symdel var_cnt ;

proc delete data = freqs one_s_001 w_1 w_2 ;

run ;

Occasional Contributor
Posts: 6

Re: Transposing a category value to an indicator column

check if this can help you.

 

proc sql;
create table tmp0 as select
date
,id_num
,Incident_ID
,Incident_Seq
,Offense
,count(*) as count
from one
group by 1,2,3,4,5
;quit;

 

 

proc transpose data=tmp0 out=tmp_final;
id offense;
by id_num Incident_ID Incident_Seq date;
var count;
run;

Ask a Question
Discussion stats
  • 3 replies
  • 299 views
  • 0 likes
  • 4 in conversation