Transposing a category value to an indicator column

Reply
Occasional Contributor
Posts: 5

Transposing a category value to an indicator column

[ Edited ]

I'm working on a dataset where each row is a criminal offense. In some cases multiple offenses comprise one incident, so I am interested in looking at which offenses commonly occur together. There are 1.37M rows and 82 unique offenses. I want to create a column for each offense where the value will be the number of times that offense occured in each incident. Attached is a mock-up of what I would like to accomplish. I've been working on this project in Base and Enterprise Guide and I have access to any other SAS software that could help. I've considered multiple ways, such as running a macro in PROC SQL or somehow calling the column name (i.e. offense name) and comparing it the value of the offense. Any help is appreciated. Thanks in advance. 


Desired_table_ex_screenshot.jpg
Trusted Advisor
Posts: 1,469

Re: Transposing a category value to an indicator column

Suppose you have few OFFENSE types and

assuming file is sorted BY ID_num Incident_ID Incident_seq date

then your code may be:

 

data want;

  set have;

    by ID_num Incident_ID Incident_seq date;

         retain counters ctr1-ctr5;

         array ctrx ctr1-ctr5;

         if first.date then 

            do i=1 to 5; ctrx(i)=0; end;

     

        select (upcase(offense));

             when ('BAD CHECK') ctr1+1;

             when ('ID THEFT') ctr2+1;

             .....

             otherwise put '>>> Warning: New Offense found ' offense=;

       end;

run;

 

In case you have a lot and unknown number of OFFENSE codes then

it will be more comlicated.

 

1) proc SQL;

            create table offense_codes as

                      select distinct offense from have;

    quit;

    %let max_of = &sqlobs;

 

2) Build the  WHEN lines as a subprogram to be included:

           data _NULL_;

             set offense_codes;

             array ctrx {&max_of} ctr1-ctr&max_of. ;

             if _N_ = 1 then do;

                path = pathname('work');

                rc = filename('sub',path || 'offenes.sas');

                file sub;

             end;

             put 'WHEN ' offense  ' CTR' || left(_N_) || '+1;' ;

        run;

 

3) As before with a little cahnge:

         

data want;

  set have;

    by ID_num Incident_ID Incident_seq date;

         retain counters ctr1-ctr5;

         array ctrx ctr1-ctr5;

         if first.date then 

            do i=1 to 5; ctrx(i)=0; end;

     

        select (offense);

             %include sub;

             otherwise put '>>> Warning: New Offense found ' offense=;

       end;

run;

 

I hav not run it so please check first on small amount of observation

using options obs=20;

Then don't forget to change back   to options obs=max;

 

 

New Contributor
Posts: 4

Re: Transposing a category value to an indicator column

This works.  It will be necessary to have the select statement correspond to the offense list generated in the proc freq .

 

  /* 0: Input data */

data one ;

  length ID_Num Incident_ID $2 Incident_Seq $1 Date 3 Offense $35 ;

  drop v ;

  infile cards length = l ;

  input @ ;

   v = l -20 ;

  input @1 ID_Num $2. @4 Incident_ID $2. @7 Incident_Seq $1. @9 Date mmddyy11. @20 Offense $varying. v ;

cards;

56 67 1 1/1/2001 Bad check

56 67 1 1/1/2001 Bad check

56 68 2 4/5/2002 ID theft

57 71 1 5/6/2000 DUI-Misdemeanor

57 71 1 5/6/2000 Vehicular Manslaughter

57 71 1 5/6/2000 Speeding

57 79 2 10/7/2003 Parole Violation

57 79 2 10/7/2003 Parole Violation

57 80 3 11/23/2005 Assault

;

run ; /* */

 

  /* 1: Generate marginals: Incident ID * Offense */

proc freq data = one noprint ;

  tables Incident_ID * offense / out = freqs(drop=percent);

run ;

 

  /* 2: create the blank dataset */

  /* 2A - generate variable labels */

proc transpose data = freqs out=freqs_x_001(drop=_name_) ;

  var offense ;

data freqs_x_001_w1 ;

  set freqs_x_001 ;

  array col(*) _character_ ;

  do i = 1 to dim(col) ;

    var_label = vname(col(i)) ||"='" || strip(col(i)) || "'" ;

    output ;

  end ;

proc transpose data = freqs_x_001_w1 out = v_label(drop=_name_) ;

  var var_label ;

data _null_ ;

  set v_label ;

  array col(*) _character_ ;

  call symputx("v_label", catx(" ",of col(*)) ) ;

  call symputx("var_cnt", dim(col) ) ;

run ;

  /* 2B - Generate dataset */

data w_1 ;

length ID_Num Incident_ID $2 Incident_Seq $1 Date 5 COL1 - COL&var_cnt. 3 ;

label &v_label ;

if 0 then output ;

run ;

  /* 2C: Cleanup */

%symdel v_label ;

proc delete data = freqs_x_001 freqs_x_001_w1 v_label ;

run ; /* */

 

  /* 3: Populate the dataset */

/  * 3A: Generate base dataset based on Incident ID */

proc sort data = one out=one_s_001(drop=offense) nodupkey ;

  by incident_ID ;

data w_2 ;

  merge w_1 one_s_001 ;  

run ;

  /* 3B: Populate base dataset */

data two ;

  merge w_2 freqs ;

  by incident_id ;

  drop offense count i ;

  array col(&var_cnt.) ;

  if first.incident_id then do i = 1 to dim(col) ;

    col(i) =0 ;

  end ;

  select(offense) ;

    when("Bad check") COL1 = count ;

    when("ID theft") COL2 = count ;

    when("DUI-Misdemeanor") COL3 = count ;

    when("Vehicular Manslaughter") COL4 = count ;

    when("Speeding") COL5 = count ;

    when("Parole Violation") COL6 = count ;

    when("Assault") COL7 = count ;

    otherwise ;

  end ;

  if last.incident_id then output ;

  format date mmddyys10. ;

run ;

  /* 3C: Cleanup */

%symdel var_cnt ;

proc delete data = freqs one_s_001 w_1 w_2 ;

run ;

Occasional Contributor
Posts: 6

Re: Transposing a category value to an indicator column

check if this can help you.

 

proc sql;
create table tmp0 as select
date
,id_num
,Incident_ID
,Incident_Seq
,Offense
,count(*) as count
from one
group by 1,2,3,4,5
;quit;

 

 

proc transpose data=tmp0 out=tmp_final;
id offense;
by id_num Incident_ID Incident_Seq date;
var count;
run;

Ask a Question
Discussion stats
  • 3 replies
  • 330 views
  • 0 likes
  • 4 in conversation