BookmarkSubscribeRSS Feed
dstuder
Obsidian | Level 7


Hello everybody

 

The longer I thik about my data, the more problems appear which I havent thought about before: I am dealing with data of crime suspects like the following sample data:

DATA suspects;
INPUT 	personId :4. 
		article:$3. 
		title:$20. 
		age:3. 
		sex :$1.
		residenceStatus :$1.
  		dateOfCrime : yymmdd10.;
FORMAT dateOfCrime yymmdd10.;
INFILE DATALINES DSD;
DATALINES;
10,110,Property,18,m,A,2019-01-01
10,57,Violence,18,m,A,2019-01-02
10,111,Property,19,m,B,2019-02-03
10,110,Property,19,m,A,2019-01-01
19,115,Property,18,m,A,2019-12-10
13,114,Property,19,m,A,2019-04-09
14,53,Violence,24,m,E,2019-06-06
15,50,Violence,21,w,A,2019-10-08
11,38,State Security,42,w,B,2019-10-01
17,10,Forgery,37,m,B,2019-02-19
99,112,Property,41,m,A,2019-02-23
98,113,Property,55,m,A,2019-07-11
;
RUN;

PROC FORMAT;
	VALUE agegrp (NOTSORTED)
	1-20 = '<=20'
	21-HIGH = '>20';
RUN;

 Now, my goal is to create a crosstable as below. Unfortunately, this turned out to be very challenging. Here's my starting point:

PROC TABULATE; 
CLASS age article title;
CLASS sex residenceStatus / PRELOADFMT ORDER=DATA;
TABLE ((title="") * (article="" ALL="Total title") ) ALL,
(ALL="Total residence status" residenceStatus="") * (ALL="Total age" age="") * (ALL="Total sex" sex="") / PRINTMISS MISSTEXT="0";
FORMAT age agegrp.;
RUN;

However this table has some major problems:

  • First problem: In the rows not all the articles should appear in every title of the criminal law: e.g. property offences are only the articles 110, 112, 113 and 115, violent crimes are only the articles 50, 53, 57 etc. (as they appear in the data)
  • Second problem (totals and subtotals):
    - The table shows the number of suspects (persons). Therefore, the total for each title of the criminal law cannot simply be the sum of all the articles.
    If one suspect committed several crimes within the same title (e.g. if he commited two different property crimes), he/she shall only be counted once (only the first crime within that title should be counted).
    - The total number of crimes for the entire criminal law is not the sum of all titles, because if one suspect committed several crimes in different titles only he should only be counted as an offender of his first crime.

I cannot expect a complete solution but I'd already appreciate some explanations on how to approach it.

9 REPLIES 9
ed_sas_member
Meteorite | Level 14

Hi @dstuder 

 

For the first problem, just remove the PRINTMISS option in the TABLE statement.

For the second one, you need to create a preliminary data step -> in the following code, I have kept only one record per personID, corresponding to the earlier date of crime. Is that correct?

 

NB: the PRELOADFMT option in the following statement as no effect as no format has been defined:   CLASS sex residenceStatus / ORDER=DATA PRELOADFMT ; 

Best,

 

DATA suspects;
INPUT 	personId :4. 
		article:$3. 
		title:$20. 
		age:3. 
		sex :$1.
		residenceStatus :$1.
  		dateOfCrime : yymmdd10.;
FORMAT dateOfCrime yymmdd10.;
INFILE DATALINES DSD;
DATALINES;
10,110,Property,18,m,A,2019-01-01
10,57,Violence,18,m,A,2019-01-01
10,111,Property,19,m,B,2019-02-03
10,110,Property,19,m,A,2019-01-01
19,115,Property,18,m,A,2019-12-10
13,114,Property,19,m,A,2019-04-09
14,53,Violence,24,m,E,2019-06-06
15,50,Violence,21,w,A,2019-10-08
11,38,State Security,42,w,B,2019-10-01
17,10,Forgery,37,m,B,2019-02-19
99,112,Property,41,m,A,2019-02-23
98,113,Property,55,m,A,2019-07-11
;
RUN;

/**** Formats */

PROC FORMAT;
	VALUE agegrp (NOTSORTED)
	1-20 = '<=20'
	21-HIGH = '>20';
RUN;

/**** If one suspect committed several crimes
	  within the same title he/she shall only
	  be counted once + if one suspect committed
	  several crimes in different titles only
	  he should only be counted as an offender
	  of his first crime. */
	  
PROC SORT DATA=suspects out=suspect_sorted;
	BY personId dateOfCrime title;
RUN;

DATA suspect_sorted2;
	SET suspect_sorted;
	BY personId dateOfCrime title;
	IF first.personId THEN OUTPUT;
RUN;

/**** Report */

PROC TABULATE DATA=suspect_sorted2; 
	CLASS age article title; 
	CLASS sex residenceStatus / ORDER=DATA ; 
	TABLE ((title="") * (article="" ALL="Total title") ) ALL, 
       (ALL="Total residence status" residenceStatus="") * (ALL="Total age" age="") * (ALL="Total sex" sex="") / MISSTEXT="0"; 
	FORMAT age agegrp.; 
RUN;

 

ed_sas_member
Meteorite | Level 14

Just a quick question: for PersonID = 10, which 'title' would you select as 3 crimes are registered for the earliest dateofcrime? Property or Violence?

Capture d’écran 2020-05-14 à 17.27.01.png

 

Best,

dstuder
Obsidian | Level 7
Sorry about that, I've fixed the dates in my original post
ed_sas_member
Meteorite | Level 14

No worries 😊

So does the provided code meet your expectations?

 

dstuder
Obsidian | Level 7

First problem: I can't remove the printmiss-statement because I still need it for the columns (since I want all the agegroups to appear in every residence status). Is there a way to define it for specific variables instead for the whole table-statement?

 

Second problem: Thanks very much for your code! I think the problem is more complicated. I cannot simply remove all the (not first) offences from the dataset that were committed by the same person. If a person has commited two different property offences it should be counted twice (once at article xy and once at article yz). Whereas on the "sub-total" level (title) it should only be counted once. And on the total level (entire criminal law) the same problem. The three levels (articles, titles and criminal law) are actually based on three different sets of data and I dont know how to put these together in one crosstable.

Reeza
Super User
Unfortunately all those different customized calculations means that you'll likely need to summarize your data first and then use PROC REPORT or PRINT to display it.
dstuder
Obsidian | Level 7
Thanks for you answer! So you mean I have to save three different PROC TABULATES?

PROC TABULATE OUT = articles
PROC TABULATE OUT = titles
PROC TABULATE OUT = total

Can you please give an example on how to "put together" these three datasets afterwards using PRINT or PROC REPORT (I have never used them before) or could you recommend me a tutorial on this topic?
Reeza
Super User

https://support.sas.com/resources/papers/proceedings/pdfs/sgf2008/173-2008.pdf

And the relevant code for this paper:

https://support.sas.com/rnd/papers/sgf2008/complex_reports.zip

 

You do not need to use PROC TABULATE since you'll be doing the calculations more manually. I'd probably recommend going to PROC MEANS and FREQ as their output is a little bit easier to manage to get to the reporting you will need. 

 

The paper above explains the various approaches that can be used to create your reports and the code is included as well. It's older, so there are some simpler ways to get things today but the premise is still valid. 

ballardw
Super User

How about providing what you expect that actual table to look like?

Your example data is small enough that you should be able to count things by hand.

Enter the values into manually created table in a word processor if you must. Copy and paste that (hopefully)

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 963 views
  • 1 like
  • 4 in conversation