Hello everybody
The longer I thik about my data, the more problems appear which I havent thought about before: I am dealing with data of crime suspects like the following sample data:
DATA suspects; INPUT personId :4. article:$3. title:$20. age:3. sex :$1. residenceStatus :$1. dateOfCrime : yymmdd10.; FORMAT dateOfCrime yymmdd10.; INFILE DATALINES DSD; DATALINES; 10,110,Property,18,m,A,2019-01-01 10,57,Violence,18,m,A,2019-01-02 10,111,Property,19,m,B,2019-02-03 10,110,Property,19,m,A,2019-01-01 19,115,Property,18,m,A,2019-12-10 13,114,Property,19,m,A,2019-04-09 14,53,Violence,24,m,E,2019-06-06 15,50,Violence,21,w,A,2019-10-08 11,38,State Security,42,w,B,2019-10-01 17,10,Forgery,37,m,B,2019-02-19 99,112,Property,41,m,A,2019-02-23 98,113,Property,55,m,A,2019-07-11 ; RUN; PROC FORMAT; VALUE agegrp (NOTSORTED) 1-20 = '<=20' 21-HIGH = '>20'; RUN;
Now, my goal is to create a crosstable as below. Unfortunately, this turned out to be very challenging. Here's my starting point:
PROC TABULATE;
CLASS age article title;
CLASS sex residenceStatus / PRELOADFMT ORDER=DATA;
TABLE ((title="") * (article="" ALL="Total title") ) ALL,
(ALL="Total residence status" residenceStatus="") * (ALL="Total age" age="") * (ALL="Total sex" sex="") / PRINTMISS MISSTEXT="0";
FORMAT age agegrp.;
RUN;
However this table has some major problems:
I cannot expect a complete solution but I'd already appreciate some explanations on how to approach it.
Hi @dstuder
For the first problem, just remove the PRINTMISS option in the TABLE statement.
For the second one, you need to create a preliminary data step -> in the following code, I have kept only one record per personID, corresponding to the earlier date of crime. Is that correct?
NB: the PRELOADFMT option in the following statement as no effect as no format has been defined: CLASS sex residenceStatus / ORDER=DATA PRELOADFMT ;
Best,
DATA suspects;
INPUT personId :4.
article:$3.
title:$20.
age:3.
sex :$1.
residenceStatus :$1.
dateOfCrime : yymmdd10.;
FORMAT dateOfCrime yymmdd10.;
INFILE DATALINES DSD;
DATALINES;
10,110,Property,18,m,A,2019-01-01
10,57,Violence,18,m,A,2019-01-01
10,111,Property,19,m,B,2019-02-03
10,110,Property,19,m,A,2019-01-01
19,115,Property,18,m,A,2019-12-10
13,114,Property,19,m,A,2019-04-09
14,53,Violence,24,m,E,2019-06-06
15,50,Violence,21,w,A,2019-10-08
11,38,State Security,42,w,B,2019-10-01
17,10,Forgery,37,m,B,2019-02-19
99,112,Property,41,m,A,2019-02-23
98,113,Property,55,m,A,2019-07-11
;
RUN;
/**** Formats */
PROC FORMAT;
VALUE agegrp (NOTSORTED)
1-20 = '<=20'
21-HIGH = '>20';
RUN;
/**** If one suspect committed several crimes
within the same title he/she shall only
be counted once + if one suspect committed
several crimes in different titles only
he should only be counted as an offender
of his first crime. */
PROC SORT DATA=suspects out=suspect_sorted;
BY personId dateOfCrime title;
RUN;
DATA suspect_sorted2;
SET suspect_sorted;
BY personId dateOfCrime title;
IF first.personId THEN OUTPUT;
RUN;
/**** Report */
PROC TABULATE DATA=suspect_sorted2;
CLASS age article title;
CLASS sex residenceStatus / ORDER=DATA ;
TABLE ((title="") * (article="" ALL="Total title") ) ALL,
(ALL="Total residence status" residenceStatus="") * (ALL="Total age" age="") * (ALL="Total sex" sex="") / MISSTEXT="0";
FORMAT age agegrp.;
RUN;
Just a quick question: for PersonID = 10, which 'title' would you select as 3 crimes are registered for the earliest dateofcrime? Property or Violence?
Best,
No worries 😊
So does the provided code meet your expectations?
First problem: I can't remove the printmiss-statement because I still need it for the columns (since I want all the agegroups to appear in every residence status). Is there a way to define it for specific variables instead for the whole table-statement?
Second problem: Thanks very much for your code! I think the problem is more complicated. I cannot simply remove all the (not first) offences from the dataset that were committed by the same person. If a person has commited two different property offences it should be counted twice (once at article xy and once at article yz). Whereas on the "sub-total" level (title) it should only be counted once. And on the total level (entire criminal law) the same problem. The three levels (articles, titles and criminal law) are actually based on three different sets of data and I dont know how to put these together in one crosstable.
https://support.sas.com/resources/papers/proceedings/pdfs/sgf2008/173-2008.pdf
And the relevant code for this paper:
https://support.sas.com/rnd/papers/sgf2008/complex_reports.zip
You do not need to use PROC TABULATE since you'll be doing the calculations more manually. I'd probably recommend going to PROC MEANS and FREQ as their output is a little bit easier to manage to get to the reporting you will need.
The paper above explains the various approaches that can be used to create your reports and the code is included as well. It's older, so there are some simpler ways to get things today but the premise is still valid.
How about providing what you expect that actual table to look like?
Your example data is small enough that you should be able to count things by hand.
Enter the values into manually created table in a word processor if you must. Copy and paste that (hopefully)
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.