Solved: Re: Summarizing data

SAS_new · Posted 01-14-2013 04:47 PM

Hello everyone-

I have a dataset for students and I would like to summarize the data by race ethnicity and by offense. I had attempted to produce a two way table using Proc Freq but the results did not happen as I expected.

The first table shows how the data is currently stored. A row contain information on student and there could be multiple students in the dataset with multiple offense codes. I would like the results to summarize all student records by race and by offense. I understand that the counts will be duplicated. Is there a way to produce this type of table?

student	race	offense
12595	WHITE	W1
1857	ASIAN	D1
11589	BLACK	V1
54812	WHITE	W1
636884	HISPANIC	D1
115887	HISPANIC	D1
12595	WHITE	W1
11589	BLACK	V1

	W1	D1	V1
WHITE	3	0	0
ASIAN	0	1	0
BLACK	0	0	2
HISPANIC	0	2	0

art297 · Posted 01-14-2013 11:07 PM

sas_new: I haven't kept up with this thread, thus may be missing something, but it sure sounds like you are looking for something like:

data have;

input (school student race offense) ($);

cards;

A 12595 WHITE W1

A 1857 ASIAN D1

A 11589 BLACK V1

A 54812 WHITE W1

A 636884 HISPANIC D1

A 115887 HISPANIC D1

A 12595 WHITE W1

A 11589 BLACK V1

B 12595 WHITE W1

B 1857 ASIAN D1

B 11589 BLACK V1

B 54812 WHITE W1

B 636884 HISPANIC D1

B 115887 HISPANIC D1

B 12595 WHITE W1

B 11589 BLACK V1

;

ods output crosstabfreqs = xtab;

Proc Freq data=have;

tables school*race*offense/

out=temp missprint;

run;

proc sort data=xtab;

by school race offense;

run;

proc transpose data=xtab (where=

(not(missing(offense) or missing(race))))

out=want (drop=_:);

var frequency;

by school race;

id offense;

run;

View solution in original post

Reeza · Posted 01-14-2013 04:59 PM

What was your code?

Proc Freq data=have;

table race*offense;

run;

SAS_new · Posted 01-14-2013 05:04 PM

That was my code-

Proc Freq data=have;

tables race*offense/list missing;run;

I need the offense codes to be columns and summarized by race. That is where I am having the difficulty.

Reeza · Posted 01-14-2013 05:07 PM

The LIST options tells it to be rows not columns. Removing it would give you what you wanted.

Or are you looking for a dataset in that format.

SAS_new · Posted 01-14-2013 05:15 PM

I also tried removing the 'list' option. The dataset is huge and I am not able to get an output. If possible, I would like the code to produce a separate table with the summarized data so that I can merge with some other tables.

ps. I have also listed school as one of the variable so its a 3 way frequency-- like this:

Proc Freq data=have;

tables school*race*offense; run;

Reeza · Posted 01-14-2013 05:33 PM

How big is your dataset?

SAS_new · Posted 01-14-2013 05:37 PM

I have over a million

Reeza · Posted 01-14-2013 05:43 PM

SAS_new wrote:

I also tried removing the 'list' option. The dataset is huge and I am not able to get an output.

The general procedure is run a proc freq, save the output and proc transpose it.

But if you can't get an output that's an issue. A million rows shouldn't be much of an issue really.

You could try sorting it first, by school, race and offense.

You could also show the error you're getting, ie. why you say no output.

ballardw · Posted 01-14-2013 05:43 PM

SAS_new wrote:

I also tried removing the 'list' option. The dataset is huge and I am not able to get an output. If possible, I would like the code to produce a separate table with the summarized data so that I can merge with some other tables.

ps. I have also listed school as one of the variable so its a 3 way frequency-- like this:

Proc Freq data=have;

tables school*race*offense; run;

You may be looking for something more like;

proc freq data=have noprint;

table school*race*offense/ out= want ; /* you may want to play with the order of the variables depending on what you want*/

run;

SAS_new · Posted 01-14-2013 05:16 PM

Yes Reeza, I think I am looking for a dataset in that format. How would I go about doing it?

SAS_new · Posted 01-14-2013 05:56 PM

I have used the noprint option and also created an output datafile:

Proc freq data=have;

tables schoolcode*race*offense/out=separate;run;

The problem is that the output is not coming out the way I am expecting. I would like one row per race and one column per offense so it wold look like this:

	W1	D1	V1
WHITE	3	0	0
ASIAN	0	1	0
BLACK	0	0	2
HISPANIC	0	2	0

It currently looks somewhat like this:

WHITE	D1	0
WHITE	W1	3
WHITE	V1	0
ASIAN	WI	0
ASIAN	D1	1
ASIAN	V1	0
BLACK	W1	0
BLACK	D1	0
BLACK	V1	2
HISPANIC	W1	0
HISPANIC	D1	2
HISPANIC	V1	0

Reeza · Posted 01-14-2013 05:57 PM

Use proc transpose now.

SAS_new · Posted 01-14-2013 06:21 PM

Proc transpose does not work. The data table has multiple rows of the same race by different offenses.

code::

PROC TRANSPOSE DATA=TST OUT=TEST;

BY SCHOOLCODE ;

ID RACE;

VAR FCODE; RUN;

SAS error message log shows this:

ERROR: The ID value "WHITE" occurs twice in the same BY group.

NOTE: The above message was for the following BY group:

SchoolCode=0100040

ERROR: The ID value "HISPANIC" occurs twice in the same BY group.

NOTE: The above message was for the following BY group:

SchoolCode=0100255

ERROR: The ID value "HISPANIC" occurs twice in the same BY group.

Reeza · Posted 01-14-2013 06:26 PM

The table you initially posted doesn't have the school information, so would you like a different table for every school? or another variable in the data indicating school.

My guess is the second, so do by school race in the proc transpose instead of just by school.

SAS_new · Posted 01-14-2013 06:39 PM

It does not appear to be summarizing? I need the count of kids who fall into each cell. Would I need to create a new variable or use the frequency count as the variable and somehow use that into the Proc Transpose code? How would I do it though?

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away