Help using Base SAS procedures

Summarizing data

Accepted Solution Solved
Reply
Contributor
Posts: 28
Accepted Solution

Summarizing data

Hello everyone-

I have a dataset for students and I would like to summarize the data by race ethnicity and by offense.  I had attempted to produce a two way table using Proc Freq but the results did not happen as I expected.   

The first table shows how the data is currently stored.  A row contain information on student and there could be multiple students in the dataset with multiple offense codes.  I would like the results to summarize all student records by race and by offense.  I understand that the counts will be duplicated.  Is there a way to produce this type of table? 

studentraceoffense
12595WHITEW1
1857ASIAND1
11589BLACKV1
54812WHITEW1
636884HISPANICD1
115887HISPANICD1
12595WHITEW1
11589BLACKV1

W1D1V1
WHITE300
ASIAN010
BLACK002
HISPANIC020

Accepted Solutions
Solution
‎01-14-2013 11:07 PM
PROC Star
Posts: 7,363

Re: Summarizing data

sas_new: I haven't kept up with this thread, thus may be missing something, but it sure sounds like you are looking for something like:

data have;

  input (school student race offense) ($);

  cards;

A 12595 WHITE W1

A 1857 ASIAN D1

A 11589 BLACK V1

A 54812 WHITE W1

A 636884 HISPANIC D1

A 115887 HISPANIC D1

A 12595 WHITE W1

A 11589 BLACK V1

B 12595 WHITE W1

B 1857 ASIAN D1

B 11589 BLACK V1

B 54812 WHITE W1

B 636884 HISPANIC D1

B 115887 HISPANIC D1

B 12595 WHITE W1

B 11589 BLACK V1

;

ods output crosstabfreqs =  xtab;

Proc Freq data=have;

  tables school*race*offense/

         out=temp missprint;

run;

proc sort data=xtab;

  by school race offense;

run;

proc transpose data=xtab (where=

        (not(missing(offense) or missing(race))))

         out=want (drop=_Smiley Happy;

  var frequency;

  by school race;

  id offense;

run;

View solution in original post


All Replies
Super User
Posts: 17,819

Re: Summarizing data

What was your code?

Proc Freq data=have;

table race*offense;

run;

Contributor
Posts: 28

Re: Summarizing data

That was my code-

Proc Freq data=have;

     tables race*offense/list missing;run;

I need the offense codes to be columns and summarized by race.  That is where I am having the difficulty.

Super User
Posts: 17,819

Re: Summarizing data

The LIST options tells it to be rows not columns. Removing it would give you what you wanted.

Or are you looking for a dataset in that format.

Contributor
Posts: 28

Re: Summarizing data

I also tried removing the 'list' option.  The dataset is huge and I am not able to get an output. If possible, I would like the code to produce a separate table with the summarized data so that I can merge with some other tables.

ps. I have also listed school as one of the variable so its a 3 way frequency-- like this:

Proc Freq data=have;

     tables school*race*offense; run;

Super User
Posts: 17,819

Re: Summarizing data

How big is your dataset?

Contributor
Posts: 28

Re: Summarizing data

I have over a million

Super User
Posts: 17,819

Re: Summarizing data

SAS_new wrote:

I also tried removing the 'list' option.  The dataset is huge and I am not able to get an output.

The general procedure is run a proc freq, save the output and proc transpose it.

But if you can't get an output that's an issue. A million rows shouldn't be much of an issue really.

You could try sorting it first, by school, race and offense.

You could also show the error you're getting, ie. why you say no output.

Super User
Posts: 10,497

Re: Summarizing data

SAS_new wrote:

I also tried removing the 'list' option.  The dataset is huge and I am not able to get an output. If possible, I would like the code to produce a separate table with the summarized data so that I can merge with some other tables.

ps. I have also listed school as one of the variable so its a 3 way frequency-- like this:

Proc Freq data=have;

     tables school*race*offense; run;

You may be looking for something more like;

proc freq data=have noprint;

table school*race*offense/ out= want ; /* you may want to play with the order of the variables depending on what you want*/

run;

Contributor
Posts: 28

Re: Summarizing data

Yes Reeza, I think I am looking for a dataset in that format.  How would I go about doing it?

Contributor
Posts: 28

Re: Summarizing data

I have used the noprint option and also created an output datafile:

Proc freq data=have;

     tables schoolcode*race*offense/out=separate;run;

The problem is that the output is not coming out the way I am expecting. I would like one row per race and one column per offense so it wold look like this:

W1D1V1
WHITE300
ASIAN010
BLACK002
HISPANIC020

It currently looks somewhat like this:

WHITED10
WHITE W13
WHITEV10
ASIANWI0
ASIAND11
ASIANV10
BLACKW10
BLACKD10
BLACKV12
HISPANICW10
HISPANIC D12
HISPANIC V10
Super User
Posts: 17,819

Re: Summarizing data

Use proc transpose now. 

Contributor
Posts: 28

Re: Summarizing data

Proc transpose does not work.  The data table has multiple rows of the same race by different offenses.

code::

PROC TRANSPOSE DATA=TST OUT=TEST;

BY SCHOOLCODE ;

ID RACE;

VAR FCODE; RUN;

SAS error message log shows this:

ERROR: The ID value "WHITE" occurs twice in the same BY group.

NOTE: The above message was for the following BY group:

SchoolCode=0100040

ERROR: The ID value "HISPANIC" occurs twice in the same BY group.

ERROR: The ID value "HISPANIC" occurs twice in the same BY group.

NOTE: The above message was for the following BY group:

SchoolCode=0100255

ERROR: The ID value "HISPANIC" occurs twice in the same BY group.

ERROR: The ID value "HISPANIC" occurs twice in the same BY group.

ERROR: The ID value "HISPANIC" occurs twice in the same BY group.

ERROR: The ID value "HISPANIC" occurs twice in the same BY group.

Super User
Posts: 17,819

Re: Summarizing data

The table you initially posted doesn't have the school information, so would you like a different table for every school? or another variable in the data indicating school.

My guess is the second, so do by school race in the proc transpose instead of just by school.

Contributor
Posts: 28

Re: Summarizing data

It does not appear to be summarizing?  I need the count of kids who fall into each cell.   Would I need to create a new variable or use the frequency count as the variable and somehow use that into the Proc Transpose code?  How would I do it though?

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 22 replies
  • 465 views
  • 4 likes
  • 4 in conversation