Hello everyone-
I have a dataset for students and I would like to summarize the data by race ethnicity and by offense. I had attempted to produce a two way table using Proc Freq but the results did not happen as I expected.
The first table shows how the data is currently stored. A row contain information on student and there could be multiple students in the dataset with multiple offense codes. I would like the results to summarize all student records by race and by offense. I understand that the counts will be duplicated. Is there a way to produce this type of table?
student | race | offense |
12595 | WHITE | W1 |
1857 | ASIAN | D1 |
11589 | BLACK | V1 |
54812 | WHITE | W1 |
636884 | HISPANIC | D1 |
115887 | HISPANIC | D1 |
12595 | WHITE | W1 |
11589 | BLACK | V1 |
W1 | D1 | V1 | |
WHITE | 3 | 0 | 0 |
ASIAN | 0 | 1 | 0 |
BLACK | 0 | 0 | 2 |
HISPANIC | 0 | 2 | 0 |
sas_new: I haven't kept up with this thread, thus may be missing something, but it sure sounds like you are looking for something like:
data have;
input (school student race offense) ($);
cards;
A 12595 WHITE W1
A 1857 ASIAN D1
A 11589 BLACK V1
A 54812 WHITE W1
A 636884 HISPANIC D1
A 115887 HISPANIC D1
A 12595 WHITE W1
A 11589 BLACK V1
B 12595 WHITE W1
B 1857 ASIAN D1
B 11589 BLACK V1
B 54812 WHITE W1
B 636884 HISPANIC D1
B 115887 HISPANIC D1
B 12595 WHITE W1
B 11589 BLACK V1
;
ods output crosstabfreqs = xtab;
Proc Freq data=have;
tables school*race*offense/
out=temp missprint;
run;
proc sort data=xtab;
by school race offense;
run;
proc transpose data=xtab (where=
(not(missing(offense) or missing(race))))
out=want (drop=_:);
var frequency;
by school race;
id offense;
run;
What was your code?
Proc Freq data=have;
table race*offense;
run;
That was my code-
Proc Freq data=have;
tables race*offense/list missing;run;
I need the offense codes to be columns and summarized by race. That is where I am having the difficulty.
The LIST options tells it to be rows not columns. Removing it would give you what you wanted.
Or are you looking for a dataset in that format.
I also tried removing the 'list' option. The dataset is huge and I am not able to get an output. If possible, I would like the code to produce a separate table with the summarized data so that I can merge with some other tables.
ps. I have also listed school as one of the variable so its a 3 way frequency-- like this:
Proc Freq data=have;
tables school*race*offense; run;
How big is your dataset?
I have over a million
SAS_new wrote:
I also tried removing the 'list' option. The dataset is huge and I am not able to get an output.
The general procedure is run a proc freq, save the output and proc transpose it.
But if you can't get an output that's an issue. A million rows shouldn't be much of an issue really.
You could try sorting it first, by school, race and offense.
You could also show the error you're getting, ie. why you say no output.
SAS_new wrote:
I also tried removing the 'list' option. The dataset is huge and I am not able to get an output. If possible, I would like the code to produce a separate table with the summarized data so that I can merge with some other tables.
ps. I have also listed school as one of the variable so its a 3 way frequency-- like this:
Proc Freq data=have;
tables school*race*offense; run;
You may be looking for something more like;
proc freq data=have noprint;
table school*race*offense/ out= want ; /* you may want to play with the order of the variables depending on what you want*/
run;
Yes Reeza, I think I am looking for a dataset in that format. How would I go about doing it?
I have used the noprint option and also created an output datafile:
Proc freq data=have;
tables schoolcode*race*offense/out=separate;run;
The problem is that the output is not coming out the way I am expecting. I would like one row per race and one column per offense so it wold look like this:
W1 | D1 | V1 | |
WHITE | 3 | 0 | 0 |
ASIAN | 0 | 1 | 0 |
BLACK | 0 | 0 | 2 |
HISPANIC | 0 | 2 | 0 |
It currently looks somewhat like this:
WHITE | D1 | 0 |
WHITE | W1 | 3 |
WHITE | V1 | 0 |
ASIAN | WI | 0 |
ASIAN | D1 | 1 |
ASIAN | V1 | 0 |
BLACK | W1 | 0 |
BLACK | D1 | 0 |
BLACK | V1 | 2 |
HISPANIC | W1 | 0 |
HISPANIC | D1 | 2 |
HISPANIC | V1 | 0 |
Use proc transpose now.
Proc transpose does not work. The data table has multiple rows of the same race by different offenses.
code::
PROC TRANSPOSE DATA=TST OUT=TEST;
BY SCHOOLCODE ;
ID RACE;
VAR FCODE; RUN;
SAS error message log shows this:
ERROR: The ID value "WHITE" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
SchoolCode=0100040
ERROR: The ID value "HISPANIC" occurs twice in the same BY group.
ERROR: The ID value "HISPANIC" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
SchoolCode=0100255
ERROR: The ID value "HISPANIC" occurs twice in the same BY group.
ERROR: The ID value "HISPANIC" occurs twice in the same BY group.
ERROR: The ID value "HISPANIC" occurs twice in the same BY group.
ERROR: The ID value "HISPANIC" occurs twice in the same BY group.
The table you initially posted doesn't have the school information, so would you like a different table for every school? or another variable in the data indicating school.
My guess is the second, so do by school race in the proc transpose instead of just by school.
It does not appear to be summarizing? I need the count of kids who fall into each cell. Would I need to create a new variable or use the frequency count as the variable and somehow use that into the Proc Transpose code? How would I do it though?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.