I have a dataset that contains policy violations by students. The type of violation is coded from 1 to 40. Some students have only one violation while others have up to 13 violations.
I am trying to get a frequency count across variables for the different types of violations that have occurred.
Dataset Have
id | violation1 | violation2 | violation3 | violation4 | violation5 |
A1 | 1 | 2 | 6 | ||
A2 | 5 | 3 | 1 | 1 | 3 |
A3 | 7 | 1 | 4 | 1 | 5 |
A4 | 2 | 8 | 1 | 2 | |
A5 | 1 | 7 | 2 | 8 | 6 |
Output Want
Violation_type | frequency |
1 | 7 |
2 | 4 |
3 | 2 |
4 | 1 |
5 | 2 |
6 | 2 |
7 | 2 |
8 | 2 |
Thanks in advance!
Hi,
Well, several ways of doing this. You could for instance have an array of vio1-vio7 and set them to 1 where they exist and freq that. My preferred method would be to normalise (vertical) the data, then do a simple count on that:
data have;
id="A1"; violation1=1; violation2=2; violation3=6; output;
id="A2"; violation1=5; violation2=3; violation3=1; output;
id="A3"; violation1=7; violation2=1; violation3=4; output;
run;
proc transpose data=have out=inter;
by id;
var violation1-violation3;
run;
proc sql;
create table WANT as
select COL1 as VIOLATION_TYPE,
COUNT(COL1) as FREQUENCY
from WORK.INTER
group by COL1;
quit;
Hi,
Well, several ways of doing this. You could for instance have an array of vio1-vio7 and set them to 1 where they exist and freq that. My preferred method would be to normalise (vertical) the data, then do a simple count on that:
data have;
id="A1"; violation1=1; violation2=2; violation3=6; output;
id="A2"; violation1=5; violation2=3; violation3=1; output;
id="A3"; violation1=7; violation2=1; violation3=4; output;
run;
proc transpose data=have out=inter;
by id;
var violation1-violation3;
run;
proc sql;
create table WANT as
select COL1 as VIOLATION_TYPE,
COUNT(COL1) as FREQUENCY
from WORK.INTER
group by COL1;
quit;
Thank you RW9, Kurt, and Hai!
A different approach, verticalizing the data in a data step and using freq for the count:
data have;
infile cards;
input
id :$2.
violation1
violation2
violation3
violation4
violation5
;
cards;
A1 1 2 6 . .
A2 5 3 1 1 3
A3 7 1 4 1 5
A4 2 8 1 2 .
A5 1 7 2 8 6
;
run;
data int (keep=violation_type);
set have;
array viols {*} _numeric_;
do i = 1 to dim(viols);
if viols{i} ne .
then do;
violation_type = viols{i};
output;
end;
end;
run;
proc freq data=int noprint;
tables violation_type /out=want(drop=percent);
run;
As for a one-step approach, Hash can be an option: (Raw data stole from Kurt)
data have;
infile cards;
input
id :$2.
violation1
violation2
violation3
violation4
violation5
;
cards;
A1 1 2 6 . .
A2 5 3 1 1 3
A3 7 1 4 1 5
A4 2 8 1 2 .
A5 1 7 2 8 6
;
run;
data _null_;
if _n_=1 then
do;
dcl hash h(ordered:'a');
h.definekey('vtype');
h.definedata('vtype', 'freq');
h.definedone();
call missing (vtype, freq);
end;
set have end=last;
array v violation:;
do over v;
if not missing(v) then
do;
vtype=v;
if h.find() ne 0 then
do;
freq=1;
rc=h.replace();
end;
else
do;
freq+1;
rc=h.replace();
end;
end;
end;
if last then
rc=h.output(dataset:'want');
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.