I have a dataset that contains policy violations by students. The type of violation is coded from 1 to 40. Some students have only one violation while others have up to 13 violations.
I am trying to get a frequency count across variables for the different types of violations that have occurred.
Dataset Have
id | violation1 | violation2 | violation3 | violation4 | violation5 |
A1 | 1 | 2 | 6 | ||
A2 | 5 | 3 | 1 | 1 | 3 |
A3 | 7 | 1 | 4 | 1 | 5 |
A4 | 2 | 8 | 1 | 2 | |
A5 | 1 | 7 | 2 | 8 | 6 |
Output Want
Violation_type | frequency |
1 | 7 |
2 | 4 |
3 | 2 |
4 | 1 |
5 | 2 |
6 | 2 |
7 | 2 |
8 | 2 |
Thanks in advance!
Hi,
Well, several ways of doing this. You could for instance have an array of vio1-vio7 and set them to 1 where they exist and freq that. My preferred method would be to normalise (vertical) the data, then do a simple count on that:
data have;
id="A1"; violation1=1; violation2=2; violation3=6; output;
id="A2"; violation1=5; violation2=3; violation3=1; output;
id="A3"; violation1=7; violation2=1; violation3=4; output;
run;
proc transpose data=have out=inter;
by id;
var violation1-violation3;
run;
proc sql;
create table WANT as
select COL1 as VIOLATION_TYPE,
COUNT(COL1) as FREQUENCY
from WORK.INTER
group by COL1;
quit;
Hi,
Well, several ways of doing this. You could for instance have an array of vio1-vio7 and set them to 1 where they exist and freq that. My preferred method would be to normalise (vertical) the data, then do a simple count on that:
data have;
id="A1"; violation1=1; violation2=2; violation3=6; output;
id="A2"; violation1=5; violation2=3; violation3=1; output;
id="A3"; violation1=7; violation2=1; violation3=4; output;
run;
proc transpose data=have out=inter;
by id;
var violation1-violation3;
run;
proc sql;
create table WANT as
select COL1 as VIOLATION_TYPE,
COUNT(COL1) as FREQUENCY
from WORK.INTER
group by COL1;
quit;
Thank you RW9, Kurt, and Hai!
A different approach, verticalizing the data in a data step and using freq for the count:
data have;
infile cards;
input
id :$2.
violation1
violation2
violation3
violation4
violation5
;
cards;
A1 1 2 6 . .
A2 5 3 1 1 3
A3 7 1 4 1 5
A4 2 8 1 2 .
A5 1 7 2 8 6
;
run;
data int (keep=violation_type);
set have;
array viols {*} _numeric_;
do i = 1 to dim(viols);
if viols{i} ne .
then do;
violation_type = viols{i};
output;
end;
end;
run;
proc freq data=int noprint;
tables violation_type /out=want(drop=percent);
run;
As for a one-step approach, Hash can be an option: (Raw data stole from Kurt)
data have;
infile cards;
input
id :$2.
violation1
violation2
violation3
violation4
violation5
;
cards;
A1 1 2 6 . .
A2 5 3 1 1 3
A3 7 1 4 1 5
A4 2 8 1 2 .
A5 1 7 2 8 6
;
run;
data _null_;
if _n_=1 then
do;
dcl hash h(ordered:'a');
h.definekey('vtype');
h.definedata('vtype', 'freq');
h.definedone();
call missing (vtype, freq);
end;
set have end=last;
array v violation:;
do over v;
if not missing(v) then
do;
vtype=v;
if h.find() ne 0 then
do;
freq=1;
rc=h.replace();
end;
else
do;
freq+1;
rc=h.replace();
end;
end;
end;
if last then
rc=h.output(dataset:'want');
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.