Help using Base SAS procedures

frequency count across variables

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 17
Accepted Solution

frequency count across variables

I have a dataset that contains policy violations by students. The type of violation is coded from 1 to 40. Some students have only one violation while others have up to 13 violations.

I am trying to get a frequency count across variables for the different types of violations that have occurred.

Dataset Have

id

violation1

violation2

violation3

violation4

violation5

A1

1

2

6

A2

5

3

1

1

3

A3

7

1

4

1

5

A4

2

8

1

2

A5

1

7

2

8

6

Output Want

Violation_type

frequency

1

7

2

4

3

2

4

1

5

2

6

2

7

2

8

2

Thanks in advance!


Accepted Solutions
Solution
‎02-26-2015 05:11 AM
Super User
Super User
Posts: 7,407

Re: frequency count across variables

Hi,

Well, several ways of doing this.  You could for instance have an array of vio1-vio7 and set them to 1 where they exist and freq that.  My preferred method would be to normalise (vertical) the data, then do a simple count on that:

data have;

  id="A1"; violation1=1; violation2=2; violation3=6; output;

  id="A2"; violation1=5; violation2=3; violation3=1; output;

  id="A3"; violation1=7; violation2=1; violation3=4; output;

run;

proc transpose data=have out=inter;

  by id;

  var violation1-violation3;

run;

proc sql;

  create table WANT as

  select  COL1 as VIOLATION_TYPE,

          COUNT(COL1) as FREQUENCY

  from    WORK.INTER

  group by COL1;

quit;

View solution in original post


All Replies
Solution
‎02-26-2015 05:11 AM
Super User
Super User
Posts: 7,407

Re: frequency count across variables

Hi,

Well, several ways of doing this.  You could for instance have an array of vio1-vio7 and set them to 1 where they exist and freq that.  My preferred method would be to normalise (vertical) the data, then do a simple count on that:

data have;

  id="A1"; violation1=1; violation2=2; violation3=6; output;

  id="A2"; violation1=5; violation2=3; violation3=1; output;

  id="A3"; violation1=7; violation2=1; violation3=4; output;

run;

proc transpose data=have out=inter;

  by id;

  var violation1-violation3;

run;

proc sql;

  create table WANT as

  select  COL1 as VIOLATION_TYPE,

          COUNT(COL1) as FREQUENCY

  from    WORK.INTER

  group by COL1;

quit;

Occasional Contributor
Posts: 17

Re: frequency count across variables

Thank you RW9, Kurt, and Hai!

Super User
Posts: 6,947

Re: frequency count across variables

A different approach, verticalizing the data in a data step and using freq for the count:

data have;

infile cards;

input

  id :$2.

  violation1

  violation2

  violation3

  violation4

  violation5

;

cards;

A1 1 2 6 . .

A2 5 3 1 1 3

A3 7 1 4 1 5

A4 2 8 1 2 .

A5 1 7 2 8 6

;

run;

data int (keep=violation_type);

set have;

array viols {*} _numeric_;

do i = 1 to dim(viols);

  if viols{i} ne .

  then do;

    violation_type = viols{i};

    output;

  end;

end;

run;

proc freq data=int noprint;

tables violation_type /out=want(drop=percent);

run;

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Respected Advisor
Posts: 3,124

Re: frequency count across variables

As for a one-step approach, Hash can be an option: (Raw data stole from Kurt)

data have;

     infile cards;

     input

           id :$2.

           violation1

           violation2

           violation3

           violation4

           violation5

     ;

     cards;

A1 1 2 6 . .

A2 5 3 1 1 3

A3 7 1 4 1 5

A4 2 8 1 2 .

A5 1 7 2 8 6

;

run;

data _null_;

     if _n_=1 then

           do;

                dcl hash h(ordered:'a');

                h.definekey('vtype');

                h.definedata('vtype', 'freq');

                h.definedone();

                call missing (vtype, freq);

           end;

     set have end=last;

     array v violation:;

     do over v;

           if not missing(v) then

                do;

                     vtype=v;

                     if h.find() ne 0 then

                           do;

                                freq=1;

                                rc=h.replace();

                           end;

                     else

                           do;

                                freq+1;

                                rc=h.replace();

                           end;

                end;

     end;

     if last then

           rc=h.output(dataset:'want');

run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 710 views
  • 8 likes
  • 4 in conversation