Help with grouping and tabulating

Reply
Contributor
Posts: 36

Help with grouping and tabulating

I have the following sort of  data;Please help the best way to do this.

raw data;

student id               student behaviour         no of library visits

110                          good                            2

111                          good                            15

113                          bad                             10

114                          bad                              1

115                          nice                             2

116                          nice                             24

117                          good                            6

119                           bad                             3

First I want to add a table with library visits banding

visit band;

student id               student behaviour         no of library visits           visit band

110                          good                            2                                  1-9

111                          good                            15                                10-20

113                          bad                             10                                  10-20

114                          bad                              1                                    1-9

115                          nice                             2                                    1-9

116                          nice                             24                                  over 20

117                          good                            6                                     1-9

119                           bad                             3                                      1-9

Lastly, a tabulation of character versus no of visits

visit tabulation:

                                               No of visits

Character                   1-9              10-20                 over 20

good                           3                 1                         0

bad                            1                   1                         0

nice                           1                    0                        1

Many thanks for your help.

Contributor
Posts: 66

Re: Help with grouping and tabulating

The visit banding category can be created as a hard coded lib_vis_cat variable in a data step:

data new; set old;

format lib_vis_cat $7.;
if lib_visits < 10 then lib_vis_cat = "1  -  9";

else if lib_visits < 21 then lib_vis_cat = "10 - 20";

else if lib_visits >= 21 then lib_vis_cat = "over 20";

run;

Then you can use a simple proc freq to tabulate.

proc freq data=new;

tables character*lib_visi_cat;

run;

Alternatively you can define a special user defined format and associate this format with the library visits number variable. However this approach has some drawback like the order of categories in the displayed table.

Contributor
Posts: 36

Re: Help with grouping and tabulating

Posted in reply to Dorota_Jarosz

Any more advice?

Super Contributor
Posts: 349

Re: Help with grouping and tabulating

Hi,

Try this.

proc format ;

value test

1-9='1-9'

10-20='10-20'

21-100='Over20'

;

run;

data have;

input studentid          studentbehaviour   $      visits;

newvisit=put(visits,test.);

cards;

110                          good                            2

111                          good                            15

113                          bad                             10

114                          bad                              1

115                          nice                             2

116                          nice                             24

117                          good                            6

119                           bad                             3

;

run;

proc tabulate data=have ;

class studentbehaviour newvisit;

table studentbehaviour,newvisit;

run;

Thanks,

Shiva

Regular Contributor
Posts: 151

Re: Help with grouping and tabulating

Here's how I would approach this.  If the final output is your goal, then you don't need the intermediate step to add the new column, you can just use formatted values of the existing column.  If the order of student behaviour is important (e.g. Good, Bad, Nice) then I've added a format which stores that order (the notsorted option in the format and the order=data option in the tabulate are the key instructions to achieve this).

data have;

input student_id student_behaviour $ library_visits;

cards;

110 good 2

111 good 15

113 bad 10

114 bad 1

115 nice 2

116 nice 24

117 good 6

119 bad 3

;

run;

proc format;

value visit_f

     low-9 = '1-9'

     10-20 = '10-20'

     20-high = 'over 20';

value $ beh_f (notsorted)

     'good' = 'Good'

     'bad' = 'Bad''

     'nice' = 'Nice';

run;

proc tabulate data=have;

class student_behaviour library_visits / order=data;

format library_visits visit_f. student_behaviour $beh_f.;

table student_behaviour='',library_visits='No. of Visits'*n='' / box='Character';

run;

Contributor
Posts: 36

Re: Help with grouping and tabulating

All thanks very much.

Doreta_jarosz your method worked perfectly(best answer i think), however when I tabulate, one of the categories i.e 10-20  does not appear chronologically, for example. I get

                     1-9  21-29  over 30  10-20
good               3

bad                 1

nice                 1         etc

I want 'over 30' to appear at the end after 10-20. I have tried sorting  data by ascending but this did not work.

Thanks.

Contributor
Posts: 66

Re: Help with grouping and tabulating

Hi zetter,

The order of a character variable depends on what you put in the character string description and it actually depends on the operating system. This is also true about the self defined format. So, if you want the displayed order to correspond to a numeric order, your first two letters should reflect the desired order. How about:

data library;
format id 3. behaviour $4. lib_visits 3. lib_vis_cat $7.;
input id  behaviour $ lib_visits;

if lib_visits < 10 then lib_vis_cat = "1-9";
else if lib_visits < 21 then lib_vis_cat = "10-20";
else if lib_visits < 30 then lib_vis_cat = "21-30";
else if lib_visits >= 21 then lib_vis_cat = "31+";
label lib_vis_cat = "Library visits count";
cards;
110   good       2
111   good      15
113   bad       10
114   bad        1
115   nice      21
116   nice      24
117   good       6
119   bad        3
120   good      31
121   good      15
123   bad       10
124   bad        1
125   nice       2
126   nice      24
127   good       6
128   bad       35
;
run;


proc freq data=library;
tables behaviour*lib_vis_cat /missing;
title "Behaviour by Library visit count";
run;

A blank precedes any character in all operating systems I worked on. In Windows OS any digit precedes letters, unlike on the mainframe Z/OS.

The reason for the format statement in the input data step is that if you don't assign the length of a character variable, the assigned length will be taken from the first data row, so if it happens to be "bad,"  then your other values will be truncated to three characters. I added the format for id to have it listed as the first variable, since the frmat statement dictates the order here. If you only put the character variables in the format statement, then the position of the variables will be different. Try it on your own.

The /missing option on the tables statement in proc freq makes sure that all categories are listed, in particular the missing values.

Since you are clearly a novice, I did not suggest proc tabulate.  It gets really tricky when there are missing values. I do not advise a SAS novice to use it before mastering basic stuff.

Proc freq is a staple of SAS and it provides a lot more information with minimal code.

The user defined format is a nice approach, but if you associate the variable with the format in a data step, rather than each time in a procedure, only the formatted values get saved. When you make a permanent data set with such formats and you later lose the format definition, you will not be able to retrieve the values. So, be careful with user defined formats.

Good luck.

Dorota

Edited to add a comment on user defined formats and the format statement.

Contributor
Posts: 36

Re: Help with grouping and tabulating

Posted in reply to Dorota_Jarosz

Thanks fr this but my table has many variables than the simple example I gave, should be complicated to do your way.Is there a sas programme to rearrange such groupings in proc tabulate or rea-arrange after ods export?

Thanks.

Contributor
Posts: 66

Re: Help with grouping and tabulating

Hi Zetter,

You can output the results of proc freq to a dataset, one table at a time.  The categorization of the library visits variable can be accomplished either by hard coding or by assigning a user defined format. The following two proc freq steps produce the same output:

proc freq data=library noprint;
tables behaviour*lib_vis_cat /out=beh_vis_cat (drop=percent);
*title "Behaviour by Library visit count - hard coded";
run;

proc format;
value lib_vis_fmt
low-9 = " 1-9"
10-20 = "10-20"
21-30 = "21-30"
31-high="31 plus";
run;

proc freq data=library noprint;
tables behaviour*lib_visits / out=beh_vis (drop=percent) ;
*title "Behaviour by Library visit count using format";
format lib_visits lib_vis_fmt.;
run;

*Both tables contain four variables: behaviour, lib_visits (or lib_vis_cat), COUNT, and PERCENT, which can be dropped using the output data set option shown. You can use this syntax for any variable in the data set, but if you include more than one cross tabultion in the tables statement, only the last one will be output.;

proc print data=beh_vis noobs;

title "Behaviour by library visit count using format";

run;

*Output:

Behaviour by library visit count using format

             lib_
behaviour    visits     COUNT

  bad         1-9         3
  bad        10-20        2
  bad        31 plus      1
  good        1-9         3
  good       10-20        2
  good       31 plus      1
  nice        1-9         1
  nice       21-30        3

;

Ask a Question
Discussion stats
  • 8 replies
  • 330 views
  • 1 like
  • 4 in conversation