I have the following sort of data;Please help the best way to do this.
raw data;
student id student behaviour no of library visits
110 good 2
111 good 15
113 bad 10
114 bad 1
115 nice 2
116 nice 24
117 good 6
119 bad 3
First I want to add a table with library visits banding
visit band;
student id student behaviour no of library visits visit band
110 good 2 1-9
111 good 15 10-20
113 bad 10 10-20
114 bad 1 1-9
115 nice 2 1-9
116 nice 24 over 20
117 good 6 1-9
119 bad 3 1-9
Lastly, a tabulation of character versus no of visits
visit tabulation:
No of visits
Character 1-9 10-20 over 20
good 3 1 0
bad 1 1 0
nice 1 0 1
Many thanks for your help.
The visit banding category can be created as a hard coded lib_vis_cat variable in a data step:
data new; set old;
format lib_vis_cat $7.;
if lib_visits < 10 then lib_vis_cat = "1 - 9";
else if lib_visits < 21 then lib_vis_cat = "10 - 20";
else if lib_visits >= 21 then lib_vis_cat = "over 20";
run;
Then you can use a simple proc freq to tabulate.
proc freq data=new;
tables character*lib_visi_cat;
run;
Alternatively you can define a special user defined format and associate this format with the library visits number variable. However this approach has some drawback like the order of categories in the displayed table.
Any more advice?
Hi,
Try this.
proc format ;
value test
1-9='1-9'
10-20='10-20'
21-100='Over20'
;
run;
data have;
input studentid studentbehaviour $ visits;
newvisit=put(visits,test.);
cards;
110 good 2
111 good 15
113 bad 10
114 bad 1
115 nice 2
116 nice 24
117 good 6
119 bad 3
;
run;
proc tabulate data=have ;
class studentbehaviour newvisit;
table studentbehaviour,newvisit;
run;
Thanks,
Shiva
Here's how I would approach this. If the final output is your goal, then you don't need the intermediate step to add the new column, you can just use formatted values of the existing column. If the order of student behaviour is important (e.g. Good, Bad, Nice) then I've added a format which stores that order (the notsorted option in the format and the order=data option in the tabulate are the key instructions to achieve this).
data have;
input student_id student_behaviour $ library_visits;
cards;
110 good 2
111 good 15
113 bad 10
114 bad 1
115 nice 2
116 nice 24
117 good 6
119 bad 3
;
run;
proc format;
value visit_f
low-9 = '1-9'
10-20 = '10-20'
20-high = 'over 20';
value $ beh_f (notsorted)
'good' = 'Good'
'bad' = 'Bad''
'nice' = 'Nice';
run;
proc tabulate data=have;
class student_behaviour library_visits / order=data;
format library_visits visit_f. student_behaviour $beh_f.;
table student_behaviour='',library_visits='No. of Visits'*n='' / box='Character';
run;
All thanks very much.
Doreta_jarosz your method worked perfectly(best answer i think), however when I tabulate, one of the categories i.e 10-20 does not appear chronologically, for example. I get
1-9 21-29 over 30 10-20
good 3
bad 1
nice 1 etc
I want 'over 30' to appear at the end after 10-20. I have tried sorting data by ascending but this did not work.
Thanks.
Hi zetter,
The order of a character variable depends on what you put in the character string description and it actually depends on the operating system. This is also true about the self defined format. So, if you want the displayed order to correspond to a numeric order, your first two letters should reflect the desired order. How about:
data library;
format id 3. behaviour $4. lib_visits 3. lib_vis_cat $7.;
input id behaviour $ lib_visits;
if lib_visits < 10 then lib_vis_cat = "1-9";
else if lib_visits < 21 then lib_vis_cat = "10-20";
else if lib_visits < 30 then lib_vis_cat = "21-30";
else if lib_visits >= 21 then lib_vis_cat = "31+";
label lib_vis_cat = "Library visits count";
cards;
110 good 2
111 good 15
113 bad 10
114 bad 1
115 nice 21
116 nice 24
117 good 6
119 bad 3
120 good 31
121 good 15
123 bad 10
124 bad 1
125 nice 2
126 nice 24
127 good 6
128 bad 35
;
run;
proc freq data=library;
tables behaviour*lib_vis_cat /missing;
title "Behaviour by Library visit count";
run;
A blank precedes any character in all operating systems I worked on. In Windows OS any digit precedes letters, unlike on the mainframe Z/OS.
The reason for the format statement in the input data step is that if you don't assign the length of a character variable, the assigned length will be taken from the first data row, so if it happens to be "bad," then your other values will be truncated to three characters. I added the format for id to have it listed as the first variable, since the frmat statement dictates the order here. If you only put the character variables in the format statement, then the position of the variables will be different. Try it on your own.
The /missing option on the tables statement in proc freq makes sure that all categories are listed, in particular the missing values.
Since you are clearly a novice, I did not suggest proc tabulate. It gets really tricky when there are missing values. I do not advise a SAS novice to use it before mastering basic stuff.
Proc freq is a staple of SAS and it provides a lot more information with minimal code.
The user defined format is a nice approach, but if you associate the variable with the format in a data step, rather than each time in a procedure, only the formatted values get saved. When you make a permanent data set with such formats and you later lose the format definition, you will not be able to retrieve the values. So, be careful with user defined formats.
Good luck.
Dorota
Edited to add a comment on user defined formats and the format statement.
Thanks fr this but my table has many variables than the simple example I gave, should be complicated to do your way.Is there a sas programme to rearrange such groupings in proc tabulate or rea-arrange after ods export?
Thanks.
Hi Zetter,
You can output the results of proc freq to a dataset, one table at a time. The categorization of the library visits variable can be accomplished either by hard coding or by assigning a user defined format. The following two proc freq steps produce the same output:
proc freq data=library noprint;
tables behaviour*lib_vis_cat /out=beh_vis_cat (drop=percent);
*title "Behaviour by Library visit count - hard coded";
run;
proc format;
value lib_vis_fmt
low-9 = " 1-9"
10-20 = "10-20"
21-30 = "21-30"
31-high="31 plus";
run;
proc freq data=library noprint;
tables behaviour*lib_visits / out=beh_vis (drop=percent) ;
*title "Behaviour by Library visit count using format";
format lib_visits lib_vis_fmt.;
run;
*Both tables contain four variables: behaviour, lib_visits (or lib_vis_cat), COUNT, and PERCENT, which can be dropped using the output data set option shown. You can use this syntax for any variable in the data set, but if you include more than one cross tabultion in the tables statement, only the last one will be output.;
proc print data=beh_vis noobs;
title "Behaviour by library visit count using format";
run;
*Output:
Behaviour by library visit count using format
lib_
behaviour visits COUNT
bad 1-9 3
bad 10-20 2
bad 31 plus 1
good 1-9 3
good 10-20 2
good 31 plus 1
nice 1-9 1
nice 21-30 3
;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.