Hello
I run proc tabulate with two-way structure (2 categorical variables: Z1 ,Z2).
I want to control the order of variable Z1.
The order that I get is
(1) Unknown
(10) 400-500
(11) 500+
(2) 0
(3) 0-20
(4) 20-50
(5) 50-100
(6) 100-150
(7) 150-200
(8) 200-300
(9) 300-400
The order that I want is
(1) Unknown
(2) 0
(3) 0-20
(4) 20-50
(5) 50-100
(6) 100-150
(7) 150-200
(8) 200-300
(9) 300-400
(10) 400-500
(11) 500+
What is the way to do it please?
I have tried all options: ORDER=INTERNAL ORDER=FORMATTED ORDER=UNFORMATTED ORDER=FREQ
but nothing helped to get desired result.
proc tabulate data=Have s=[just=c] out=Want;
label month_diff='Number of Months';
class Z1 Z2 /missing order=UNFORMATTED;
var lak;
table Z1='', (Z2=''*CustomerID=''*n='' all='TOTAL');
run;
Since Z1 (and Z2?) are character variables, they will default to lexicographic order (i.e. '11' falls between '1" and '2'). Now if you were able to sort your dataset such that '1' is followed by '2' and later on by '11', then order = data would work. But you also have Z2, so you would need to order your data considering every combination of Z1 and Z2.
Order=Formatted won't work because the lexicographic ordering of the formatted values would reproduce the problem.
But @data_null__ has identified a solution: (1) make a format using the "notsorted" option, (2) call that format in the proc tabulate, but use order=data and the preloadfmt options. Very nice. I had never bothered to learn that preloadfmt modifies how order=data would work.
I got curious, so here's an example using one variable.
data have;
length Z1 $15;
do Z1='(11) some text','(2) more text','(1) other text';
output;
end;
run;
proc print data=have;
title "Original Unformatted Values, in DATASET order";
run;
proc format ;
value $x (notsorted)
'(1) other text' ='(1) other_text'
'(2) more text' ='(2) more_text'
'(11) some text' ='(11) some_text' ;
run;
proc print data=have;
title "Formatted Values, in DATASET order";
format Z1 $x. ;
run;
proc tabulate data=have order=data ;
class Z1 / preloadfmt ;
tables Z1,n;
format Z1 $x.;
title "TABULATE with PRELOADFMT and ORDER=DATA, using a NOTSORTED format";
run;
Of course, the real solution to this problem is to use internal numeric values with formats specified to show the labels you want.
Note the formatted value has an underscore replace on of the blanks, so that I could be certain I was seeing formatted values instead of original values.
Did you try " order=data " ? Or padding white blanks before it .Like : (1) Unknown (2) 0 (3) 0-20 (4) 20-50
Also Order=Data is not getting desired result.
The categories already built in and I cannot change it (I get the data set as it is).
What is the way to solve it?
Maybe via proc informat?
The idea of putting (1) and (2) and (3) and so on is unnecessary. If you have set up your categories by creating custom formats 0-20 and 20-50 and 50-100 and so on, then you can have the categories you want, and they will sort in order using ORDER=INTERNAL (except for the UNKNOWN category, unless those are really missing values)
Thanks but still as I said the data set was not created by me and I need to live with its structure and its categories names.
The question is to to order it using proc tabulate using these categories names
@Ronein wrote:
Thanks but still as I said the data set was not created by me and I need to live with its structure and its categories names.
Absolutely not, you can still remove the (1) and (2) and so on, and then apply your own custom formats. Or are you saying that the raw data for Z1 and Z2 is not included?
And for anyone else reading along, do not start your category names with (1) and (2) and so on
I am looking for solution without changing categories names, thanks
I don't think you'll find an easy and better solution without using the original numeric values. The category names are character, they will sort alphabetically. If the values were numeric, you could make them sort numerically, which seems to be what you want.
@Ronein wrote:
Thanks but still as I said the data set was not created by me and I need to live with its structure and its categories names.
The question is to to order it using proc tabulate using these categories names
If that is organization policy then that is one thing, and then the responsibility for providing structures that will allow the desired output is on them.
However, what happens in creating a report should not matter if the result is as desired and can be maintained.
Many people spend lots of time creating character variables without realizing that sort orders can be critical for later output and either make them incorrectly for the desired reporting purpose or realizing that single numeric values, such as a score range are better done with a format.
With a basic numeric format and the MISSING option for the class variable to insure the Unknown (as missing) is included in the output that order would be trivial with a format. Plus remember that when the user comes back and want to see "what happens if we break the values at 25, 175 and 350 then the only change that would be needed to the Proc tabulate code would be to use a different format. When you bodge together a character range then you must modify the data to add a new variable and change all the code to use the new variable. Where with a format the new report table could be changed by modifying a statement like:
Format score Scorefmt.;
to
Format score ScoreAlternateFmt.;
Plus the boundaries are likely going to be much easier to set in Proc Format than an ugly If/then/else block in a data step.
PS. Not a good idea to ask questions about something using DATA=FORMATTED if you do not actually provide the definition of the format.
Yes. make a format like :
proc format ; value $x '(1) other text' =' (1) other_text' '(2) more text' =' (2) more_text' '(11) some text' =' (11) some_text' ; run; ........... format var $x32.;
Alternative way is just like John King (data _null_) said.
proc format ; value $x (notsorted) '(1) other text' ='(1) other_text' '(2) more text' ='(2) more_text' '(11) some text' ='(11) some_text' ; run; ........... order=data ...... format var $x.;
You need to add the NOTSORTED option to your format. Then use the PRELOADFMT and ORDER=DATA options in PROC TABULATE. If you include some useable sample data we can make example.
Since Z1 (and Z2?) are character variables, they will default to lexicographic order (i.e. '11' falls between '1" and '2'). Now if you were able to sort your dataset such that '1' is followed by '2' and later on by '11', then order = data would work. But you also have Z2, so you would need to order your data considering every combination of Z1 and Z2.
Order=Formatted won't work because the lexicographic ordering of the formatted values would reproduce the problem.
But @data_null__ has identified a solution: (1) make a format using the "notsorted" option, (2) call that format in the proc tabulate, but use order=data and the preloadfmt options. Very nice. I had never bothered to learn that preloadfmt modifies how order=data would work.
I got curious, so here's an example using one variable.
data have;
length Z1 $15;
do Z1='(11) some text','(2) more text','(1) other text';
output;
end;
run;
proc print data=have;
title "Original Unformatted Values, in DATASET order";
run;
proc format ;
value $x (notsorted)
'(1) other text' ='(1) other_text'
'(2) more text' ='(2) more_text'
'(11) some text' ='(11) some_text' ;
run;
proc print data=have;
title "Formatted Values, in DATASET order";
format Z1 $x. ;
run;
proc tabulate data=have order=data ;
class Z1 / preloadfmt ;
tables Z1,n;
format Z1 $x.;
title "TABULATE with PRELOADFMT and ORDER=DATA, using a NOTSORTED format";
run;
Of course, the real solution to this problem is to use internal numeric values with formats specified to show the labels you want.
Note the formatted value has an underscore replace on of the blanks, so that I could be certain I was seeing formatted values instead of original values.
This becomes an easy problem if you are willing to do a little programming. The three steps:
If this would be acceptable as a solution, here is what it would look like.
Step 1:
data want;
set have;
if z1="(1) Unknown" then newvar=1;
else if z1="(2) 0" then newvar=2;
else if z1="(3) 0-20" then newvar=3;
else if z1="(4) 20-50" then newvar=4;
else if z1="(5) 50-100" then newvar=5;
else if z1="(6) 100-150" then newvar=6;
else if z1="(7) 150-200" then newvar=7;
else if z1=="(8) 200-300" then newvar=8;
else if z1="(9) 300-400" then newvar=9;
else if z1="(10) 400-500" then newvar=10;
else if z1="(11) 500+" then newvar=11;
run;
Step 2:
proc format;
value relabel 1="(1) Unknown"
2="(2) 0"
3="(3) 0-20"
4="(4) 20-50"
5="(5) 50-100"
6="(6) 100-150"
7="(7) 150-200"
8="(8) 200-300"
9="(9) 300-400"
10="(10) 400-500"
11="(11) 500+"
;
run;
Step 3:
proc tabulate data=want s=[just=c] out=Want_final;
label month_diff='Number of Months';
class newvar Z2 /missing order=internal;
var lak;
table newvar='', (Z2=''*CustomerID=''*n='' all='TOTAL');
format newvar relabel.;
run;
It's not clear why you want an output data set from PROC TABULATE, but it was in your original program so I left it in the program.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.