BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

I run proc tabulate with two-way structure (2 categorical variables: Z1 ,Z2).

I want to control the order of variable Z1.

The order that I get is

(1) Unknown

(10) 400-500

(11) 500+

(2) 0

(3) 0-20

(4) 20-50

(5) 50-100

(6) 100-150

(7) 150-200

(8) 200-300

(9) 300-400

 

The order that I want is 

(1) Unknown
(2) 0
(3) 0-20
(4) 20-50
(5) 50-100
(6) 100-150
(7) 150-200
(8) 200-300
(9) 300-400

(10) 400-500
(11) 500+

 

What is the way to do it please?

I have tried all options:  ORDER=INTERNAL    ORDER=FORMATTED     ORDER=UNFORMATTED ORDER=FREQ

but nothing helped to get desired result.

proc tabulate data=Have s=[just=c] out=Want;
label month_diff='Number of Months';
class Z1  Z2 /missing order=UNFORMATTED;
var lak;
table Z1='', (Z2=''*CustomerID=''*n='' all='TOTAL');
run;    

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Since Z1 (and Z2?) are character variables, they will default to lexicographic order  (i.e.  '11' falls between '1" and '2').   Now if you were able to sort your dataset such that '1' is followed by '2' and later on by '11', then order = data would work.  But you also have Z2, so you would need to order your data considering every combination of Z1 and Z2.

 

Order=Formatted won't work because the lexicographic ordering of the formatted values would reproduce the problem.

 

But @data_null__ has identified a solution:  (1) make a format using the "notsorted" option, (2) call that format in the proc tabulate, but use order=data and the preloadfmt options.  Very nice.  I had never bothered to learn that preloadfmt modifies how order=data would work.

 

I got curious, so here's an example using one variable.

 

data have;
  length Z1 $15;
  do Z1='(11) some text','(2) more text','(1) other text';
    output;
  end;
run;

proc print data=have;
  title "Original Unformatted Values, in DATASET order";
run;

proc format ;
  value $x  (notsorted)
    '(1) other text' ='(1) other_text'
    '(2) more text'  ='(2) more_text'
    '(11) some text' ='(11) some_text' ;
run;

proc print data=have;
  title "Formatted Values, in DATASET order";
  format Z1 $x. ;
run;

proc tabulate data=have order=data ;
  class Z1 / preloadfmt ;
  tables Z1,n;
  format Z1 $x.;
  title "TABULATE with PRELOADFMT and ORDER=DATA, using a NOTSORTED format";
run;

Of course, the real solution to this problem is to use internal numeric values with formats specified to show the labels you want.

 

Note the formatted value has an underscore replace on of the blanks, so that I could be certain I was seeing formatted values instead of original values.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

12 REPLIES 12
Ksharp
Super User
Did you try " order=data " ?

Or padding white blanks before it .Like :

               (1) Unknown
          (2) 0
      (3) 0-20
(4) 20-50
Ronein
Meteorite | Level 14

Also Order=Data  is not getting desired result.

The categories already built in and I cannot change it (I get the data set as it is).

What is the way to solve it?

Maybe via proc informat?

PaigeMiller
Diamond | Level 26

The idea of putting (1) and (2) and (3) and so on is unnecessary. If you have set up your categories by creating custom formats 0-20 and 20-50 and 50-100 and so on, then you can have the categories you want, and they will sort in order using ORDER=INTERNAL (except for the UNKNOWN category, unless those are really missing values)

--
Paige Miller
Ronein
Meteorite | Level 14

Thanks but still as I said the data set was not created by me and I need to live with its structure and its categories names.

The question is to to order it using proc tabulate using these categories names

 

 

PaigeMiller
Diamond | Level 26

@Ronein wrote:

Thanks but still as I said the data set was not created by me and I need to live with its structure and its categories names.


Absolutely not, you can still remove the (1) and (2) and so on, and then apply your own custom formats. Or are you saying that the raw data for Z1 and Z2 is not included?

 

And for anyone else reading along, do not start your category names with (1) and (2) and so on

--
Paige Miller
Ronein
Meteorite | Level 14

I am looking for solution without changing categories names, thanks

PaigeMiller
Diamond | Level 26

I don't think you'll find an easy and better solution without using the original numeric values. The category names are character, they will sort alphabetically. If the values were numeric, you could make them sort numerically, which seems to be what you want.

--
Paige Miller
ballardw
Super User

@Ronein wrote:

Thanks but still as I said the data set was not created by me and I need to live with its structure and its categories names.

The question is to to order it using proc tabulate using these categories names

 

 


If that is organization policy then that is one thing, and then the responsibility for providing structures that will allow the desired output is on them.

However, what happens in creating a report should not matter if the result is as desired and can be maintained.

 

Many people spend lots of time creating character variables without realizing that sort orders can be critical for later output and either make them incorrectly for the desired reporting purpose or realizing that single numeric values, such as a score range are better done with a format.

With a basic numeric format and the MISSING option for the class variable to insure the Unknown (as missing) is included in the output that order would be trivial with a format. Plus remember that when the user comes back and want to see "what happens if we break the values at 25, 175 and 350 then the only change that would be needed to the Proc tabulate code would be to use a different format. When you bodge together a character range then you must modify the data to add a new variable and change all the code to use the new variable. Where with a format the new report table could be changed by modifying a statement like:

Format score Scorefmt.;

to

Format score ScoreAlternateFmt.;

Plus the boundaries are likely going to be much easier to set in Proc Format than an ugly If/then/else block in a data step.

 

PS. Not a good idea to ask questions about something using DATA=FORMATTED if you do not actually provide the definition of the format.

Ksharp
Super User

Yes. make a format like :

 

proc format ;
  value $x    
    '(1) other text' ='       (1) other_text'
    '(2) more text'  ='    (2) more_text'
    '(11) some text' ='  (11) some_text' ;
run;

...........
format var $x32.;

Alternative way is just like John King (data _null_) said.

proc format ;
  value $x  (notsorted)
    '(1) other text' ='(1) other_text'
    '(2) more text'  ='(2) more_text'
    '(11) some text' ='(11) some_text' ;
run;

...........
order=data
......
format var $x.;
data_null__
Jade | Level 19

You need to add the NOTSORTED option to your format.  Then use the PRELOADFMT and ORDER=DATA options in PROC TABULATE.  If you include some useable sample data we can make example.

 

mkeintz
PROC Star

Since Z1 (and Z2?) are character variables, they will default to lexicographic order  (i.e.  '11' falls between '1" and '2').   Now if you were able to sort your dataset such that '1' is followed by '2' and later on by '11', then order = data would work.  But you also have Z2, so you would need to order your data considering every combination of Z1 and Z2.

 

Order=Formatted won't work because the lexicographic ordering of the formatted values would reproduce the problem.

 

But @data_null__ has identified a solution:  (1) make a format using the "notsorted" option, (2) call that format in the proc tabulate, but use order=data and the preloadfmt options.  Very nice.  I had never bothered to learn that preloadfmt modifies how order=data would work.

 

I got curious, so here's an example using one variable.

 

data have;
  length Z1 $15;
  do Z1='(11) some text','(2) more text','(1) other text';
    output;
  end;
run;

proc print data=have;
  title "Original Unformatted Values, in DATASET order";
run;

proc format ;
  value $x  (notsorted)
    '(1) other text' ='(1) other_text'
    '(2) more text'  ='(2) more_text'
    '(11) some text' ='(11) some_text' ;
run;

proc print data=have;
  title "Formatted Values, in DATASET order";
  format Z1 $x. ;
run;

proc tabulate data=have order=data ;
  class Z1 / preloadfmt ;
  tables Z1,n;
  format Z1 $x.;
  title "TABULATE with PRELOADFMT and ORDER=DATA, using a NOTSORTED format";
run;

Of course, the real solution to this problem is to use internal numeric values with formats specified to show the labels you want.

 

Note the formatted value has an underscore replace on of the blanks, so that I could be certain I was seeing formatted values instead of original values.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Astounding
PROC Star

This becomes an easy problem if you are willing to do a little programming.  The three steps:

  1. Add a new variable that indicates the proper order
  2. Format the new variable so it prints the way you want to see it.
  3. Run PROC TABULATE using your new variable.

If this would be acceptable as a solution, here is what it would look  like.

 

Step 1:

data want;
   set have;
   if z1="(1) Unknown" then newvar=1;
   else if z1="(2) 0" then newvar=2;
   else if z1="(3) 0-20" then newvar=3;
   else if z1="(4) 20-50" then newvar=4;
   else if z1="(5) 50-100" then newvar=5;
   else if z1="(6) 100-150" then newvar=6;
   else if z1="(7) 150-200" then newvar=7;
   else if z1=="(8) 200-300" then newvar=8;
   else if z1="(9) 300-400" then newvar=9;
   else if z1="(10) 400-500" then newvar=10;
   else if z1="(11) 500+" then newvar=11;
run;

Step 2:

proc format;
   value relabel  1="(1) Unknown"
     2="(2) 0"
     3="(3) 0-20"
     4="(4) 20-50"
     5="(5) 50-100"
     6="(6) 100-150"
     7="(7) 150-200"
     8="(8) 200-300"
     9="(9) 300-400"
     10="(10) 400-500"
     11="(11) 500+"
      ;
run;

Step 3:

proc tabulate data=want s=[just=c] out=Want_final;
label month_diff='Number of Months';
class newvar  Z2 /missing order=internal;
var lak;
table newvar='', (Z2=''*CustomerID=''*n='' all='TOTAL');
format newvar relabel.;
run;    

It's not clear why you want an output data set from PROC TABULATE, but it was in your original program so I left it in the program.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 3872 views
  • 17 likes
  • 7 in conversation