DATA Step, Macro, Functions and more

PROC SORT with categorical variables

Reply
Contributor
Posts: 71

PROC SORT with categorical variables

I would like to sort a dataset using PROC SORT based on a character column.  Something equivalent to:

 

PROC SQL;
  SELECT *
  FROM foo
  ORDER BY
    CASE some_text_column
    WHEN 'category a' THEN 1
    WHEN 'category b' THEN 2
    WHEN 'category c' THEN 3
  END;
QUIT;
Respected Advisor
Posts: 2,647

Re: PROC SORT with categorical variables

Posted in reply to tomcmacdonald

Since you mention PROC SORT, have you tried it? It should get the job done very easily. Does it work? Why or why not?

 

Is the real question not about sorting but assigning values 1 2 3??

--
Paige Miller
Contributor
Posts: 71

Re: PROC SORT with categorical variables

Posted in reply to PaigeMiller

I'm just using dummy values.  The values won't be alphabetical.  I have tried it and it sorts alphabetically which is incorrect.  I'm guessing I'm going to have make a temporary column in a previous DATA STEP, sort it, then remove the column in another DATA STEP.

Respected Advisor
Posts: 2,647

Re: PROC SORT with categorical variables

Posted in reply to tomcmacdonald

@tomcmacdonald wrote:

I'm just using dummy values.  The values won't be alphabetical.  I have tried it and it sorts alphabetically which is incorrect. 


There is not enough information here for me to understand what is not correct, and what result you want.

--
Paige Miller
Contributor
Posts: 71

Re: PROC SORT with categorical variables

[ Edited ]
Posted in reply to PaigeMiller

Just seeing if there's a less wordy way of doing the equivalent PROC SQL operation.  I think I have a solution.  This is the test dataset:

 

 

data foo;
	input a b $;
	datalines;
1 apple
2 banana
3 pear
4 peach
5 orange
;
run;

 

This is the PROC SQL sort solution:

 

 

proc sql;
	select *
	from foo 
	order by
		case b
		when 'peach' then 1
		when 'orange' then 2
		when 'apple' then 3
		when 'pear' then 4
		when 'banana' then 5;
quit;

 

This is the wordy DATA STEP and PROC SORT solution:

 

 

data _1;
	set foo;
	if b = 'peach' then dummy = 1;
	if b = 'orange' then dummy = 2;
	if b = 'apple' then dummy = 3;
	if b = 'pear' then dummy = 4;
	if b = 'banana' then dummy = 5;
run;

proc sort data=_1; by dummy; run;

data foo(keep=a b);
	set _1;
run;

proc delete data=_1; run;
Respected Advisor
Posts: 2,647

Re: PROC SORT with categorical variables

Posted in reply to tomcmacdonald
proc sort data=_1 out=foo(drop=dummy); by dummy; run;
--
Paige Miller
Super User
Posts: 13,018

Re: PROC SORT with categorical variables

Posted in reply to tomcmacdonald

@tomcmacdonald wrote:

I'm just using dummy values.  The values won't be alphabetical.  I have tried it and it sorts alphabetically which is incorrect.  I'm guessing I'm going to have make a temporary column in a previous DATA STEP, sort it, then remove the column in another DATA STEP.


Show the code you used that resulted in an incorrect sort order for numeric variables.

 

Also indicate what format the numeric variable(s) that sort incorrectly may have. Note that if you have a format permanently applied to the variables the displayed value may not match the underlying numeric value.

Super User
Posts: 6,536

Re: PROC SORT with categorical variables

[ Edited ]
Posted in reply to tomcmacdonald

This will take at least two steps (just can't be done in one step).  To get a grasp on the problem ...

 

Do you know what 1, 2, and 3 are, or does the program have to figure out positionally what the names of the fields are?

 

Is FOO the name of the data set that contains both the data to be sorted as well as SOME_TEXT_COLUMN?

 

Wherever it comes from, is SOME_TEXT_COLUMN constant across records (or is there only one record to begin with)?

 

 

*********** EDITED:

 

Your example makes things much clearer.  You could combine a couple of steps:

 

proc sort data=_1 out=foo (keep=a b);

by dummy;

run;

 

But the DATA step approach will likely take longer to run than the original.

 

 

PROC Star
Posts: 1,300

Re: PROC SORT with categorical variables

Posted in reply to tomcmacdonald

may be a hash data step equivalent?

 

data foo;
	input a b $;
	datalines;
1 apple
2 banana
3 pear
4 peach
5 orange
;
run;

data _null_;
if _n_=1 then do;
  dcl hash H (ordered: "A") ;
   h.definekey  ("dummy") ;
   h.definedata ("a", "b") ;
   h.definedone () ;
end;
set foo end=last;
if b = 'peach' then dummy = 1;
else if b = 'orange' then dummy = 2;
else if b = 'apple' then dummy = 3;
else if b = 'pear' then dummy = 4;
else if b = 'banana' then dummy = 5;
h.replace();
if last then h.output(dataset:'want');
run;
Super User
Posts: 22,827

Re: PROC SORT with categorical variables

Posted in reply to tomcmacdonald

Can you not do this in the first place?

Replace the categorical variables with a number that would sort accordingly and have a format that shows the names instead. 

 

I'm not aware of another method to do this in any language tbh. 

Contributor
Posts: 71

Re: PROC SORT with categorical variables

Wouldn't that be more complicated because now you need to go into PROC FORMAT and define a format that will only be used once?
Super User
Posts: 22,827

Re: PROC SORT with categorical variables

Posted in reply to tomcmacdonald

It speaks to the structure of your data in the first place. 

 

Doing it from the start means your data is smaller and easier to manage.

It means when I'm coding, I'm typing less characters. 

PROC FORMAT can be driven from a data set so the code is centralized. 

 

In many ways, it's just better programming approach overall. 

 

PROC FORMAT is also very fast for lookups and I have more control over my reports. 

 

Obviously my opinion, but I work with SAS/R/Python and this is one of the best approaches in SAS, IMO

Super User
Posts: 13,018

Re: PROC SORT with categorical variables

Posted in reply to tomcmacdonald

In addition to @Reeza's format comments it may be one of the easier ways to reorder data depending on what you are actually using the data for. Note that reporting procedures Report and Tabulate explicitly have order options for variables and include Formatted as one of the options.

Super User
Posts: 10,611

Re: PROC SORT with categorical variables

Posted in reply to tomcmacdonald

Since ORDER BY is executed at last. 

Maybe you should try .

 

PROC SQL;
  SELECT *,
CASE some_text_column WHEN 'category a' THEN 1 WHEN 'category b' THEN 2 WHEN 'category c' THEN 3 END AS XXXX
FROM foo ORDER BY XXXX ; QUIT;
Ask a Question
Discussion stats
  • 13 replies
  • 172 views
  • 0 likes
  • 7 in conversation