BookmarkSubscribeRSS Feed
sas-inquirer
Quartz | Level 8

Good morning SAS Gurus,

    I am having trouble ending up with a custom sort order in SAS 9.4. I have combined two datasets. The first dataset contained a custom sort order that needs to be maintained. The second dataset contained new observations that needed to be added to the first dataset.

 

After combining the datasets, the table looks like this (where new obs have SORT_ORDER = .):

data have;
input SORT_ORDER KEY_VAR $;
CARDS;
1 1-2-9
2 1-2-4
4 1-2-5
5 1-2-8
. 1-1-5
. 1-2-7
. 1-3-1
. 1-3-3
;
run;

I can't sort by SORT_ORER because the new observations will be out of order based on the KEY_VAR. I can't sort by the KEY_VAR because it will disrupt the original sort order. I want to end up with this:

data want;
input SORT_ORDER KEY_VAR $;
CARDS;
. 1-1-5
1 1-2-9
2 1-2-4
4 1-2-5
. 1-2-7
5 1-2-8
. 1-3-1
. 1-3-3
;
run;

If I was doing this manually, I would take each new observation and insert it based on the KEY_VAR numeric order.

Where the logic would be if SORT_ORDER = . then insert based on KEY_VAR. 

 

I hope this makes sense. Any suggestions would be greatly appreciated.

19 REPLIES 19
ballardw
Super User

Missiing values make very poor controllers of order in general. So why is "missing" used for order?

 

It may help to provide what the output of your example should actually look like.

 

And are there any other variables in your data that need to be considered? If so, how?.

sas-inquirer
Quartz | Level 8
Hi Ballardw

You're right, missing values are very poor controllers. I've never had to deal with this type of issue before. Once I can figure out the sort order issue, I will redo the sort order variable so that each observation has a value.

The output will be:
NEW_SORT_ORDER KEY_VAR
1 1-1-5
2 1-2-9
3 1-2-4
4 1-2-5
5 1-2-7
6 1-2-8
7 1-3-1
8 1-3-3

No other variables need to be considered for the purpose of sorting. There are many other variables but they are directly associated with the KEY_VAR (i.e. first digit represents something, second something else etc.) The KEY_VAR is unique.

The problem is that the original table represents a report that the user wants ordered in a specific way. My role is to update this report with some new KEY_VAR's but maintain any sort order that existed in the original report. The new KEY_VAR's are to be inserted based on their KEY_VAR numeric order without disrupting the original sort order.

Perhaps some context may help. Let's say the report represents grocery items from a store and each item is assigned a unique KEY which represents the manufacturer, the country and the type of food. The new KEY_VARs are new items that the store is offering. Management is very interested in ORANGES and so is used to seeing the report with ORANGES (1-2-9) followed by BANANAS (1-2-4) followed by APPLES (1-2-8) and so a SORT_ORDER variable was created. Unfortunately, the KEY_VAR cannot be changed. The store now offers APRICOTS and has assigned the key (1-2-7). Now the sort order should be ORANGES, BANANAS, APPLES, APRICOTS so the sort is giving priority to the SORT_ORDER first and then the KEY_VAR if cases where the item is new.
sas-inquirer
Quartz | Level 8
Sorry, correction in last paragraph: Now the sort order should be ORANGES, BANANAS, APRICOTS, APPLES.
PGStats
Opal | Level 21

I don't quite understand your rule for inserting the new records. This would seem more logical:

 

data have;
input SORT_ORDER KEY_VAR $;
CARDS;
1 1-2-9
2 1-2-4
4 1-2-5
5 1-2-8
. 1-1-5
. 1-2-7
. 1-3-1
. 1-3-3
;

proc sort data=have out=have1; by key_var; run;

data have2;
retain last_order;
set have1;
temp_sort_order = sort_order;
if missing(temp_sort_order) then temp_sort_order = last_order;
else last_order = sort_order;
drop last_order;
run;

proc sort data=have2 out=have3; by temp_sort_order key_var; run;

data want;
set have3;
new_sort_order = _n_;
run;

proc print data=want noobs; run;

PGStats_0-1621704480423.png

If not, please explain.

 

 

PG
sas-inquirer
Quartz | Level 8
@PGStats. This is certainly the closest a solution has come. If the 1-3-1 and 1-3-3 fell in position 7 and 8 under new_sort_order it would be perfect, but I'm realizing this custom sort is not going to be possible. Thank you for your suggestion and time.
FreelanceReinh
Jade | Level 19

Hello @sas-inquirer,

 

Many non-standard sort orders can be implemented using a suitable ORDER BY clause in a PROC SQL step. A fundamental (mathematical) prerequisite of any of these sort orders is a definition satisfying the properties of a so called linear order (cf. https://en.wikipedia.org/wiki/Total_order). In particular, for any two different records to be sorted there must be no ambiguity as to which one takes precedence of the other. For example, you suggest

SORT_ORDER KEY_VAR
.          1-2-7
5          1-2-8

because you "insert" the new records "based on KEY_VAR." But at the same time you want 

SORT_ORDER KEY_VAR
1          1-2-9
.          1-2-7

which seems to follow a different "logic."

 

Can you provide a rule that is applicable to any two (SORT_ORDER, KEY_VAR) pairs?

sas-inquirer
Quartz | Level 8
The rule is SORT_ORDER takes precedence over KEY_VAR.
ballardw
Super User

Still don't know what the desired output should look like.

You mentioned a "report".

If your report is generated with Proc Report or Tabulate it might be possible that PRELOADFMT can control appearance.

The steps would be making a custom format with the option NOTSORTED and placing the values in the desired order. Then using the PRELOADFMT option for the variable along with other options depending on use to help it work.

An example:

proc format;
value orderexample (notsorted)
23 = '23'
1  = '1'
18 = '18'
5  = '5'
;
run;

data example;
   input inorder value;
datalines;
1  45
5  16
18 11
23 0
;

proc report data=example;
   columns inorder value;
   define inorder /group order=data preloadfmt;
   format inorder orderexample.;
run;

The Order=data is one of the helper options but note that the input data order is not the order the report shows for the variable INORDER. Only a few procedures support this option but may be helpful. In proc report the column would have to be a Group variable.

FreelanceReinh
Jade | Level 19

@FreelanceReinh wrote:

In particular, for any two different records to be sorted there must be no ambiguity as to which one takes precedence of the other.

(...)

Can you provide a rule that is applicable to any two (SORT_ORDER, KEY_VAR) pairs?



@sas-inquirer wrote:
The rule is SORT_ORDER takes precedence over KEY_VAR.

The rule must refer to values, not only variable names.

 

Example:

Suppose that this is an existing record in the original table:

SORT_ORDER KEY_VAR
42         3-5-7

Now this new record is to be inserted:

SORT_ORDER KEY_VAR
.          3-5-11

A rule for a well-defined sort order based on SORT_ORDER and KEY_VAR must decide whether this new record would appear (somewhere) before or (somewhere) after the above existing record after insertion.

sas-inquirer
Quartz | Level 8
@FreelanceReinhard
Right, the rule then would be any new record should be inserted in numerical order so in your example, 3-5-11 would be inserted after 3-5-7.
FreelanceReinh
Jade | Level 19

@sas-inquirer wrote:
@FreelanceReinhard
Right, the rule then would be any new record should be inserted in numerical order so in your example, 3-5-11 would be inserted after 3-5-7.

Thanks. Unfortunately, the rules established so far violate the transitivity condition and hence do not define a linear order (see the Wikipedia article I linked earlier).

 

Consider these three (SORT_ORDER, KEY_VAR) pairs:

SORT_ORDER KEY_VAR
1          1-1-3
2          1-1-1
.          1-1-2

We have

Therefore it is impossible to sort these three records in accordance with the rules. Each of their six permutations violates at least one of the two rules.

sas-inquirer
Quartz | Level 8
@FreelanceReinhard - you are right. This is not a logical rule. I need to go back to my client and convince them to either create a new order with the new data included else accept a numerical order. Thanks so much for your time.
sas-inquirer
Quartz | Level 8

I think another approach may be to ignore the whole sort issue and take a step back before I even join the tables together.

Table 1 is the original table:

data have_1;
input SORT_ORDER KEY_VAR $;
CARDS;
1 1-2-7
2 1-2-4
3 1-2-5
4 1-2-8
5 1-2-9
;
run;

Table 2 is a new KEY_VAR that needs to be added into the original table:

data have_2;
input KEY_VAR $;
CARDS;
1-2-3
;
run;

How can I insert the new KEY_VAR (1-2-3) from table have_2 before KEY_VAR 1-2-4 without moving any other observations from Table have_1.

data want_1;
input SORT_ORDER KEY_VAR $;
CARDS;
1 1-2-7
. 1-2-3
2 1-2-4
3 1-2-5
4 1-2-8
5 1-2-9
;
run;

So I insert the new var (1-2-3) based on the logic that 1-2-3 comes before 1-2-4. But I don't want to move the first obs (1-2-7) from its current position so I end up with:

data want_final;
input SORT_ORDER KEY_VAR $;
CARDS;
1 1-2-7
2 1-2-3
3 1-2-4
4 1-2-5
5 1-2-8
6 1-2-9
;
run;

I hope that makes more sense.

 

 

 

ballardw
Super User

What in the data tells that 1-2-3 should come before 1-2-4 but not before 1-2-7???

 

And just how does this relate to your "report" at all? If there are other variables involved then perhaps examining how the REPORT is made is the important part.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 19 replies
  • 2773 views
  • 7 likes
  • 5 in conversation