Good morning SAS Gurus,
I am having trouble ending up with a custom sort order in SAS 9.4. I have combined two datasets. The first dataset contained a custom sort order that needs to be maintained. The second dataset contained new observations that needed to be added to the first dataset.
After combining the datasets, the table looks like this (where new obs have SORT_ORDER = .):
data have;
input SORT_ORDER KEY_VAR $;
CARDS;
1 1-2-9
2 1-2-4
4 1-2-5
5 1-2-8
. 1-1-5
. 1-2-7
. 1-3-1
. 1-3-3
;
run;
I can't sort by SORT_ORER because the new observations will be out of order based on the KEY_VAR. I can't sort by the KEY_VAR because it will disrupt the original sort order. I want to end up with this:
data want;
input SORT_ORDER KEY_VAR $;
CARDS;
. 1-1-5
1 1-2-9
2 1-2-4
4 1-2-5
. 1-2-7
5 1-2-8
. 1-3-1
. 1-3-3
;
run;
If I was doing this manually, I would take each new observation and insert it based on the KEY_VAR numeric order.
Where the logic would be if SORT_ORDER = . then insert based on KEY_VAR.
I hope this makes sense. Any suggestions would be greatly appreciated.
Missiing values make very poor controllers of order in general. So why is "missing" used for order?
It may help to provide what the output of your example should actually look like.
And are there any other variables in your data that need to be considered? If so, how?.
I don't quite understand your rule for inserting the new records. This would seem more logical:
data have;
input SORT_ORDER KEY_VAR $;
CARDS;
1 1-2-9
2 1-2-4
4 1-2-5
5 1-2-8
. 1-1-5
. 1-2-7
. 1-3-1
. 1-3-3
;
proc sort data=have out=have1; by key_var; run;
data have2;
retain last_order;
set have1;
temp_sort_order = sort_order;
if missing(temp_sort_order) then temp_sort_order = last_order;
else last_order = sort_order;
drop last_order;
run;
proc sort data=have2 out=have3; by temp_sort_order key_var; run;
data want;
set have3;
new_sort_order = _n_;
run;
proc print data=want noobs; run;
If not, please explain.
Hello @sas-inquirer,
Many non-standard sort orders can be implemented using a suitable ORDER BY clause in a PROC SQL step. A fundamental (mathematical) prerequisite of any of these sort orders is a definition satisfying the properties of a so called linear order (cf. https://en.wikipedia.org/wiki/Total_order). In particular, for any two different records to be sorted there must be no ambiguity as to which one takes precedence of the other. For example, you suggest
SORT_ORDER KEY_VAR . 1-2-7 5 1-2-8
because you "insert" the new records "based on KEY_VAR." But at the same time you want
SORT_ORDER KEY_VAR 1 1-2-9 . 1-2-7
which seems to follow a different "logic."
Can you provide a rule that is applicable to any two (SORT_ORDER, KEY_VAR) pairs?
Still don't know what the desired output should look like.
You mentioned a "report".
If your report is generated with Proc Report or Tabulate it might be possible that PRELOADFMT can control appearance.
The steps would be making a custom format with the option NOTSORTED and placing the values in the desired order. Then using the PRELOADFMT option for the variable along with other options depending on use to help it work.
An example:
proc format; value orderexample (notsorted) 23 = '23' 1 = '1' 18 = '18' 5 = '5' ; run; data example; input inorder value; datalines; 1 45 5 16 18 11 23 0 ; proc report data=example; columns inorder value; define inorder /group order=data preloadfmt; format inorder orderexample.; run;
The Order=data is one of the helper options but note that the input data order is not the order the report shows for the variable INORDER. Only a few procedures support this option but may be helpful. In proc report the column would have to be a Group variable.
@FreelanceReinh wrote:
In particular, for any two different records to be sorted there must be no ambiguity as to which one takes precedence of the other.
(...)
Can you provide a rule that is applicable to any two (SORT_ORDER, KEY_VAR) pairs?
@sas-inquirer wrote:
The rule is SORT_ORDER takes precedence over KEY_VAR.
The rule must refer to values, not only variable names.
Example:
Suppose that this is an existing record in the original table:
SORT_ORDER KEY_VAR 42 3-5-7
Now this new record is to be inserted:
SORT_ORDER KEY_VAR . 3-5-11
A rule for a well-defined sort order based on SORT_ORDER and KEY_VAR must decide whether this new record would appear (somewhere) before or (somewhere) after the above existing record after insertion.
@sas-inquirer wrote:
@FreelanceReinhard
Right, the rule then would be any new record should be inserted in numerical order so in your example, 3-5-11 would be inserted after 3-5-7.
Thanks. Unfortunately, the rules established so far violate the transitivity condition and hence do not define a linear order (see the Wikipedia article I linked earlier).
Consider these three (SORT_ORDER, KEY_VAR) pairs:
SORT_ORDER KEY_VAR 1 1-1-3 2 1-1-1 . 1-1-2
We have
Therefore it is impossible to sort these three records in accordance with the rules. Each of their six permutations violates at least one of the two rules.
I think another approach may be to ignore the whole sort issue and take a step back before I even join the tables together.
Table 1 is the original table:
data have_1;
input SORT_ORDER KEY_VAR $;
CARDS;
1 1-2-7
2 1-2-4
3 1-2-5
4 1-2-8
5 1-2-9
;
run;
Table 2 is a new KEY_VAR that needs to be added into the original table:
data have_2;
input KEY_VAR $;
CARDS;
1-2-3
;
run;
How can I insert the new KEY_VAR (1-2-3) from table have_2 before KEY_VAR 1-2-4 without moving any other observations from Table have_1.
data want_1;
input SORT_ORDER KEY_VAR $;
CARDS;
1 1-2-7
. 1-2-3
2 1-2-4
3 1-2-5
4 1-2-8
5 1-2-9
;
run;
So I insert the new var (1-2-3) based on the logic that 1-2-3 comes before 1-2-4. But I don't want to move the first obs (1-2-7) from its current position so I end up with:
data want_final;
input SORT_ORDER KEY_VAR $;
CARDS;
1 1-2-7
2 1-2-3
3 1-2-4
4 1-2-5
5 1-2-8
6 1-2-9
;
run;
I hope that makes more sense.
What in the data tells that 1-2-3 should come before 1-2-4 but not before 1-2-7???
And just how does this relate to your "report" at all? If there are other variables involved then perhaps examining how the REPORT is made is the important part.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.