BookmarkSubscribeRSS Feed
0 Likes

I recently discovered the presorted option in proc sort; it has saved me a lot of time in verifying whether very large datasets have been staged with the correct sort order as part of quality assurance requirements. However, getting the verification into a data set means parsing an external log file for the proc's NOTE. It would amazing if there was an automatic variable that captured whether a data set was presorted (1) or not (0) when the presorted option is used.

4 Comments
Patrick
Opal | Level 21

You can get the information you're after from dictionary.tables or via a Proc Contents. 

Run below sample code and inspect both the reports created and the SAS log.

data work.demo_1;
  set sashelp.class;
run;

data work.demo_2a (sortedby=name) work.demo_2b (sortedby=age);
  set sashelp.class;
run;

proc sort data=sashelp.class out=demo_3;
  by name;
run;

proc sort data=sashelp.class out=demo_4 presorted;
  by name;
run;

proc sql;
  select 
    libname,
    memname,
    sortname,
    sorttype,
    sortchar
  from dictionary.tables
  where
    libname='WORK'
    and memname like 'DEMO%'
  ;
quit;

proc contents data=demo_2b;
run;
proc contents data=demo_3;
run;


proc sort data=demo_1;
  by name;
run;

proc sort data=demo_2a;
  by name;
run;

proc sort data=demo_2b;
  by age;
run;

proc sort data=demo_3;
  by name;
run;

proc sort data=demo_4;
  by name;
run;

If you also want to pull out the variable sort order then you need another to join with dictionary.columns. Column SORTEDB gives you the sort order of the variables in a data set.

 

Patrick
Opal | Level 21

To add to above: It appears that for an unverified sort (=set via the sortedby option) Proc Sort just will not attempt to sort. Which of course for the demo_2b table is sub-optimal (as the data is actually not sorted by age).

I've added presorted to the proc sort and it appears that in such a case Proc Sort will verify the sort order and actually sort the data if there is a mismatch. And after this the sort the order will be verified.

72         proc sort data=demo_2b;
73           by age;
74         run;

NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
      

75         
76         proc sort data=demo_2b presorted;
77           by age;
78         run;

NOTE: Input data set is not in sorted order.
NOTE: There were 19 observations read from the data set WORK.DEMO_2B.
NOTE: The data set WORK.DEMO_2B has 19 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
      

79         
80         proc sort data=demo_2b presorted;
81           by age;
82         run;

NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
      

I'm using a very recent SAS 9.4 maintenance release. I can be wrong but I believe that proc sort in older SAS versions always sorted unverified sorts. I prefer how SAS behaves now as this gives us more options - and we can always use presorted. So thanks to remind me of presorted and its usefulness.

jvigeant
Fluorite | Level 6

Hi Patrick, 

 

Thank you so much for that information. I keep coming back to the sort indicators in the dictionary tables hoping that something will click for me, but I find myself hitting a brick wall. I work in a distributed database environment, so the quality assurance checks I'm programming will be run offsite on data sets I have almost no control in creating (each site writes/executes its own ETL code in SAS). So there is a strong possibility for the scenario I coded out below (a site could write to a staged area where data is sorted, then move the ds to a prod area for QA):

 

%macro checkSort(lib,mem);
  proc sql;
    select memname, sorttype
    from dictionary.tables
    where upcase(libname) = upcase("&lib.") AND
          upcase(memname) = upcase("&mem.")
    ;
    select name, sortedby
    from dictionary.columns
    where upcase(libname) = upcase("&lib.") AND
          upcase(memname) = upcase("&mem.")
    ;
  quit;
%mend; 

* run CheckSort macro on cars to get sort status of base data set;
%checksort(sashelp,cars);

*** NOTE: sortType = S, sort is verified in baseline;

data cars2;
set sashelp.cars;
run;

* Run CheckSort macro against work.cars2;
* Note not change in sort order from sashelp.cars;
%checksort(work,cars2);

*** NOTE: sortType = <missing>, no sort verification exists;

* run proc sort with presorted option to verify sort order;
proc sort data = cars2 presorted;
by make type; 
run;

*** NOTE: log message Sort order of input data set has been verified;

TLDR version: Why will proc sort with the presorted option positively verify the sort order when the usual channels (sort indicators) will not? 

Casey_SAS
SAS Employee
Status changed to: Under Consideration

Hey jvigeant - thanks for the idea - I'm working with Product Management and will update you on the viability of this suggestion.....