BookmarkSubscribeRSS Feed
shanesheu
Calcite | Level 5

Hi SAS communities,

I am trying to find a way to access and check the sort order of datasets, which I believed is stored as "metadata" that contains information on the attributes of datasets (e.g. sort order, number of variables...etc). So far, I have not had much success. The following are two ways that I've come up with so far, but they don't seem to do exactly what I need:

1)   1) There’s an option for Proc Sort called “presorted”, which checks through the dataset to see if any one of the observations are out of the specified sort order in the "By" statement. If it is sorted properly, no sorting will be done. Otherwise sorting will be done. The downside about this is that I don’t think it produces an indicator of some sort, and therefore if I have 60 partitions of one big dataset, every single one has to go through the proc sort presorted process. If one of the partitions is already sorted correctly, and all partitions are generated in the same process, then there's no point putting every partition through the process. Not very efficient.

      2) I believe there are “dictionaries” that can only be accessed by proc sql, e.g. dictionary.tables or dictionary.columns, that tells you about the properties of the datasets, including the sort orders. From my research so far, it seems that the dictionary contains info for the active SAS session, and not for particular datasets. I might be wrong about this.

     

      It would be great if someone can point me in the right way. Thanks in advance!

      Shane

11 REPLIES 11
Haikuo
Onyx | Level 15

As far as I know, you can use those metadata tables to get what you want:

proc sql;

select sortedby from dictionary.columns

where libname='YOURLIBNAME' AND MEMNAME='YOURTABLE';

QUIT;

if sortedby is 0 then it is not presorted by proc sort or sql, otherwise, sortedby=1.

Regards,

Haikuo

shanesheu
Calcite | Level 5

I did a little investigation and found this (regarding Haikuo's suggestion):

Let's say you are trying to merge two datasets by id and study_entry_date.

So you proc sort those two datasets by those two variables, and check their dictionary.columns: sortedby=1 for id, and sortedby=2 for study_entry_date in each of the two datasets, so ok, great.

Now, if you are to merge those two dataset through a data step by those two variables, when you access the dictionary.columns for the output dataset, the sortedby values for all variables have become 0.

This suggests that, even if the resulting output is sorted correctly in the ordered specified, using dictionary.columns' sortedby to check whether the datasset is sorted properly still seems inadequate. It will wrongly label the correctly sorted output as unsorted.

This is why I wish proc sort can produce an indicator that tells you whether the dataset is sorted properly by the variables specified in the by statement, and then use that indicator as a switch for sorting the remaining partitions.

I'm currently looking into FriedEgg's suggestion, but it looks like sortedby is used as well. Anyways, I'll give it a try and see what happens.

Thanks

art297
Opal | Level 21

Are you trying to avoid sorting or just make sure that the sort is accomplished?  If it is the latter, and you will be using proc sort, just use the FORCE option.

There is NO guarantee that the metadata is correct!

Haikuo
Onyx | Level 15

Like I said, sortedby only indicates whether the CURRENT table has been sorted by Proc sort or SQL order by.

Even though both of your input tables are proc sorted, and your output table from Merge is supposed to be sorted as well, however, your output table has NOT been officially sorted by Proc Sort or SQL order by.

So basically you are looking for a indicator that can be flaged by proc sort, and somehow can be transplanted to the new table after the Merge. In your case, if you are positive that your table has been sorted, you can manually flag it by using data step option (sortedby=yourvariablename), while SAS has to sort it before it can know it is sorted.

FriedEgg
SAS Employee

data merge(sortedby=prim_id sec_id);

  merge tab1(in=b) tab2(in=a);

  by prim_id sec_id;

  if a and b;

run;

shanesheu
Calcite | Level 5

In response to art297, yes I am trying to avoid sorting if the data is already sorted correctly. This brings me to my response for Haikuo: sorry for being unclear, I forgot to state that I am trying to avoid proc sort or SQL order by if the data is already sorted correctly.

I did a quick investigation and indeed, the sortedby option for data step seems to flag the sort order without putting the data through proc sort or SQL order by. I am going to see how this fits into the macro I am writing. Thanks for the example as well, FriedEgg.

I'll return when I get this working or if I need further help. Will remember to award credits for answers once all are well.

Thanks a bunch guys, happy Friday.

Astounding
PROC Star

If you add the sortedby= option, you better be right!  SAS will not check the sorted order for you.  For example:

proc sort data=mydata;

     by name;

run;

data mydata2 (sortedby=amount);

     set mydata;

run;

proc sort data=mydata2;

     by amount;

run;

The DATA step runs without error, even though the data are not in order.  In addition, SAS skips the second PROC SORT, because it believes the data are in order.  In most cases where sortedby= appears, the data are sorted.  But there is no guarantee, so like I said, you better be right.  Even this MERGE is legal:

data combine;

     merge a b;

     by name;

     if amount > 5 then name='Fred';

run;

Even though the data set is created when you MERGE BY NAME, the final data set is not in order by NAME.  Of course this doesn't happen very often.  But are the chances greater than zero?  You decide.

Good luck.

shanesheu
Calcite | Level 5

You are definitely right (after I tested out what you described). I guess art297 was right to say that there's no guarantee that the metadata is correct. I'll continue to experiment and do some research on my own for now. Thanks a lot everyone.

Astounding
PROC Star

shanesheu,

Some of the metadata can help.  When you run a PROC CONTENTS, there are two related pieces of information:  SORTEDBY and VALIDATED.  I imagine the metadata contains the same information.

SORTEDBY is the sorted order.

VALIDATED is a flag that indicates whether the sorted order is guaranteed (meaning the sorting was performed by either PROC SORT or PROC SQL).

Just in case it helps ...

FriedEgg
SAS Employee

proc sql;

create table class as

select * from sashelp.class

order by sex,name;

quit;

%macro isSorted(libname=WORK ,memname= ,by=);

/* libname=libref where data to check resides, default=work

     memname=data set name to check

    by=space delimited list of by variables is expected order */

proc format;

value tf

  0 = 'not sorted'

  1 = 'sorted by:'

  2 = 'improperly sorted by:';

quit;

%global isSorted;

proc sql noprint;

select case

         when sum(sortedby)>0 then 1

         else 0

        end,

        name

   into :isSorted,:columnName1-:columnName100

   from sashelp.vcolumn where libname="%upcase(&libname)" and memname="%upcase(&memname)" and sortedby>0

  order by sortedby;

quit;

%let cols=;

%if &isSorted>0 %then %do;

%let cols=%upcase(&columnName1);

%do i=2 %to &sqlobs;

  %let cols=&cols, %upcase(&&columnName&i);

%end;

%if &by>. %then %do;

  %let by=%upcase(%sysfunc(tranwrd(&by,%str( ),%str(, ))));

  %if &by ne &cols %then %let isSorted=2;

%end;

%end;

%put Dataset &memname in library &libname is %trim(%sysfunc(putn(&isSorted,tf.))) &cols;

%mend;

%isSorted(libname=WORK ,memname=CLASS ,by=name sex)

Macro Output to Log:

Dataset CLASS in library WORK is improperly sorted by: SEX, NAME

Macro Global Variable Output:

isSorted=1 if sorted, =0 if not sorted ,=2 if sorted in order other than requested

Ksharp
Super User

Or if you have maken an index for this variable firstly, then you will not pre-sort it any more at next code.

data class(index=(name));
 set sashelp.class;run;
data x;
 merge class(keep=name sex) class(keep=name weight height);
 by name;
run;


Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 7777 views
  • 7 likes
  • 6 in conversation