BookmarkSubscribeRSS Feed
jvigeant
Fluorite | Level 6

 

Hi folks - long time listener, first time caller. 

Quick background: I work in a distributed environment where a number of stakeholders take their proprietary data and translate it into a common data model. We have minimal input on how the stakeholders write their ETL code, we are concerned with evaluating compliance to the common data model. One requirement is to verify the datasets are sorted as specified. 

1. Since we don't create the datasets to be evaluated, we (so I have read) can't trust that the metadata (proc contents or dictionary tables) would accurately capture sort information. 

2. These are huge tables we are evaluating. If I tried a proc sort with the presorted option to verify the sort order, and the data is not already sorted, I run the risk of sorting the data: resource intensive and NOT what I want. I only want to verify whether it is sorted so I can notify the stakeholder that they need to sort the data. 

proc sort data = in.ds presorted out = out.ds;
by varA varB varC;
run;

I'm looking for other suggestions outside of these. Can the sortedby= data step option be used to verify sort order, or is it only for creating sort order? 

Thanks much!

 

9 REPLIES 9
ballardw
Super User

Are you talking about data in your SAS system or residing in someone else's? Or data extracted from your data sets?

 

Please provide the source of the information that Proc contents or dictionary tables can't report the sort status correctly. My understanding is that information is in the data set "header" information and should be there regardless of "who" created the data.

jvigeant
Fluorite | Level 6

I'll refer to the following thread as my very informal source: https://communities.sas.com/t5/SAS-Procedures/Accessing-checking-sort-order-in-a-dataset/m-p/29860#M...

 

The issue of who creates the data only factors in because I don't know how it is created; there is no standard for the various stakeholders other than the results have pass a compliance audit that I am writing. 

AhmedAl_Attar
Rhodochrosite | Level 12

If Proc contents not an option, how about a dummy data step to test the desired sort order?

data _null_;
	set sashelp.prdsale;
	by year product;
run;

ERROR: BY variables are not properly sorted on data set SASHELP.PRDSALE.
ACTUAL=$561.00 PREDICT=$979.00 COUNTRY=CANADA REGION=EAST DIVISION=EDUCATION PRODTYPE=FURNITURE
PRODUCT=SOFA QUARTER=4 YEAR=1994 MONTH=Dec FIRST.YEAR=0 LAST.YEAR=0 FIRST.PRODUCT=0 LAST.PRODUCT=0
_ERROR_=1 _N_=24
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 25 observations read from the data set SASHELP.PRDSALE.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds

The step would fail at the first time, and you would know for sure whether or not the data is sorted.

 

Just a thought,

Ahmed

jvigeant
Fluorite | Level 6

This is a really good approach... I need to flesh it out a little bit regarding capturing the error information as a variable and applying it to downstream logic, but I think the payoff is that it gives a very instantaneous answer without much overhead. 

Thanks!

Astounding
PROC Star

Agreed that this is a good approach ... here are a couple of tools you might want to add to the program.

 

After each DATA _NULL_ step, use macro language to capture the value of &SYSERR and write a message.

 

Once you have the process working for one data set, set the global option OPTIONS NOSYNTAXCHECK;

That way, your program can continue with checking another data set even if an error was encountered because of an unsorted data set.

Always check your logic thoroughly before using this option!.

AhmedAl_Attar
Rhodochrosite | Level 12

@jvigeant 

You will  need to check the value of these two system macro variables to check for Error Status & Error Text

%put &=SYSERR; 
%put &=SYSERRORTEXT;
PGStats
Opal | Level 21

The sortvalidate system option can be used for this purpose. With this option (which is not the default) the sort procedure effectively sorts the data only if a sequence check fails.

PG
Patrick
Opal | Level 21

You can trust SAS metadata (the descriptor portion of a SAS data set) IF the sort is also validated.

Patrick_0-1593137316724.png

It is possible though that a data set is sorted without metadata stored that it's sorted (table sort2 in below sample code).

 

I'd be using a check via metadata first as this will perform much better than anything else. 

Below some sample code for you to build upon.

/* create sample data 
  cheat:  Not sorted with Metadata for Sorted but not Validated
  sort:   Sorted with Metadata for Sorted and Validated
  sort2:  Sorted but no Metadata
*/
data cheat(sortedby=id name) sort;
  set sashelp.class;
  id=ceil(ranuni(1)*100);
run;
proc sort data=sort out=sort;
  by id name;
run;
data sort2;
  set sort;
run;


%macro CheckSort(ds,sortvars);
  proc contents 
    data=&ds 
    out =_check(keep=libname memname name sorted sortedby)
    noprint;
  run;quit;

  %local sortvars_meta;
  %let sortvars_meta=;
  proc sql ;
    select upcase(name) into :sortvars_meta separated by ' '
    from _check
    where 
      sorted=1 /* 1 if sort validated */
      and not missing(sortedby) /* sequence of sort vars */
    order by sortedby
    ;
  quit;

  %if &sortvars_meta = %upcase(&sortvars) %then %put DS &DS is sorted;
  %else 
    %do;
      %put Verify sort order for DS &DS;
      data _null_;
        set &ds(keep=&sortvars);
        by &sortvars;
      run;
    %end;

%mend;

%CheckSort(sort,id name);
%CheckSort(sort2,id name);
%CheckSort(cheat,id name);

If time then consider to implement the data _null_ check step in a way that allows you to capture/reset the error if the source table isn't properly sorted so that it doesn't impact on further tests.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 989 views
  • 5 likes
  • 7 in conversation