BookmarkSubscribeRSS Feed
s_lassen
Meteorite | Level 14

You have already had quite a lot of advice. However, there is one other reason checking with a data step may be the faster way - say we have a lot of variables, but only want to check sorting on a few of them, then you can use KEEP= to minimize the data that is actually read.

 

May save quite a lot of memory, and possibly also some time, e.g.:

%let sorted=NO;
data _null_;
  do until(eof);
    set sashelp.class(keep=age) end=eof;
    by age;
    end;
  call symputx('SORTED','YES');
run;

%put &=sorted;

I put the DO UNTIL in because it saves the code to reset the variables to missing before reading each observation.

ballardw
Super User

A quick test with a data set of 100,000,000 observations and testing time between @s_lassen's sort check and Proc sort execution:

2194  data work.sorttest;
2195    do id= 1 to 1E7;
2196      do order= 1 to 10;
2197         value= rand('uniform');
2198         output;
2199      end;
2200    end;
2201  run;

NOTE: The data set WORK.SORTTEST has 100000000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           11.65 seconds
      user cpu time       4.42 seconds
      system cpu time     1.35 seconds
      memory              391.90k
      OS Memory           42016.00k
      Timestamp           09/11/2024 02:27:34 PM
      Step Count                        122  Switch Count  0


2202  options fullstimer;
2203  %let sorted=NO;
2204  data _null_;
2205    do until(eof);
2206      set work.sorttest (keep=id order) end=eof;
2207      by id order;
2208      end;
2209    call symputx('SORTED','YES');
2210  run;

NOTE: There were 100000000 observations read from the data set WORK.SORTTEST.
NOTE: DATA statement used (Total process time):
      real time           12.31 seconds
      user cpu time       10.96 seconds
      system cpu time     0.68 seconds
      memory              553.53k
      OS Memory           42016.00k
      Timestamp           09/11/2024 02:27:47 PM
      Step Count                        123  Switch Count  0


2211
2212  %put &=sorted;
SORTED=YES
2213
2214  proc sort data=work.sorttest;
2215     by id order;
2216  run;

NOTE: There were 100000000 observations read from the data set WORK.SORTTEST.
NOTE: The data set WORK.SORTTEST has 100000000 observations and 3 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           41.89 seconds
      user cpu time       29.75 seconds
      system cpu time     5.11 seconds
      memory              1064875.09k
      OS Memory           1104732.00k
      Timestamp           09/11/2024 02:28:29 PM
      Step Count                        124  Switch Count  0

Proc sort would likely run slower with lots of variables as more data swapping goes on, BUT in that case TAGSORT would likely reduce the time to something similar to shown.

YMMV if 1) the data is actually in an external database 2) sorting data over a network connection as multiple load and bandwidth factors come into play

 

The time difference I see here is likely less than getting a cup of coffee in a near-by break room. With data in this order of magnitude I might say just use proc sort and get a cup of coffee (or tea or ...).

 

So how "huge" is "huge"? As in number of observations and variables?

 

Ksharp
Super User
You could create an index on this variable CustID . Like:
proc sql;
create index CustID on have;
quit;

Once this index created , that means the table has been sorted by CustID, no need PROC SORT any more.
Sarath_A_SAS
Obsidian | Level 7

 

Using PROC DATASETS to check the SORTEDBY attribute:

SAS stores metadata on whether a dataset has been sorted using the SORTEDBY attribute. You can check this metadata with PROC DATASETS.

Capture.PNG

 

Copy code

In the output, if the sortedby variable has a value, it indicates that the dataset is sorted by the corresponding variables. If the dataset is not sorted, the sortedby variable will be blank.

 

Reeza
Super User
This is essentially the same answer as PROC SORT, which doesn't answer the problem (later defined in the thread), how to detect a data set that is in order but hasn't been explicitly sorted to have the sort flag in place. Otherwise, this is definitely a correct solution.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 1165 views
  • 15 likes
  • 9 in conversation