Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- Programming
- /
- Re: How to check if data set is sorted

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 09-04-2024 01:02 PM
(1264 views)

Let's say that I have a big data set and I want to know if this data set is sorted by variable CustID.

What is the way to check it?

I know that I can look at subset of few rows and see if it is sorted but it is still not perfect way because it might happen that in the next rows data set is not sorted

20 REPLIES 20

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You can run proc contents and check the SORTED output or use the SORT* variables in sashelp.vtable for a more dynamic approach if you want to check via code.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I run proc contents and it is written sorted No

```
proc contents data=csp.twoyrs_3;Run;
```

Howver I run code using by statemernt and didnt get any error so probably it is sorted!

How can I find that it is sorted???

Please note that I run this code and no have any error

```
data want;
pop7=0;
failure=0;
do until(last.LAKOACH);
set csp.twoyrs_3 end=eof;
by LAKOACH;
if (VALID_FROM_DTTM='01NOV2022'd and pop_lomeforat_wppr=7) then POP7=1;
if VALID_FROM_DTTM in ('01OCT2022'd,'01SEP2022'd ,'01AUG2022'd,'01JUL2022'd,'01JUN2022'd) and ind_shlili_6=1 then FAILURE=1;
end;
totalN+1; /***Total rows checked----2,493,500****/
totalpop7 + POP7; /****Base pop that is determined by belonging to pop 7 in month Nov 2022---108,552***/
totalpop7failure + (pop7 and failure);/***10,623***/
keep total: ;
if eof;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Do you mean whether the data set is **sorted**, meaning having been sorted previously by Proc Sort or Proc SQL?

Or do you mean whether the data set is in a particular **order**, regardless of having been sorted?

For example, this data set is ordered by ID. But is has not been sorted.

```
data have;
input ID x;
datalines;
1 10
2 20
3 30
;
```

As a consequence, running Proc Contents against the data like this

```
proc contents data = have;
run;
```

will tell you that the data is not sorted. Because it has not been sorted (nor validated) by SAS.

If what you are after is a *check *of whether the data is in some particular order, then the **Presorted Option** on the Proc Sort Statement is the way to go.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Good question.

When I say sorted I mean that values are sorted by values by ascending order. It doesn't mean that proc sort or sort via proc sql was done before. Maybe values are built in sorted without any further code thst done to sort it. My question was how to check if values in Data set are sorted ? Yes/no

When I say sorted I mean that values are sorted by values by ascending order. It doesn't mean that proc sort or sort via proc sql was done before. Maybe values are built in sorted without any further code thst done to sort it. My question was how to check if values in Data set are sorted ? Yes/no

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You kean that If the data set is already sorted then proc sort procedure will not be done again so it is efficient?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I know. If you need to sort after you check if the data is sorted and the answer is no, then you may as well sort in the first place.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@Ronein Ideally you know your data and don't need such a check.

As already mentioned by @PeterClemmensen early on Proc Sort has a presorted option that makes the Proc first check if data is already sorted.

```
data have;
set sashelp.class;
run;
proc sort data=have presorted;
by name;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You can let SAS generate an error for you.

Examples:

275 %let sorted=NO; 276 data _null_; 277 if eof then call symputx('sorted','YES'); 278 set sashelp.class end=eof; 279 by age; 280 run; ERROR: BY variables are not properly sorted on data set SASHELP.CLASS. eof=0 Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 FIRST.Age=1 LAST.Age=1 _ERROR_=1 _N_=1 NOTE: The SAS System stopped processing this step because of errors. NOTE: There were 2 observations read from the data set SASHELP.CLASS. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.00 seconds 281 %put &=sorted; SORTED=NO 282 %let sorted=NO; 283 data _null_; 284 if eof then call symputx('sorted','YES'); 285 set sashelp.class end=eof; 286 by name; 287 run; NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 288 %put &=sorted; SORTED=YES

Or if you want to test yourself then use LAG() function and avoid the ERROR in the LOG.

317 %let sorted=NO; 318 data _null_; 319 if eof then call symputx('sorted','YES'); 320 set sashelp.class end=eof; 321 if age < lag(age) then stop; 322 run; NOTE: There were 3 observations read from the data set SASHELP.CLASS. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 323 %put &=sorted; SORTED=NO 324 %let sorted=NO; 325 data _null_; 326 if eof then call symputx('sorted','YES'); 327 set sashelp.class end=eof; 328 if name < lag(name) then stop; 329 run; NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 330 %put &=sorted; SORTED=YES

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

From an efficiency standpoint, which is faster - sorting ahead of time or checking with the data step. My intuition says sort since the procedures are optimized while a data step has to run through all records (unless the out of order data is early in the dataset)

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Depends on the goal.

The best sort procedures take O(NlogN) time and need at a minimum space to write two copies of the data (and for larger datasets even more for spill files).

Reading the dataset takes just O(N). Plus since it does not need to write any data it should run even faster. And when the data is not sorted it should on average fail before reading half of the data. But if the goal is to actually sort the data then all of that is wasted time except when it is already sorted. So if you expect the data to be sorted a large percent of the time and just want detect then perhaps it is worth doing.

Note also that PROC SORT should automatically detect when the dataset's metadata indicates it is sorted and not actually do anything in that case. So PROC SORT should also benefit when the data is sorted (and SAS knows it is).

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Here is an example of LOG running Sort on a data set already sorted:

62 proc sort data=sashelp.class; 63 by name; 64 run; NOTE: Input data set is already sorted, no sorting done. NOTE: PROCEDURE SORT used (Total process time): real time 0.00 seconds cpu time 0.00 seconds

Note that if the set had been sorted by a different variable(s) then it would sort it.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks. You wrote -The best sort procedures take O(NlogN) time.

Can you show code ?

I know just the regular proc sort procedure but working on 100 million rows might be long time?

Can you show code ?

I know just the regular proc sort procedure but working on 100 million rows might be long time?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@Ronein wrote:

Thanks. You wrote -The best sort procedures take O(NlogN) time.

Can you show code ?

I know just the regular proc sort procedure but working on 100 million rows might be long time?

Huh? PROC SORT is the code to use to sort a dataset. It has already been optimized by SAS. There are option you can use too optimize it if you know something about the data. For example if you have lot of variables (or really long character variables) and just a few short BY variables then using TAGSORT option might improve performance by not having to copy all of that data as many times. But if you only have a few variables then TAGSORT will probably actually take longer since it will have to do more work.

Are you ready for the spotlight? We're accepting content ideas for **SAS Innovate 2025** to be held May 6-9 in Orlando, FL. The call is **open **until September 25. Read more here about **why** you should contribute and **what is in it** for you!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.