Ronein
Onyx | Level 15

Hello

I want to learn the difference between EQUALS and NOEQUALS in PROC SORT with NODUPKEY.

I have some questions:

1- Is it only relevant when PROC SORT is used with NODUPKEY, or also when PROC SORT is used without NODUPKEY?

2- Why do the following 2 examples give the same results?

As I understand it, with EQUALS SAS keeps the observations in the same relative order as in the source data set (within each Sex) and then keeps the first observation of each group (because of the NODUPKEY option).

What about NOEQUALS? What is the criterion for ordering the data here?

 

proc sort data=sashelp.class out=emp1 nodupkey equals;
by sex;
run;

proc sort data=sashelp.class out=emp2 nodupkey noequals;
by sex;
run;

 

3- What is the difference between:

proc sort data=sashelp.class out=emp1 nodupkey equals;
by sex;
run;

and

proc sort data=sashelp.class out=ttt;
by sex;
run;

data wanted;
set ttt;
by sex;
if first.sex;
run;

4 REPLIES
andreas_lds
Jade | Level 19

Have you read the documentation of proc sort?

EQUALS | NOEQUALS

specifies the order of the observations in the output data set. For observations with identical BY-variable values, EQUALS maintains the relative order of the observations within the input data set in the output data set. NOEQUALS does not necessarily preserve this order in the output data set.

 

For more details see https://documentation.sas.com/doc/de/pgmsascdc/9.4_3.5/proc/p02bhn81rn4u64n1b6l00ftdnxge.htm#p017kkt...

Astounding
PROC Star
Create a SAS data set with 10 observations and 3 variables. Then test your questions.
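
A minimal sketch of such a test, assuming a made-up data set named test with variables grp, seq and val (all names here are hypothetical). The seq variable simply records each observation's original position, so you can see what happens to the ties:

```sas
/* Hypothetical test data: 10 observations, 3 variables.       */
/* seq records the original position so ties can be inspected. */
data test;
  input grp $ seq val;
  datalines;
B 1 10
A 2 20
B 3 30
A 4 40
C 5 50
B 6 60
A 7 70
C 8 80
A 9 90
B 10 100
;
run;

/* Same sort twice, differing only in EQUALS / NOEQUALS. */
proc sort data=test out=s_equals equals;
  by grp;
run;

proc sort data=test out=s_noequals noequals;
  by grp;
run;

/* With NODUPKEY, one observation per grp value is kept. */
proc sort data=test out=d_equals nodupkey equals;
  by grp;
run;

proc sort data=test out=d_noequals nodupkey noequals;
  by grp;
run;

proc print data=s_equals; run;
proc print data=s_noequals; run;
proc print data=d_equals; run;
proc print data=d_noequals; run;
```

With EQUALS, seq should be ascending within each grp value. With NOEQUALS that order is not guaranteed, although on a data set this small the output may well look identical.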
ballardw
Super User

One approach to comparing result data sets is to use PROC COMPARE with the two data sets.
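
For example, a sketch using the emp1 and emp2 data sets created in the question (it assumes both PROC SORT steps have already been run in the same session):

```sas
/* Compare the EQUALS result (emp1) with the NOEQUALS result (emp2). */
/* Observations are matched by position, since no ID statement is used. */
proc compare base=emp1 compare=emp2;
run;
```

The PROC COMPARE report shows whether any values differ between the two data sets.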

mkeintz
PROC Star

If you are sorting a large data set with many ties in the BY variables, using the EQUALS option to tell SAS to maintain the original relative order of records with tied BY values puts an extra burden on PROC SORT.  It can't just compare BY values; it also has to keep a (temporary) record number for each observation.  So PROC SORT with EQUALS can take a little extra time to run because of this extra requirement.

 

BTW, EQUALS is our default, and I believe the default for most SAS installations.
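
A quick way to check what your own installation uses, assuming the default is governed by the SORTEQUALS / NOSORTEQUALS system option (the setting is written to the log):

```sas
/* Writes the current SORTEQUALS / NOSORTEQUALS setting to the log. */
proc options option=sortequals;
run;

/* The same information via the GETOPTION function. */
%put Current setting: %sysfunc(getoption(sortequals));
```

An EQUALS or NOEQUALS option written on the PROC SORT statement itself applies to that step, so the check only matters for steps that specify neither.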

 

Also, often using the NOEQUALS option will produce exactly the same sequence of records as EQUALS, even when there are lots of ties.  In fact, I don't recall ever seeing a difference when I inadvertently used NOEQUALS, and subsequently checked results against EQUALS.

 

And I almost always want the EQUALS option.  In particular, we have tens of millions of trades and quotes from public stock exchanges per day.  They all have time stamps, but often a given stock will have multiple records with the same time stamp (i.e. tied BY values).  However, the physical order of the records provided by the data vendor represents the actual chronological order of the trades/quotes (time stamps have moved over the decades from whole seconds to nanoseconds).  To preserve that order (important to market microanalysis) we always want EQUALS.
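
If you would rather not depend on the EQUALS setting at all, one possible alternative (not what is described above, just a sketch) is to store the incoming row position in a variable and add it to the BY statement. The data set trades and the variables symbol, ts, price and seq are hypothetical names used only for illustration:

```sas
/* Hypothetical vendor file: rows arrive in true chronological order, */
/* but several rows share the same time stamp ts.                     */
data trades;
  input symbol $ ts price;
  datalines;
IBM 100 50.1
IBM 100 50.2
AAPL 100 10.5
IBM 101 50.3
AAPL 100 10.6
;
run;

/* Record each row's position in the incoming file. */
data trades_seq;
  set trades;
  seq = _n_;
run;

/* Ties on symbol and ts are broken by seq, so the result does not */
/* depend on whether EQUALS or NOEQUALS is in effect.              */
proc sort data=trades_seq out=trades_sorted;
  by symbol ts seq;
run;
```

The cost is an extra pass over the data and an extra variable, so for very large files relying on EQUALS may still be the more practical choice.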
