- What is the point of having 4th sorting variable h...

10-19-2012 02:26 PM

**Hi SAS Community,**

**Could anyone help me understand this?**

**1). This is my data set**

data have;

informat current_date date9.;

input Bank_number Account_number $ Current_date Arrears_Band & $;

format current_date date9.;

cards;

10 111 31OCT2010 Current

10 111 31MAR2010 NPNA

70 111 30NOV2010 Current

10 111 28FEB2010 90 +

70 111 31DEC2010 60 - 90

;

run;

**2). I sorted it using all four variables.**

proc sort data=have out=want;

by Bank_number Account_number Current_date Arrears_Band;

run;

proc print; run;

3). This is the output

/*

current_date Bank_number Account_number Arrears_Band

28FEB2010 10 111 90 +

31MAR2010 10 111 NPNA

31OCT2010 10 111 Current

30NOV2010 70 111 Current

31DEC2010 70 111 60 - 90

*/

4). Now I sorted the same data set using only the first 3 variables.

proc sort data=have out=want;

by Bank_number Account_number Current_date;

run;

proc print; run;

5). This is the output

/*

current_date Bank_number Account_number Arrears_Band

28FEB2010 10 111 90 +

31MAR2010 10 111 NPNA

31OCT2010 10 111 Current

30NOV2010 70 111 Current

31DEC2010 70 111 60 - 90

*/

6). Question.

Sorting by all 4 variables or by only the first 3 variables gives the same output.

The reason is that the order of the 4th variable in the sorted data set is dictated by the other variables being sorted.

From this trial and error, I have concluded that there is no point in including the 4th variable as a sorting variable.

But how could one decide this without trial and error?

Thanks

M

Accepted Solutions

Solution

10-19-2012 03:30 PM


Posted in reply to Mirisage


All Replies



Posted in reply to Mirisage

10-19-2012 04:26 PM

Sorting depends entirely on your requirements and on how you want to present the data. There is no point in sorting a data set by every variable it contains.

Suppose a data set has n variables. Sorting by all n variables always gives the same result as sorting by the first n-1 variables, provided the order of the variables in the BY statement is unchanged.


Posted in reply to AkilanR

10-19-2012 08:04 PM

I have to disagree with AkilanR: that is only true if there are no instances of multiple rows per combination of the n-1 key variables.
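One way to check this condition without trial and error is to count the rows per combination of the leading key variables. The following is only a sketch, assuming the `have` data set and the variable names from the original question; the column alias `n_rows` is made up for illustration.

```sas
/* List every combination of the first three BY variables that    */
/* occurs on more than one row. The alias n_rows is illustrative. */
proc sql;
    select Bank_number,
           Account_number,
           Current_date,
           count(*) as n_rows
    from have
    group by Bank_number, Account_number, Current_date
    having count(*) > 1;
quit;
```

If the query returns no rows, each combination of the three keys is unique, so adding a 4th BY variable cannot change the sorted order. If it does return rows, the 4th variable decides the order within those groups.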

You might want to sort by all variables if you are subsequently going to process the data in a data step using First.*key* and Last.*key* methods.

Consider the table

| id | account | customer |
|-----|---------|----------|
| ABC | 123 | JIM |
| ABC | 123 | ALICE |

Sorting by all three variables is going to reverse the order, while sorting by only id and account may not.
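The table above can be reproduced with a short sketch like the one below. The data set names `accounts`, `by_all`, `by_two`, and `flagged`, and the flag variables, are made up for illustration. It also assumes PROC SORT is running with its default EQUALS behavior, which keeps tied rows in their input order.

```sas
/* Two rows that share the same id and account. */
data accounts;
    input id $ account customer $;
    datalines;
ABC 123 JIM
ABC 123 ALICE
;
run;

/* Sorting by all three variables puts ALICE before JIM. */
proc sort data=accounts out=by_all;
    by id account customer;
run;

/* Sorting by id and account only leaves the rows tied;      */
/* under the EQUALS assumption they keep their input order.  */
proc sort data=accounts out=by_two;
    by id account;
run;

/* First./Last. processing needs the data sorted by the BY variables. */
data flagged;
    set by_all;
    by id account customer;
    first_in_account = first.account;  /* 1 on the first row of each id/account group */
    last_in_account  = last.account;   /* 1 on the last row of each id/account group  */
run;
```

Running PROC PRINT on `by_all` and `by_two` shows whether the extra BY variable changed the row order within the tied group.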