10-19-2012 02:26 PM

**Hi SAS Community,**

**Could anyone help me understand this?**

**1). This is my data set**

data have;

informat current_date date9.;

input Bank_number Account_number $ Current_date Arrears_Band & $;

format current_date date9.;

cards;

10 111 31OCT2010 Current

10 111 31MAR2010 NPNA

70 111 30NOV2010 Current

10 111 28FEB2010 90 +

70 111 31DEC2010 60 - 90

;

run;

**2). I sorted it using all four variables.**

proc sort data=have out=want;

by Bank_number Account_number Current_date Arrears_Band;

run;

proc print data=want; run;

3). This is the output

/*

current_date Bank_number Account_number Arrears_Band

28FEB2010 10 111 90 +

31MAR2010 10 111 NPNA

31OCT2010 10 111 Current

30NOV2010 70 111 Current

31DEC2010 70 111 60 - 90

*/

4). Now I sorted the same data set using only the first 3 variables

proc sort data=have out=want;

by Bank_number Account_number Current_date;

run;

proc print data=want; run;

5). This is the output

/*

current_date Bank_number Account_number Arrears_Band

28FEB2010 10 111 90 +

31MAR2010 10 111 NPNA

31OCT2010 10 111 Current

30NOV2010 70 111 Current

31DEC2010 70 111 60 - 90

*/

6). Question.

Sorting by all 4 variables or only by first 3 variables gives the same output.

The reason is that the order of the 4th variable in the sorted data set is dictated by the other variables being sorted.

From this trial and error I have concluded that there is no point in including the 4th variable as a sorting variable.

But how could one decide this without the trial-and-error method?

Thanks

M

Posted in reply to Mirisage

10-19-2012 03:30 PM

Posted in reply to Mirisage

10-19-2012 04:26 PM

How you sort depends entirely on your requirements and on how you want to present the data. There is no point in sorting a dataset by every variable it contains.

Suppose a dataset has n variables. Sorting by all n variables always gives the same result as sorting by the first n-1 of them, provided the order of the variables in the BY statement is unchanged.

Posted in reply to AkilanR

10-19-2012 08:04 PM

I have to disagree with AkilanR: that is only true if no combination of the n-1 key variables occurs in more than one row.
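One way to check that condition without trial and error is to count the rows for each combination of the leading key variables. This is only a minimal sketch, assuming the `have` data set from the original post; the column alias `n_rows` is just an illustrative name.

```sas
/* List every combination of the first three BY variables that   */
/* occurs in more than one row. Within such a group, a 4th sort  */
/* key could change the order of the rows.                       */
proc sql;
  select Bank_number, Account_number, Current_date,
         count(*) as n_rows
  from have
  group by Bank_number, Account_number, Current_date
  having count(*) > 1;
quit;
```

If the query lists no groups, each combination of the three keys is unique and adding a fourth BY variable cannot change the order. That is the case for the five rows posted above. If it does list groups, the extra variable decides the order within them.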

You might want to sort by all variables if you are subsequently going to process the data with a data step using First.*key* and Last.*key* methods.
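As a rough illustration of why the sort order matters for that kind of processing, here is a sketch that keeps the most recent row per account from the `have` data set in the original post. The output names `sorted` and `latest` are made up for this example.

```sas
/* Sort so that rows within each account are in date order. */
proc sort data=have out=sorted;
  by Bank_number Account_number Current_date;
run;

/* Keep only the last row of each Bank_number/Account_number group. */
/* Which row that is depends on the sort order inside the group.    */
data latest;
  set sorted;
  by Bank_number Account_number;
  if last.Account_number;
run;
```

Here Current_date is part of the sort, so last.Account_number picks the latest date. If the sort had stopped at Account_number, the row that ends up last in each group would depend on the input order instead.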

Consider the table

| id | account | customer |
|---|---|---|
| ABC | 123 | JIM |
| ABC | 123 | ALICE |

Sorting by all three variables reverses the order of these two rows, because ALICE sorts before JIM, while sorting by only id and account may not.
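The two-row table can be tried directly. This is a small sketch; the data set names `demo`, `by_three` and `by_two` are invented for the example.

```sas
/* Two rows that share the same id and account. */
data demo;
  input id $ account customer $;
  cards;
ABC 123 JIM
ABC 123 ALICE
;
run;

/* All three keys: customer breaks the tie, so ALICE comes first. */
proc sort data=demo out=by_three;
  by id account customer;
run;

/* Two keys only: the rows tie, so customer plays no part in the order. */
proc sort data=demo out=by_two;
  by id account;
run;

proc print data=by_three; run;
proc print data=by_two; run;
```

In `by_three` ALICE is listed before JIM. In `by_two` the rows would normally stay in their input order, since PROC SORT keeps tied rows in their original relative order unless NOEQUALS is in effect, but it is safer not to rely on the order of tied rows.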