Help using Base SAS procedures

What is the point of having 4th sorting variable here?

Super Contributor
Posts: 338

Hi SAS Community,

Could anyone help me understand this?

1). This is my data set

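data have;
   /* Read and display the date in DATE9. form, e.g. 31OCT2010 */
   informat current_date date9.;
   format current_date date9.;
   /* The & modifier lets Arrears_Band hold single embedded blanks, e.g. "60 - 90" */
   input Bank_number Account_number $ Current_date Arrears_Band & $;
   cards;
10 111 31OCT2010 Current
10 111 31MAR2010 NPNA
70 111 30NOV2010 Current
10 111 28FEB2010 90 +
70 111 31DEC2010 60 - 90
;
run;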

2). I sorted it using all four variables.

proc sort data=have out=want;
   by Bank_number Account_number Current_date Arrears_Band;
run;

proc print data=want; run;

3). This is the output

/*
current_date  Bank_number  Account_number  Arrears_Band
28FEB2010     10           111             90 +
31MAR2010     10           111             NPNA
31OCT2010     10           111             Current
30NOV2010     70           111             Current
31DEC2010     70           111             60 - 90
*/

4). Now I sorted the same data set using only the first 3 variables

proc sort data=have out=want;
   by Bank_number Account_number Current_date;
run;

proc print data=want; run;

5). This is the output

/*
current_date  Bank_number  Account_number  Arrears_Band
28FEB2010     10           111             90 +
31MAR2010     10           111             NPNA
31OCT2010     10           111             Current
30NOV2010     70           111             Current
31DEC2010     70           111             60 - 90
*/

6). Question.

Sorting by all 4 variables or by only the first 3 gives the same output.

The reason is that the first 3 variables already determine the row order, so the 4th variable has no ties left to break.

From this trial and error I have concluded that there is no point in including the 4th variable as a sort key.

But how could one decide this without trial and error?

Thanks

M


Accepted Solutions
Solution
‎10-19-2012 03:30 PM
Trusted Advisor
Posts: 2,116

Re: What is the point of having 4th sorting variable here?

With what you have told us, you can't.  You need some external information about the data.  You need to know, from knowledge of the business need, that the combination bank+account+date uniquely identifies the row.
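
As a rough sketch of how one might at least check this on the data at hand, the step below counts rows per bank+account+date combination in the data set have from the question. The table name dup_keys and the column n_rows are made up for illustration. An empty result only says the key is unique in this particular extract; whether it is unique in general is still a business question.

```sas
/* List every bank+account+date combination that occurs more than once. */
/* dup_keys and n_rows are illustrative names, not from the original post. */
proc sql;
   create table dup_keys as
   select Bank_number,
          Account_number,
          Current_date,
          count(*) as n_rows
   from have
   group by Bank_number, Account_number, Current_date
   having count(*) > 1;
quit;

/* If dup_keys has no rows, the three variables are unique in this data, */
/* so adding Arrears_Band to the BY statement cannot change the order.   */
proc print data=dup_keys; run;
```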


All Replies

Occasional Contributor
Posts: 17

Re: What is the point of having 4th sorting variable here?

Sorting depends entirely on your requirements and on how you want to present the data. There is no point in sorting a data set by every variable it contains.

Suppose a data set has n variables. Sorting by all n variables always gives the same result as sorting by the first n-1 of them, as long as the order of the variables in the BY statement is unchanged.

Super Contributor
Posts: 644

Re: What is the point of having 4th sorting variable here?

Have to disagree with AkilanR: that is only true if no combination of the n-1 key variables occurs in more than one row.

You might want to sort by all variables if you are subsequently going to process the data with a data step that uses First.key and Last.key.
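
A minimal sketch of that kind of processing, using the data set have from the original question (the output names sorted and latest are invented here). The data step BY statement needs the data to be in that order already, and if a bank+account+date combination could occur more than once, which row counts as the "last" one would depend on whether the remaining variable was part of the sort.

```sas
/* Sort so that the rows of each bank+account group are in date order. */
proc sort data=have out=sorted;
   by Bank_number Account_number Current_date;
run;

/* Keep the most recent row for each bank+account combination. */
data latest;
   set sorted;
   by Bank_number Account_number Current_date;
   /* last.Account_number is 1 on the final row of each bank+account group */
   if last.Account_number;
run;

proc print data=latest; run;
```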

Consider the table

id    account   customer
ABC   123       JIM
ABC   123       ALICE

Sorting by all three variables is going to reverse the order of these two rows (ALICE sorts before JIM), while sorting by only id and account may not.
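
The example can be reproduced with a sketch like the one below. The data set names accounts, by_three and by_two are invented for illustration, and the comment on by_two assumes PROC SORT is running with its default EQUALS option, which keeps rows with identical BY values in their original relative order.

```sas
/* Two rows that share the same id and account. */
data accounts;
   input id $ account customer $;
   datalines;
ABC 123 JIM
ABC 123 ALICE
;
run;

/* customer breaks the tie, so ALICE comes before JIM. */
proc sort data=accounts out=by_three;
   by id account customer;
run;

/* No tie-breaker: with the default EQUALS option the input order  */
/* (JIM, then ALICE) is kept; with NOEQUALS it is not guaranteed.  */
proc sort data=accounts out=by_two;
   by id account;
run;

proc print data=by_three; run;
proc print data=by_two; run;
```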
