Hi SAS Community,
Could anyone help me understand this?
1). This is my data set
data have;
informat current_date date9.;
input Bank_number Account_number $ Current_date Arrears_Band & $;
format current_date date9.;
cards;
10 111 31OCT2010 Current
10 111 31MAR2010 NPNA
70 111 30NOV2010 Current
10 111 28FEB2010 90 +
70 111 31DEC2010 60 - 90
;
run;
2). I sorted it using all four variables.
proc sort data=have out=want;
by Bank_number Account_number Current_date Arrears_Band;
run;
proc print; run;
3). This is the output
/*
current_date Bank_number Account_number Arrears_Band
28FEB2010 10 111 90 +
31MAR2010 10 111 NPNA
31OCT2010 10 111 Current
30NOV2010 70 111 Current
31DEC2010 70 111 60 - 90
*/
4). Now I sorted the same data set using only the first 3 variables.
proc sort data=have out=want;
by Bank_number Account_number Current_date ;
run;
proc print; run;
5). This is the output
/*
current_date Bank_number Account_number Arrears_Band
28FEB2010 10 111 90 +
31MAR2010 10 111 NPNA
31OCT2010 10 111 Current
30NOV2010 70 111 Current
31DEC2010 70 111 60 - 90
*/
6). Question.
Sorting by all 4 variables or only by first 3 variables gives the same output.
The reason is that the order of the 4th variable in the sorted data set is dictated by the other variables being sorted.
From this trial and error I have concluded that there is no point in including the 4th variable as a sorting variable.
But how could one decide this without the trial-and-error method?
Thanks
M
With what you have told us, you can't. You need some external information about the data. You need to know, from knowledge of the business need, that the bank+account+date uniquely identify the row.
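If you do want to check the data you currently have, here is a minimal sketch. It assumes the data set is called have, as in the question; the output names have_unique and have_dups are just names I made up for illustration. Note that this only tells you whether the key is unique in this particular extract. It cannot tell you whether it will stay unique in future data, which is why the business knowledge still matters.

```sas
/* Sketch: does bank + account + date identify each row in HAVE?     */
/* NODUPKEY keeps the first row per key; DUPOUT= collects the rest.  */
/* have_unique and have_dups are hypothetical output names.          */
proc sort data=have out=have_unique nodupkey dupout=have_dups;
  by Bank_number Account_number Current_date;
run;

/* Alternative: list any key combination that occurs more than once. */
proc sql;
  select Bank_number, Account_number, Current_date, count(*) as n_rows
    from have
    group by Bank_number, Account_number, Current_date
    having count(*) > 1;
quit;
```

If have_dups has zero observations (or the query returns no rows), the three variables are a unique key for this data, and adding Arrears_Band to the BY statement cannot change the sort order.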
Sorting depends entirely on your requirements and on how you want to present the data. There is no point in sorting a dataset by every variable it contains.
Suppose a dataset has n variables. Sorting by all n variables always gives the same result as sorting by the first n-1 variables, as long as the order of the variables in the BY statement is unchanged.
I have to disagree with AkilanR: that is only true if no combination of the n-1 key variables occurs in more than one row.
You might want to sort by all variables if you are subsequently going to process the data in a data step using First.key and Last.key methods.
Consider the table
id | account | customer |
ABC | 123 | JIM |
ABC | 123 | ALICE |
Sorting by all three variables is going to reverse the order, while sorting by only id and account may not.
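A small sketch of that example, in case anyone wants to try it. The data set names demo, by_all and by_two are made up for illustration, and I have written the EQUALS option out explicitly on the second sort so that rows with tied keys keep their input order; without it the order of tied rows depends on your settings.

```sas
/* Build the two-row table from the example above. */
data demo;
  input id $ account customer $;
  datalines;
ABC 123 JIM
ABC 123 ALICE
;
run;

/* All three variables in the key: customer breaks the tie, */
/* so ALICE sorts ahead of JIM.                             */
proc sort data=demo out=by_all;
  by id account customer;
run;

/* Only id and account in the key: the two rows are tied.   */
/* EQUALS asks PROC SORT to keep tied rows in input order.  */
proc sort data=demo out=by_two equals;
  by id account;
run;

proc print data=by_all; run;
proc print data=by_two; run;
```

So the extra BY variable only matters when the shorter key leaves ties, which is exactly the case the original poster's data does not have.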