Mirisage
Obsidian | Level 7

Hi SAS Community,

Could anyone help me understand this?

1) This is my data set:

data have;
   informat current_date date9.;
   /* The & modifier lets Arrears_Band contain single embedded blanks,   */
   /* e.g. "90 +"; the value ends at two consecutive blanks or line end. */
   input Bank_number Account_number $ Current_date Arrears_Band & $;
   format current_date date9.;
   cards;
10 111 31OCT2010 Current
10 111 31MAR2010 NPNA
70 111 30NOV2010 Current
10 111 28FEB2010 90 +
70 111 31DEC2010 60 - 90
;
run;

2) I sorted it using all four variables:

proc sort data=have out=want;
   by Bank_number Account_number Current_date Arrears_Band;
run;

proc print; run;

3) This is the output:

/*
current_date  Bank_number  Account_number  Arrears_Band
28FEB2010     10           111             90 +
31MAR2010     10           111             NPNA
31OCT2010     10           111             Current
30NOV2010     70           111             Current
31DEC2010     70           111             60 - 90
*/

4) Now I sorted the same data set using only the first three variables:

proc sort data=have out=want;
   by Bank_number Account_number Current_date;
run;

proc print; run;

5) This is the output:

/*
current_date  Bank_number  Account_number  Arrears_Band
28FEB2010     10           111             90 +
31MAR2010     10           111             NPNA
31OCT2010     10           111             Current
30NOV2010     70           111             Current
31DEC2010     70           111             60 - 90
*/
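Rather than comparing the two printouts by eye, PROC COMPARE can check that the two sorted outputs are identical. This is a minimal sketch, assuming the HAVE data set from step 1 exists; the output names WANT4 and WANT3 and the macro variable COMPARE_RC are illustrative only. Note that it confirms the result only for the rows currently in HAVE.

```sas
/* Assumes the HAVE data set from step 1 already exists.                */
/* WANT4 and WANT3 are illustrative names so both results can be kept.  */
proc sort data=have out=want4;
   by Bank_number Account_number Current_date Arrears_Band;
run;

proc sort data=have out=want3;
   by Bank_number Account_number Current_date;
run;

/* Compare the two data sets observation by observation */
proc compare base=want4 compare=want3;
run;

/* SYSINFO holds PROC COMPARE's return code; 0 means no differences found */
%let compare_rc = &sysinfo;
%put NOTE: PROC COMPARE return code = &compare_rc;
```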

6) Question

Sorting by all four variables or by only the first three variables gives the same output.

The reason is that the order of the 4th variable in the sorted data set is dictated by the other variables being sorted.

Through this trial and error, I have concluded that there is no point in having the 4th variable as a sort variable.

But how could one decide this without the trial-and-error method?

Thanks

M

Accepted Solution
Doc_Duke
Rhodochrosite | Level 12

With what you have told us, you can't.  You need some external information about the data.  You need to know, from knowledge of the business need, that the bank+account+date uniquely identify the row.
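What the data itself can tell you is whether the three candidate keys are unique in the rows you have today. Below is a minimal sketch using PROC SQL, assuming the HAVE data set from the question; DUP_KEYS and N_DUP_KEYS are hypothetical names. A count of 0 means Bank_number + Account_number + Current_date identify every current row, so adding Arrears_Band to the BY statement cannot change the order. It does not guarantee that future data will stay unique; that still depends on the business knowledge described above.

```sas
proc sql noprint;
   /* Key combinations that occur on more than one row */
   create table dup_keys as
   select Bank_number, Account_number, Current_date, count(*) as n_rows
   from have
   group by Bank_number, Account_number, Current_date
   having count(*) > 1;

   /* Store how many such combinations exist in a macro variable */
   select count(*) into :n_dup_keys trimmed
   from dup_keys;
quit;

%put NOTE: Key combinations appearing more than once: &n_dup_keys;
```

If the count is not 0, the DUP_KEYS table lists the combinations where the 4th sort variable would decide the order.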



AkilanR
Fluorite | Level 6

Sorting depends entirely on your requirements and on the way you want to present the data. There is no point in sorting the data set by all of its variables.

Suppose a data set has n variables: sorting by all n of them always gives the same result as sorting by the first n-1, as long as you do not change the order of the variables in the BY statement.

RichardinOz
Quartz | Level 8

I have to disagree with AkilanR: that is only true if there are no instances of multiple rows per combination of the n-1 key variables.

You might want to sort by all variables if you are subsequently going to process the data in a DATA step using FIRST.key and LAST.key.
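To illustrate the point with the question's HAVE data set: the sketch below sorts by three variables but uses only two of them in the DATA step's BY statement, so the extra sort key (Current_date) decides which row LAST.Account_number lands on. HAVE_SORTED and LATEST_BAND are hypothetical names chosen for this example.

```sas
/* Sort by the BY-group keys plus Current_date so that, within each   */
/* Bank_number/Account_number group, rows run from oldest to newest.  */
proc sort data=have out=have_sorted;
   by Bank_number Account_number Current_date;
run;

/* LAST.Account_number is 1 on the final row of each group, which,    */
/* because of the sort above, is the row with the latest date.        */
data latest_band;
   set have_sorted;
   by Bank_number Account_number;
   if last.Account_number;
run;

proc print data=latest_band; run;
```

With the sample data this keeps two rows: 31OCT2010 (Current) for bank 10 and 31DEC2010 (60 - 90) for bank 70. Without Current_date in the sort, the row picked by LAST. would depend on the input order.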

Consider the table

id

account

customer

ABC

123

JIM

ABC

123

ALICE

Sorting by all three variables is going to reverse the order (ALICE sorts before JIM), while sorting by only id and account may not.
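The example table can be checked directly. This is a minimal sketch; the data set names ACCOUNTS, BY_ALL and BY_TWO are hypothetical. It relies on PROC SORT's default EQUALS option, under which rows that tie on every BY variable keep their input order.

```sas
/* The example table, entered in its original order */
data accounts;
   input id $ account $ customer $;
   datalines;
ABC 123 JIM
ABC 123 ALICE
;
run;

/* Sort by all three variables: ALICE now comes before JIM */
proc sort data=accounts out=by_all;
   by id account customer;
run;

/* Sort by id and account only: with the default EQUALS option, rows */
/* that tie on the BY variables keep their input order (JIM first).  */
proc sort data=accounts out=by_two;
   by id account;
run;

proc print data=by_all; run;
proc print data=by_two; run;
```

If NOEQUALS is specified instead, the relative order of tied rows is not guaranteed, which is why sorting by only id and account "may not" preserve it in general.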
