BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
bhca60
Quartz | Level 8

I keep getting the following error:

ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS_COMM2.
 
I pull my data then do a sort by the key and a couple other variables then try to pick the first input_dttm. The key can have multiple input_dttm values so maybe this is why it's happening.  How do I pick the latest input_dttm for each key? Any help is greatly appreciated.  Here is my log:
 


 
83    data H_A_CASE (keep=key seq_id create_dttm admit_dt dschg_dt ntfy_dttm status_cd);
84    set rda.H_A_CASE_V;
85    where (datepart(create_dttm) between &start. and &end.) and status_cd not in ('5' '6') ;
SYMBOLGEN:  Macro variable START resolves to '01JUL2023'd
SYMBOLGEN:  Macro variable END resolves to '06JUL2024'd
86    run;
 

NOTE: There were 19264049 observations read from the data set rda.H_A_CASE_V.
      WHERE (DATEPART(create_dttm)>='01JUL2023'D and DATEPART(create_dttm)<='06JUL2024'D) and status_cd not in ('5', '6');
NOTE: The data set WORK.H_A_CASE has 19264049 observations and 7 variables.
NOTE: Compressing data set WORK.H_A_CASE increased size by 10.95 percent. 
      Compressed is 23540 pages; un-compressed would require 21216 pages.

      Block Output Operations           0
      
87    
88    
89    data H_A_TRANS_COMM (keep=key trans_id input_dttm);
90    set rda.H_A_TRANS_COMM_V ;
91    where (datepart(input_dttm) between &start. and &end.) ;
SYMBOLGEN:  Macro variable START resolves to '01JUL2023'd
SYMBOLGEN:  Macro variable END resolves to '06JUL2024'd
92    run;
 
 
NOTE: There were 152633956 observations read from the data set RDA.H_A_TRANS_COMM_V.
      WHERE (DATEPART(input_dttm)>='01JUL2023'D and DATEPART(input_dttm)<='06JUL2024'D);
NOTE: The data set WORK.H_A_TRANS_COMM has 152633956 observations and 3 variables.
NOTE: Compressing data set WORK.H_A_TRANS_COMM increased size by 31.91 percent. 
      Compressed is 123366 pages; un-compressed would require 93526 pages.

93    
94    
95    proc sort data=H_A_TRANS_COMM out=H_A_TRANS_COMM2;
96     by key trans_id descending input_dttm;
97    run;
NOTE: There were 152633956 observations read from the data set WORK.H_A_TRANS_COMM.
NOTE: SAS threaded sort was used.
NOTE: The data set WORK.H_A_TRANS_COMM2 has 152633956 observations and 3 variables.
NOTE: Compressing data set WORK.H_A_TRANS_COMM2 increased size by 31.91 percent. 
      Compressed is 123366 pages; un-compressed would require 93526 pages.
NOTE: PROCEDURE SORT used (Total process time):

98    
99    /*want max input_dttm*/
100   data out.trans_comm ;
101   set H_A_TRANS_COMM2 ;
102   by key trans_id input_dttm;
103   if first.input_dttm;
104  
105   run;
ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS_COMM2.
KEY=H-123456789 INPUT_DTTM=04OCT2023:23:42:53 TRANS_ID=. FIRST.KEY=0 LAST.KEY=1
FIRST.INPUT_DTTM=1 LAST.INPUT_DTTM=1 _ERROR_=1 _N_=4

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

These two BY statements are not the same

 

by key trans_id descending input_dttm; /* As seen in PROC SORT */
by key trans_id input_dttm; /* As seen in the DATA step */

 

--
Paige Miller

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

These two BY statements are not the same

 

by key trans_id descending input_dttm; /* As seen in PROC SORT */
by key trans_id input_dttm; /* As seen in the DATA step */

 

--
Paige Miller
bhca60
Quartz | Level 8

Thank you. How would I pick the first input_dttm for each key?  Because I still get multiple input_dttm values listed for the same key values when I just need the latest for each key.

123456789    03JUL2023 12:00:33

123456789    06JUL2023 11:09:45

 

Thank you 

PaigeMiller
Diamond | Level 26

@bhca60 wrote:

Thank you. How would I pick the first input_dttm for each key?  Because I still get multiple input_dttm values listed for the same key values when I just need the latest for each key.

123456789    03JUL2023 12:00:33

123456789    06JUL2023 11:09:45

 

Thank you 


What happened to TRANS_ID? Can you provide more details about what you are asking for?

--
Paige Miller
bhca60
Quartz | Level 8

123456789    03JUL2023 12:00:33  9806666

123456789    06JUL2023 11:09:45   8909383

 

The trans ids are different

Tom
Super User Tom
Super User

@bhca60 wrote:

123456789    03JUL2023 12:00:33  9806666

123456789    06JUL2023 11:09:45   8909383

 

The trans ids are different


And so  WHAT does that mean in terms of your actual problem?

Do you want the last observation per KEY value?  If so do you want the values sorted only be the datetime value?  Or do you want them sorted by the TRANS_ID value?  Or both? If both which takes precedence in the ordering?

 

Let's make a little dataset where it makes a difference which variables you sort by. 

data have;
  input key $ input_dttm :datetime. trans_id $ ;
  format input_dttm datetime19.;
cards;
A 03JUL2023:12:00:33 ZZZZ
A 06JUL2023:11:09:45 AAAA
;

Now let's try it both ways and see how the results differ.

proc sort data=have;
  by key input_dttm trans_id;
run;

data test1;
  set have;
  by key input_dttm trans_id;
  if last.key;
run;

proc sort data=have;
  by key trans_id input_dttm;
run;

data test2;
  set have;
  by key trans_id input_dttm;
  if last.key;
run;

proc compare data=test1 compare=test2;
run;

Tom_0-1720640568987.png

 

So which observation do you want to select??

PaigeMiller
Diamond | Level 26

@bhca60 wrote:

123456789    03JUL2023 12:00:33  9806666

123456789    06JUL2023 11:09:45   8909383

 

The trans ids are different


@bhca60  I specifically asked in my previous post "Can you provide more details about what you are asking for?" but you didn't answer that.

 

So where does that leave us? You asked a question about sorting which was answered. Then you went ahead and gave us a different problem with the same (or similar) data, that has nothing to do with your sorting problem, and you gave us a minimal description of this different problem. It's essentially a new problem, please take some time to describe the problem fully and describe in detail what you want.

--
Paige Miller
Tom
Super User Tom
Super User

Not clear what you are asking for but the code you posted was NOT picking the first datetime value.  It was making sure the datetime values where unique.

 

If you want the maximum datetime value, within some other set of key variables, then do something like:

proc sort data=have;
  by a b datetime_var;
run;
data want;
  set have;
  by a b datetime_var;
  if last.b;
run;

So you get one observation per A*B combination and it has the maximum value of DATETIME_VAR.

bhca60
Quartz | Level 8
Yes, I am trying to get the latest dttm for each key and remove the dttm values that are earlier.
HAVE:
key trans_id input_dttm
123456789 435986 03JUL2024: 12:13:30
123456789 567943 06JUL2024:11:14:58
WANT:
key trans_id input_dttm
123456789 567943 06JUL2024:11:14:58

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2574 views
  • 2 likes
  • 3 in conversation