Solved: ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS...

bhca60 · Posted 07-10-2024 12:24 PM

I keep getting the following error:

ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS_COMM2.

I pull my data then do a sort by the key and a couple other variables then try to pick the first input_dttm. The key can have multiple input_dttm values so maybe this is why it's happening. How do I pick the latest input_dttm for each key? Any help is greatly appreciated. Here is my log:


 
83    data H_A_CASE (keep=key seq_id create_dttm admit_dt dschg_dt ntfy_dttm status_cd);
84    set rda.H_A_CASE_V;
85    where (datepart(create_dttm) between &start. and &end.) and status_cd not in ('5' '6') ;
SYMBOLGEN:  Macro variable START resolves to '01JUL2023'd
SYMBOLGEN:  Macro variable END resolves to '06JUL2024'd
86    run;
 

NOTE: There were 19264049 observations read from the data set rda.H_A_CASE_V.
      WHERE (DATEPART(create_dttm)>='01JUL2023'D and DATEPART(create_dttm)<='06JUL2024'D) and status_cd not in ('5', '6');
NOTE: The data set WORK.H_A_CASE has 19264049 observations and 7 variables.
NOTE: Compressing data set WORK.H_A_CASE increased size by 10.95 percent. 
      Compressed is 23540 pages; un-compressed would require 21216 pages.

      Block Output Operations           0
      
87    
88    
89    data H_A_TRANS_COMM (keep=key trans_id input_dttm);
90    set rda.H_A_TRANS_COMM_V ;
91    where (datepart(input_dttm) between &start. and &end.) ;
SYMBOLGEN:  Macro variable START resolves to '01JUL2023'd
SYMBOLGEN:  Macro variable END resolves to '06JUL2024'd
92    run;
 
 
NOTE: There were 152633956 observations read from the data set RDA.H_A_TRANS_COMM_V.
      WHERE (DATEPART(input_dttm)>='01JUL2023'D and DATEPART(input_dttm)<='06JUL2024'D);
NOTE: The data set WORK.H_A_TRANS_COMM has 152633956 observations and 3 variables.
NOTE: Compressing data set WORK.H_A_TRANS_COMM increased size by 31.91 percent. 
      Compressed is 123366 pages; un-compressed would require 93526 pages.

93    
94    
95    proc sort data=H_A_TRANS_COMM out=H_A_TRANS_COMM2;
96     by key trans_id descending input_dttm;
97    run;
NOTE: There were 152633956 observations read from the data set WORK.H_A_TRANS_COMM.
NOTE: SAS threaded sort was used.
NOTE: The data set WORK.H_A_TRANS_COMM2 has 152633956 observations and 3 variables.
NOTE: Compressing data set WORK.H_A_TRANS_COMM2 increased size by 31.91 percent. 
      Compressed is 123366 pages; un-compressed would require 93526 pages.
NOTE: PROCEDURE SORT used (Total process time):

98    
99    /*want max input_dttm*/
100   data out.trans_comm ;
101   set H_A_TRANS_COMM2 ;
102   by key trans_id input_dttm;
103   if first.input_dttm;
104  
105   run;
ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS_COMM2.
KEY=H-123456789 INPUT_DTTM=04OCT2023:23:42:53 TRANS_ID=. FIRST.KEY=0 LAST.KEY=1
FIRST.INPUT_DTTM=1 LAST.INPUT_DTTM=1 _ERROR_=1 _N_=4

PaigeMiller · Posted 07-10-2024 12:41 PM

These two BY statements are not the same

by key trans_id descending input_dttm; /* As seen in PROC SORT */
by key trans_id input_dttm; /* As seen in the DATA step */

--
Paige Miller

View solution in original post

PaigeMiller · Posted 07-10-2024 12:41 PM

These two BY statements are not the same

by key trans_id descending input_dttm; /* As seen in PROC SORT */
by key trans_id input_dttm; /* As seen in the DATA step */

--
Paige Miller

bhca60 · Posted 07-10-2024 12:54 PM

Thank you. How would I pick the first input_dttm for each key? Because I still get multiple input_dttm values listed for the same key values when I just need the latest for each key.

123456789 03JUL2023 12:00:33

123456789 06JUL2023 11:09:45

Thank you

PaigeMiller · Posted 07-10-2024 01:03 PM

@bhca60 wrote:

Thank you. How would I pick the first input_dttm for each key? Because I still get multiple input_dttm values listed for the same key values when I just need the latest for each key.

123456789 03JUL2023 12:00:33

123456789 06JUL2023 11:09:45

Thank you

What happened to TRANS_ID? Can you provide more details about what you are asking for?

--
Paige Miller

bhca60 · Posted 07-10-2024 03:21 PM

123456789 03JUL2023 12:00:33 9806666

123456789 06JUL2023 11:09:45 8909383

The trans ids are different

Tom · Posted 07-10-2024 03:43 PM

@bhca60 wrote:

123456789 03JUL2023 12:00:33 9806666

123456789 06JUL2023 11:09:45 8909383

The trans ids are different

And so WHAT does that mean in terms of your actual problem?

Do you want the last observation per KEY value? If so do you want the values sorted only be the datetime value? Or do you want them sorted by the TRANS_ID value? Or both? If both which takes precedence in the ordering?

Let's make a little dataset where it makes a difference which variables you sort by.

data have;
  input key $ input_dttm :datetime. trans_id $ ;
  format input_dttm datetime19.;
cards;
A 03JUL2023:12:00:33 ZZZZ
A 06JUL2023:11:09:45 AAAA
;

Now let's try it both ways and see how the results differ.

proc sort data=have;
  by key input_dttm trans_id;
run;

data test1;
  set have;
  by key input_dttm trans_id;
  if last.key;
run;

proc sort data=have;
  by key trans_id input_dttm;
run;

data test2;
  set have;
  by key trans_id input_dttm;
  if last.key;
run;

proc compare data=test1 compare=test2;
run;

So which observation do you want to select??

PaigeMiller · Posted 07-10-2024 04:46 PM

@bhca60 wrote:

123456789 03JUL2023 12:00:33 9806666

123456789 06JUL2023 11:09:45 8909383

The trans ids are different

@bhca60 I specifically asked in my previous post "Can you provide more details about what you are asking for?" but you didn't answer that.

So where does that leave us? You asked a question about sorting which was answered. Then you went ahead and gave us a different problem with the same (or similar) data, that has nothing to do with your sorting problem, and you gave us a minimal description of this different problem. It's essentially a new problem, please take some time to describe the problem fully and describe in detail what you want.

--
Paige Miller

Tom · Posted 07-10-2024 01:52 PM

Not clear what you are asking for but the code you posted was NOT picking the first datetime value. It was making sure the datetime values where unique.

If you want the maximum datetime value, within some other set of key variables, then do something like:

proc sort data=have;
  by a b datetime_var;
run;
data want;
  set have;
  by a b datetime_var;
  if last.b;
run;

So you get one observation per A*B combination and it has the maximum value of DATETIME_VAR.

bhca60 · Posted 07-10-2024 08:56 PM

Yes, I am trying to get the latest dttm for each key and remove the dttm values that are earlier.
HAVE:
key trans_id input_dttm
123456789 435986 03JUL2024: 12:13:30
123456789 567943 06JUL2024:11:14:58
WANT:
key trans_id input_dttm
123456789 567943 06JUL2024:11:14:58

ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS_COMM2.

Re: ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS_COMM2.

Re: ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS_COMM2.

Re: ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS_COMM2.

Re: ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS_COMM2.

Re: ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS_COMM2.

Re: ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS_COMM2.

Re: ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS_COMM2.

Re: ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS_COMM2.

Re: ERROR: BY variables are not properly sorted on data set WORK.H_A_TRANS_COMM2.

The 2025 SAS Hackathon has begun!

SAS Training: Just a Click Away