DATA Step, Macro, Functions and more

BY variables are not properly sorted error with underscore in the data

Reply
Contributor
Posts: 20

BY variables are not properly sorted error with underscore in the data

[ Edited ]

Hi all

 

I am getting "BY variables are not properly" error in data step because the data has a underscore. Can someone help me with a workaround for this?

 

data test;
input area_code $ flag $;
datalines;
test 1
transaction 1
txnfdh 1
txn_test 1
;
run;

 

data _null_;
set test;
by area_code;
run;

 

It is working if I have notsorted option in by statement, but I need this in a sorted form. I have a call execute followed by 'by' in the real code and I need the macros to run in a controlled order.

 

It makes no difference to have 'test' data set proc sorted or not. The same error comes even if the data is proc sorted. The error exactly shows up before the txn_test record.

 

ERROR: BY variables are not properly sorted on data set WORK.TEST.
area_code=txnfdh flag=1 FIRST.area_code=1 LAST.area_code=1 _ERROR_=1 _N_=3

 

Edit:

 

This error occurs in SAS Greenplum to be specific.

 

data gpdb1.tests;
input area_code $ flag $;
datalines;
test 1
transaction 1
txnfdh 1
txn_iadbo 1
;
run;

 

proc sort data=gpdb1.tests out=tests;
by area_code;
run;

 

data _null_;
set tests;
by area_code;
run;

 

Here gpdb1 is a greenplum database. I am accessing it using libname statement. 

 

Thanks

 

I have attached log.

 

Super User
Posts: 17,776

Re: BY variables are not properly sorted error with underscore in the data

Can you post the full log including your proc sort. That's the solution - sort prior to the data step. 

Super User
Posts: 10,476

Re: BY variables are not properly sorted error with underscore in the data

If you want the data to behave in the existing order then why are you using BY in data _null_? 

 

And the notsorted on the BY statement only says to use the data in the order it comes in, it will NOT change the order of anything in the data set.

Contributor
Posts: 20

Re: BY variables are not properly sorted error with underscore in the data

[ Edited ]

Hi

 

Thanks for your response.

 

I didn't have proc sort before and there were no issues having BY in data _null_ statement without a proc sort. It ran into issue when it had this particular type of data.

 

May be I need to add proc sort before data null and then use it with notsorted option?

 

Thanks

Super User
Posts: 5,079

Re: BY variables are not properly sorted error with underscore in the data

If this is your actual code, you should note that you are never getting "transaction" as the value of AREA_CODE.

 

Reading between the lines, it looks like the values of AREA_CODE need to be in that order.   If you sort them, it will change the order and the result will be wrong.

 

Since you don't show your later code, it might be as simple as removing the BY statement from the DATA step.  It doesn't seem likely that you actually need to utilize FIRST.AREA_CODE or LAST.AREA_CODE in  your programming logic.  Of course I could be wrong about that since we haven't seen the programming logic.

 

Finally, note that adding NOTSORTED has no impact on the order of the observations.  It just allows creation of FIRST.AREA_CODE and LAST.AREA_CODE even though the AREA_CODE values are not sorted.

Contributor
Posts: 20

Re: BY variables are not properly sorted error with underscore in the data

Datalines step is not the actual one.. it's just indicative. You can ignore the transaction truncation.

 

Test dataset is not sorted before that.. and needs to be used in a sorted form within data step. As I mentioned earlier, I have a call execute statement after by with area_code as a parameter. So the macros need to be executed in the alphabetical order for certain functionality. I guess removing by would execute it random order.

Super User
Posts: 17,776

Re: BY variables are not properly sorted error with underscore in the data

I would contact SAS tech support. Based on your code it should work. 

I have no errors when executing your sample. 

Contributor
Posts: 20

Re: BY variables are not properly sorted error with underscore in the data

Did the first code without PROC SORT work without any issues at your end?

Super User
Posts: 5,079

Re: BY variables are not properly sorted error with underscore in the data

Something doesn't add up then.  When you run PROC SORT, does the log indicate that PROC SORT actually ran (and it wasn't skipped because the data set was created with the SORTEDBY= option)? 

 

There are options that affect which character-sorting-sequence PROC SORT should use.  When PROC SORT finishes, are the observations in the proper order?  If the observations are in order, you can remove the BY statement from the DATA step.  The BY statement doesn't change the order of the observations in any way.  It merely creates the FIRST. and LAST. variables that you can utilize in your subsequent programming statements.

Super User
Posts: 10,476

Re: BY variables are not properly sorted error with underscore in the data


SASdn wrote:

Datalines step is not the actual one.. it's just indicative. You can ignore the transaction truncation.

 

Test dataset is not sorted before that.. and needs to be used in a sorted form within data step. As I mentioned earlier, I have a call execute statement after by with area_code as a parameter. So the macros need to be executed in the alphabetical order for certain functionality. I guess removing by would execute it random order.


Show the order the example data needs to be processed.

Likely the issue is an unneeded "by statement" in the data _null_.

 

Removing the BY statement will run them in the order of appeareance in the data set. If you need to to so something based on FIRST. or LAST. processing then NOTSORTED will not change the order or make it a random order it just makes the first and last operations available.

 

If you need to feed data into a data step in a specific order and your original data is not in that order then you will need to provide some rule for getting things into that order. But we can't help with that if we don't know what the order may be. Possibly you create a data set with the value of your "area_code" or whatever variable and an order value. Then combine that set with your raw data to get the order variable attached to each record. Then sort by the order variable. If you use Proc Sql you can do the combine and sort in the same step. I don't know if the substitute tests set I create below is valid for your project but demonstrates the reorder to group by the specified order

data work.tests;
input area_code $ flag $;
datalines;
test 1
transaction 1
txnfdh 1
txn_iadbo 1
transaction 3
txnfdh 2
;
run;

data work.order;
input area_code $ order;
datalines;
test 1
transaction 2
txnfdh 3
txn_iadbo 4
;
run;

proc sql;
   create table work.ordered as
   select b.order, a.*
   from work.tests as a left join work.order as b
        on a.area_code=b.area_code
   order by b.order;
quit;
Ask a Question
Discussion stats
  • 9 replies
  • 393 views
  • 0 likes
  • 4 in conversation