Solved: Re: BY variables are not properly sorted

Guerraje · Posted 12-06-2021 05:39 PM

Hello,

I am trying to merge 4 datasets on the variable county. When trying to do this I reviece the error:

WARNING: Multiple lengths were specified for the BY variable county by input data sets. This might cause unexpected results.

ERROR: BY variables are not properly sorted on data set WORK.NAMES.

I am not sure what this means, or how to fix the code from the dataset work.names so that I can complete my merge. The code for the work.names dataset is:

proc sql noprint;
select catx ('=',nliteral(_name_),nliteral(substr(_name_,7)))
into : rename separated by ' '
from names
where _name_ like 'county%'
;
quit;
%put &=sqlobs;
%put &=rename; 
run; 

Data fixedcounty;
SET coimpt.insurance;
rename =rename;
run; 

 


PROC TRANSPOSE DATA = fixedcounty OUT= names;
var county:;
run;

data names;
set names;
county=tranwrd(_name_,"county", "");
 my_new_county = tranwrd(county,"_", " ");
run;


proc print data= names;
run;
 proc sort data= names; BY  descending county;
run;

Kurt_Bremser · Posted 12-06-2021 11:39 PM

Make sure that all datasets have the same length for the BY variable(s). How you do that depends on how the variables are created (read from external file, existing dataset, transpose, ...). Set to the longest of all defined lengths to avoid truncation.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Reeza · Posted 12-06-2021 06:18 PM

You don't include the code that would generate that error. Your code above looks like it has some errors as well, so make sure that everything before your merge has no errors in the log.

Guerraje · Posted 12-06-2021 08:21 PM

There are no errors in the entire log but this one.

Tom · Posted 12-06-2021 08:28 PM

@Guerraje wrote:

There are no errors in the entire log but this one.

The code you posted only has a BY statement in the PROC SORT step. That step would never generate an error message like the one you mentioned.

Post the lines of text from the actual SAS log. Use the Insert Code button (looks like < / > ) to get a pop-up window where you can paste/edit the lines of text copied from the SAS log. This will preserve the formatting of the text. That is important because SAS clearly marks in the SAS log the line of code where it detected an error.

Guerraje · Posted 12-06-2021 08:33 PM

Tom, this is the log from where that error is created.

thank you,

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 ****************CoAnlCombined_Analysis **********;
70 Data CoAnlCombined_Analysis;
71 Merge HisLatpercent
72 names
73 languagepercent
74 work.obese
75 ;
76 By County ;
77 run;

WARNING: Multiple lengths were specified for the BY variable county by input data sets. This might cause unexpected results.
ERROR: BY variables are not properly sorted on data set WORK.NAMES.
county=YUMA languageHisLat=. denominator=. HisLatfinal=. _NAME_=countyYUMA _LABEL_=countyYUMA COL1=78.16 my_new_county=YUMA
language_avg=. languagefinal=. estimate= state= risk_pct_obese=. FIRST.county=1 LAST.county=1 _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.HISLATPERCENT.
NOTE: There were 2 observations read from the data set WORK.NAMES.
NOTE: There were 1 observations read from the data set WORK.LANGUAGEPERCENT.
NOTE: There were 1 observations read from the data set WORK.OBESE.
WARNING: The data set WORK.COANLCOMBINED_ANALYSIS may be incomplete. When this step was stopped there were 0 observations and 13
variables.
WARNING: Data set WORK.COANLCOMBINED_ANALYSIS was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 2354.59k
OS Memory 28596.00k
Timestamp 12/07/2021 01:31:05 AM
Step Count 206 Switch Count 0
Page Faults 0
Page Reclaims 357
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 1
Block Input Operations 0
Block Output Operations 8

78
79
80 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
90

Tom · Posted 12-06-2021 09:36 PM

What type of investigations did you do?

 WARNING: Multiple lengths were specified for the BY variable county by input data sets. This might cause unexpected results.

Did you check the datasets to see if COUNTY is defined with the same length in all four of those datasets? You can use PROC CONTENTS to get a report on a dataset. How did you create those four datasets? In each one how did you create the variable COUNTY?

 ERROR: BY variables are not properly sorted on data set WORK.NAMES.
 county=YUMA languageHisLat=. denominator=. HisLatfinal=. _NAME_=countyYUMA _LABEL_=countyYUMA COL1=78.16 my_new_county=YUMA
 language_avg=. languagefinal=. estimate=  state=  risk_pct_obese=. FIRST.county=1 LAST.county=1 _ERROR_=1 _N_=1

Did you try sorting datasets by COUNTY before running this data step? It does seem very strange that the first value of COUNTY is "YUMA".

How does this data step relate to the other code you posted?

What is the overall goal of your program?

What are your inputs?

What is the output you want?

Guerraje · Posted 12-07-2021 10:23 AM

Tom,

This is the last step of a project where I am trying to merge 4 data sets. The whole idea is to manipulate every data set so at the end I can merge all four on county. I am guessing they are not the same lengths. I went through and did a proc sort on each data set to see if that was the problem and it did not resolve the problem. I sorted by descending order.

you mentioned checking to see if the lengths are defined the same way, how would I check and adjust for this? That could certainly be the problem because that is not something I know how to check/adjust for.

Thank you

Reeza · Posted 12-07-2021 10:47 AM

Your BY statement in your SORT should be very the same as in your merge.

Different lengths will generate a warning but if it's just trailing spaces it'll be fine. I'm assuming you've viewed the files and verified no truncation is happening, ie that New York isn't truncated to New York C (8 chars) or something.

You have not posted your full code and log with the issues, showing both the sort, code and log with errors/notes.

Kurt_Bremser · Posted 12-06-2021 11:39 PM

Make sure that all datasets have the same length for the BY variable(s). How you do that depends on how the variables are created (read from external file, existing dataset, transpose, ...). Set to the longest of all defined lengths to avoid truncation.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Guerraje · Posted 12-08-2021 01:01 AM

The length was indeed what I needed to fix. Thank you so much!

ballardw · Posted 12-06-2021 06:21 PM

I suspect you did not actually mean this step:


Data fixedcounty;
SET coimpt.insurance;
rename =rename;
run;

Did you look at the Fixedcounty data set? I am going to guess that the only difference from coimpt.insurance is the addition of a variable with all missing values that is named Rename.

The rename statement does not start with Rename =. It is Rename followed by the rename pairs. Which you seem to have attempted to create in a macrovariable named Rename and then not used.

I think you intended:

Data fixedcounty;
   SET coimpt.insurance;
   rename &rename.;
run;

Which would likely mean that your data set names is completely incorrect prior to the transpose and would not expect the transpose to work.

@Guerraje wrote:

Hello,

I am trying to merge 4 datasets on the variable county. When trying to do this I reviece the error:

WARNING: Multiple lengths were specified for the BY variable county by input data sets. This might cause unexpected results.

ERROR: BY variables are not properly sorted on data set WORK.NAMES.
I am not sure what this means, or how to fix the code from the dataset work.names so that I can complete my merge. The code for the work.names dataset is:
proc sql noprint;
select catx ('=',nliteral(_name_),nliteral(substr(_name_,7)))
into : rename separated by ' '
from names
where _name_ like 'county%'
;
quit;
%put &=sqlobs;
%put &=rename; 
run; 

Data fixedcounty;
SET coimpt.insurance;
rename =rename;
run; 

 


PROC TRANSPOSE DATA = fixedcounty OUT= names;
var county:;
run;

data names;
set names;
county=tranwrd(_name_,"county", "");
 my_new_county = tranwrd(county,"_", " ");
run;


proc print data= names;
run;
 proc sort data= names; BY  descending county;
run; 

Tom · Posted 12-06-2021 06:32 PM

What are you trying to do? The code you posted does not appear to have anything to do with the question you asked.

I am trying to merge 4 datasets on the variable county.

Show example input dataset(s) and what you want as output.

SAS Innovate 2025: Call for Content