07-14-2012 07:55 PM
PROC DISTANCE allows an ID statement to be used together with a BY statement, as long as the ID variable is character and its values appear in the same order within each BY group.
The attached code "Distance with character year as ID" works fine (year is prefixed with Y).
The second attached code, "Distance with numeric year as ID" (no prefix, but year is still read in as a character variable), returns the error:
"ERROR: The ID variable must have the same values in the same order in each BY group."
Any ideas why?
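For reference, here is a minimal sketch of the two cases. The data set and variable names (`have`, `group`, `x1`, `x2`, `yearid`) and the Euclidean method are assumptions made for illustration; the attached programs may differ in the details.

```sas
/* Hypothetical data: two BY groups, with the same years */
/* in the same order in each group.                      */
data have;
   input group $ year $ x1 x2;
   length yearid $5;
   yearid = cats('Y', year);   /* prefixed ID, e.g. Y2004 */
   datalines;
A 2004 1.0 2.0
A 2005 1.5 2.5
A 2006 2.0 3.5
B 2004 0.5 1.0
B 2005 1.0 2.0
B 2006 3.0 1.5
;
run;

/* Case 1: character ID with a letter prefix (works for me) */
proc distance data=have out=dist_prefixed method=euclid;
   by group;
   var interval(x1 x2);
   id yearid;
run;

/* Case 2: character ID made of digits only (gives the error above) */
proc distance data=have out=dist_plain method=euclid;
   by group;
   var interval(x1 x2);
   id year;
run;
```

The only difference between the two calls is the ID variable, so the sort order within the BY groups is identical in both.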
07-14-2012 09:03 PM
I noticed the same thing. I guess this restriction simplifies the generation of column names in the distance matrix and makes the names more predictable. The error message could be a tad clearer, though.
07-15-2012 12:11 PM
Very interesting indeed. So what might have been construed as a feature is in fact an error, since it makes the procedure's behaviour inconsistent. Your finding might explain the error message as well. I think that PROC DISTANCE, when it gets to the second BY group, checks the ID values against the variable names established during the first BY group and finds that they don't match (i.e. '2004' against '_2004'), hence the message "The ID variable must have the same values in the same order in each BY group." It seems they simply neglected to translate the ID values into variable names before doing the comparison.