I encountered a rather odd issue when using PROC HPBIN that I’d like to share.
First, I used PROC HPBIN to perform binning on a dataset TEMP01 and saved the resulting binning information into a mapping table (TEMP_MAPPING):
DATA TEMP01;
    LENGTH ID 8;
    DO ID=1 TO 1000;
        X1 = RANUNI(101);
        X2 = 10*RANUNI(201);
        X3 = 100*RANUNI(301);
        OUTPUT;
    END;
RUN;

PROC HPBIN DATA=TEMP01 OUTPUT=BIN_TEMP01 NUMBIN=10 BUCKET;
    INPUT X1-X3;
    ODS OUTPUT MAPPING=TEMP_MAPPING;
RUN;
This part worked as expected.

However, when I tried to apply the same binning to a different dataset (TEMP02) using the saved mapping, the process failed, but only when the variable order in the new dataset was different from the original:
PROC SQL NOPRINT;
    CREATE TABLE TEMP02 AS SELECT X3, X2, X1 FROM TEMP01;
QUIT;

PROC HPBIN DATA=TEMP02 BINS_META=TEMP_MAPPING OUTPUT=BIN_TEMP02 NUMBIN=10 BUCKET;
    INPUT X1-X3;
RUN;
The result:

It also makes the issue very difficult to detect and debug. I believe this behavior should be addressed, or at the very least clearly documented, as it can lead to significant confusion and incorrect processing.
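
A possible workaround, assuming the failure really is tied only to the column order of the input dataset, is to rebuild TEMP02 with X1, X2, X3 in the same relative order as in TEMP01 before applying the mapping. The sketch below uses the usual trick of placing a RETAIN statement before SET to fix the variable order; the dataset name TEMP02_ORDERED is just a placeholder, and I have not confirmed that this resolves the problem in every case.

```sas
/* Hypothetical workaround: copy TEMP02 with X1, X2, X3 in the original order. */
/* RETAIN before SET only fixes the variable order in the output dataset;      */
/* the values are still read from TEMP02 on every iteration.                   */
DATA TEMP02_ORDERED;
    RETAIN X1 X2 X3;
    SET TEMP02;
RUN;

/* Same PROC HPBIN call as before, pointed at the reordered copy. */
PROC HPBIN DATA=TEMP02_ORDERED BINS_META=TEMP_MAPPING OUTPUT=BIN_TEMP02 NUMBIN=10 BUCKET;
    INPUT X1-X3;
RUN;
```

If the reordered copy bins correctly while the original TEMP02 does not, that would at least confirm that column order is the trigger.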