Hi all. I used PROC HPBIN on a training set to derive bins for a set of variables (n-grams):
proc hpbin data=TR_BIGGROUP
output=move.rank numbin=&bins PSEUDO_QUANTILE;
input var1 var2 /* etc. */;
id class class_nword iden;
code file='~/BinCode.sas';
run;
The generated code is then meant to be applied to another data set to derive the same bins there. However, one of the binned variables is named "not" (remember, they are n-grams), so when I run:
data Rank_score;
set big_group2;
%include '~/BinCode.sas';
run;
I get the following errors when it reaches the variable "not":
MPRINT(EXISTS000): ***************** BIN_not ********************;
MPRINT(EXISTS000): length BIN_not 8;
8406 +if missing(not) then do; BIN_not = 0; end;
_ ___
386 161
76
MPRINT(EXISTS000): if missing(not) then do;
MPRINT(EXISTS000): BIN_not = 0;
MPRINT(EXISTS000): end;
8407 +else if not < 1.0012 then do; BIN_not = 1; end;
_ __ ___
22 180 161
MPRINT(EXISTS000): else if not < 1.0012 then do;
MPRINT(EXISTS000): BIN_not = 1;
MPRINT(EXISTS000): end;
8408 +else if 1.0012 <= not < 2.0008 then do; BIN_not = 2; end;
____ _ __ ___
160 22 180 161
MPRINT(EXISTS000): else if 1.0012 <= not < 2.0008 then do;
MPRINT(EXISTS000): BIN_not = 2;
MPRINT(EXISTS000): end;
8409 +else if 2.0008 <= not < 3.0004 then do; BIN_not = 3; end;
____ _ __ ___
160 22 180 161
MPRINT(EXISTS000): else if 2.0008 <= not < 3.0004 then do;
MPRINT(EXISTS000): BIN_not = 3;
MPRINT(EXISTS000): end;
ERROR 386-185: Expecting an arithmetic expression.
ERROR 161-185: No matching DO/SELECT statement.
ERROR 76-322: Syntax error, statement will be ignored.
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, a numeric constant, a datetime constant,
a missing value, INPUT, PUT.
ERROR 180-322: Statement is not valid or it is used out of proper order.
ERROR 160-185: No matching IF-THEN clause.
8410 +else if 3.0004 <= not then do; BIN_not = 4; end;
____ __ ___
160 22 161
MPRINT(EXISTS000): else if 3.0004 <= not then do;
MPRINT(EXISTS000): BIN_not = 4;
MPRINT(EXISTS000): end;
Clearly, the generated scoring code does not quote the variable names (as name literals), so the variable "not" is confused with the NOT operator.
Is there any way to force PROC HPBIN to generate scoring code that references the binned variables as name literals (e.g. 'not'n)?
Bad juju arises when one allows variables to be named NOT, AND, OR and other SAS keywords. In some places you may get away with it.
If you really must have such problematic names, then perhaps set the system option VALIDVARNAME=ANY and use name literals like "not"n. That may require modifying your existing data, which is just more convoluted than changing the name in the first place.
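A minimal sketch of that approach, using a throwaway data set WORK.DEMO (hypothetical, not from the thread): with VALIDVARNAME=ANY in effect, the name literal 'not'n refers to the variable, while a bare NOT is still read as the logical operator. The sketch does not show whether PROC HPBIN's CODE statement would then write name literals into the score file.

```sas
/* Allow nonstandard variable names for this session (e.g. names with blanks). */
options validvarname=any;

/* WORK.DEMO and its variables are hypothetical, for illustration only. */
data work.demo;
  'not'n = 1;        /* a variable literally named NOT                */
  'my ngram'n = 2;   /* a name containing a blank, legal under ANY    */
  /* Bare NOT is the logical operator; 'not'n is the variable. */
  if not missing('not'n) then flag = 1;
run;
```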
PROC DATASETS can rename variables of an existing data set in place, using the RENAME statement within a MODIFY statement.
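For example, assuming both data sets live in the WORK library and that NOT_ (a hypothetical replacement name) is free, the rename could look like the sketch below. PROC HPBIN would then need to be rerun with NOT_ in its INPUT statement so that BinCode.sas is regenerated with the new name. The first DATA step only creates stand-in data so the sketch runs on its own.

```sas
/* Hypothetical stand-in data so the sketch is runnable; skip this step */
/* when working with the real TR_BIGGROUP and BIG_GROUP2 data sets.     */
data work.tr_biggroup work.big_group2;
  'not'n = 1;
  var1 = 2;
run;

/* Rename NOT to NOT_ in place: RENAME is used under MODIFY.      */
/* The name literal 'not'n avoids any ambiguity with the operator. */
proc datasets library=work nolist;
  modify tr_biggroup;
    rename 'not'n = not_;
  modify big_group2;
    rename 'not'n = not_;
quit;
```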
I must admit this is the first time I remember the data step compiler being confused by the choice of variable name.
You can fix it by referencing the variable using a name literal.
2866  data test;
2867    do not=1,.,0;
2868      if missing(not) then put not= 'MISSING';
                        -
                        386
                        200
                        76
ERROR 386-185: Expecting an arithmetic expression.
ERROR 200-322: The symbol is not recognized and will be ignored.
ERROR 76-322: Syntax error, statement will be ignored.
2869      else put not= 'PRESENT';
2870    end;
2871  run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST may be incomplete. When this step was stopped there were 0 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

2872  data test;
2873    do not=1,.,0;
2874      if missing('not'n) then put not= 'MISSING';
2875      else put not= 'PRESENT';
2876    end;
2877  run;

not=1 PRESENT
not=. MISSING
not=0 PRESENT
NOTE: The data set WORK.TEST has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
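Applied to the original problem, a hand-edited copy of the generated bins for NOT might look like the sketch below, with each bare not replaced by 'not'n. The cut points are copied from the log in the question; the data set WORK.SCORED and the DATALINES values are hypothetical test inputs, and the DO; ... END; blocks of the generated code are collapsed into single statements. This edits the score code after the fact rather than making PROC HPBIN emit name literals itself.

```sas
/* Hand-edited version of the BIN_not logic: every reference to the */
/* variable uses the name literal 'not'n. WORK.SCORED is hypothetical. */
data work.scored;
  input 'not'n;
  length BIN_not 8;
  if missing('not'n) then BIN_not = 0;
  else if 'not'n < 1.0012 then BIN_not = 1;
  else if 1.0012 <= 'not'n < 2.0008 then BIN_not = 2;
  else if 2.0008 <= 'not'n < 3.0004 then BIN_not = 3;
  else if 3.0004 <= 'not'n then BIN_not = 4;
  datalines;
.
0
1.5
2.5
4
;
run;
```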
Note: There is no need to change the VALIDVARNAME option to use a name literal in the code. But if VALIDVARNAME is not set to ANY, then the name the literal resolves to still needs to follow the normal naming rules.
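To illustrate the note: under the usual default VALIDVARNAME=V7, the literals 'not'n, 'and'n and 'or'n each resolve to a name that follows the V7 rules, so they can be used directly in expressions without switching to ANY. WORK.KEYWORDS is a hypothetical data set.

```sas
/* V7 is the usual default; it is set explicitly here only for clarity. */
options validvarname=v7;

/* WORK.KEYWORDS is hypothetical. Each literal resolves to a V7-valid */
/* name, so VALIDVARNAME=ANY is not required.                         */
data work.keywords;
  'not'n = 1;
  'and'n = 2;
  'or'n  = 3;
  /* Inside an expression the literals are read as variables, not operators. */
  total = sum('not'n, 'and'n, 'or'n);
run;
```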