Hi all. I used PROC HPBIN on a training set to derive bins for a set of variables (n-grams):
proc hpbin data=TR_BIGGROUP 
output=move.rank numbin=&bins PSEUDO_QUANTILE;  
input var1 var2 ec..;                
id class class_nword iden;
code file='~/BinCode.sas'; 
run;
The generated code is then meant to be applied to another data set to assign the same bins. However, one of the binned variables is named "not" (remember, they are n-grams), so when I run:
data Rank_score;
set big_group2;
%include '~/BinCode.sas'; 
run;
I get the following error when it reaches the variable "not":
MPRINT(EXISTS000):   ***************** BIN_not ********************;
MPRINT(EXISTS000):   length BIN_not 8;
8406      +if missing(not) then do; BIN_not = 0; end;
                         _                       ___
                         386                     161
                         76
MPRINT(EXISTS000):   if missing(not) then do;
MPRINT(EXISTS000):   BIN_not = 0;
MPRINT(EXISTS000):   end;
8407      +else if not < 1.0012 then do; BIN_not =     1; end;
                       _             __                   ___
                       22            180                  161
MPRINT(EXISTS000):   else if not < 1.0012 then do;
MPRINT(EXISTS000):   BIN_not = 1;
MPRINT(EXISTS000):   end;
8408      +else if 1.0012 <= not < 2.0008 then do; BIN_not =     2; end;
           ____                  _             __                   ___
           160                   22            180                  161
MPRINT(EXISTS000):   else if 1.0012 <= not < 2.0008 then do;
MPRINT(EXISTS000):   BIN_not = 2;
MPRINT(EXISTS000):   end;
8409      +else if 2.0008 <= not < 3.0004 then do; BIN_not =     3; end;
           ____                  _             __                   ___
           160                   22            180                  161
MPRINT(EXISTS000):   else if 2.0008 <= not < 3.0004 then do;
MPRINT(EXISTS000):   BIN_not = 3;
MPRINT(EXISTS000):   end;
ERROR 386-185: Expecting an arithmetic expression.
ERROR 161-185: No matching DO/SELECT statement.
ERROR 76-322: Syntax error, statement will be ignored.
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, a numeric constant, a datetime constant, 
              a missing value, INPUT, PUT.  
ERROR 180-322: Statement is not valid or it is used out of proper order.
ERROR 160-185: No matching IF-THEN clause.
8410      +else if 3.0004 <= not then do; BIN_not =     4; end;
           ____                       __                   ___
           160                        22                   161
MPRINT(EXISTS000):   else if 3.0004 <= not then do;
MPRINT(EXISTS000):   BIN_not = 4;
MPRINT(EXISTS000):   end;
Clearly, the score code does not reference the variable with a name literal ("not"n), so "not" is parsed as the NOT operator instead of a variable name.
Any idea how to force PROC HPBIN to generate score code that references the binned variables with name literals?
Bad juju arises when one allows variables to be named NOT, AND, OR and other SAS keywords. In some places you may get by with it.
If you really must keep such problematic names, you could set the system option VALIDVARNAME=ANY and use name literals like "not"n. That may require modifying your existing data, which is more convoluted than simply changing the name in the first place.
PROC DATASETS can rename existing variables in place, using the RENAME statement under a MODIFY statement.
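A minimal sketch of that rename, assuming both data sets from the question live in the WORK library and using NG_NOT as a purely illustrative new name. After renaming, list NG_NOT in the INPUT statement of PROC HPBIN and regenerate the score code, so the keyword never appears in it.

```sas
/* Hypothetical example: rename the n-gram variable NOT to NG_NOT      */
/* (illustrative name) in place, in both the training data and the     */
/* data to be scored. Assumes both data sets are in the WORK library.  */
proc datasets library=work nolist;
  modify tr_biggroup;       /* training set used by PROC HPBIN */
    rename not = ng_not;
  modify big_group2;        /* data set the score code runs on */
    rename not = ng_not;
quit;
```

Renaming both data sets matters: the generated score code uses whatever names PROC HPBIN saw, so the scoring data must carry the same names.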
I must admit this is the first time I remember the data step compiler being confused by the choice of variable name.
You can fix it by referencing the variable using a name literal.
2866  data test;
2867    do not=1,.,0;
2868      if missing(not) then put  not= 'MISSING';
                        -
                        386
                        200
                        76
ERROR 386-185: Expecting an arithmetic expression.
ERROR 200-322: The symbol is not recognized and will be ignored.
ERROR 76-322: Syntax error, statement will be ignored.
2869      else put not= 'PRESENT';
2870    end;
2871  run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST may be incomplete.  When this step was stopped there were 0 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
2872  data test;
2873    do not=1,.,0;
2874      if missing('not'n) then put  not= 'MISSING';
2875      else put not= 'PRESENT';
2876    end;
2877  run;
not=1 PRESENT
not=. MISSING
not=0 PRESENT
NOTE: The data set WORK.TEST has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
Note: There is no need to change the VALIDVARNAME option to use a name literal in the code. But if VALIDVARNAME is not set to ANY, then the name the literal resolves to must follow the standard SAS naming rules.
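If you want to keep the name and cannot get PROC HPBIN itself to emit name literals, one possible workaround is to post-process the generated file. The sketch below is not an HPBIN option: it is an assumed approach that copies ~/BinCode.sas to a new file (the name ~/BinCode_fixed.sas is hypothetical) while replacing every whole-word "not" with 'not'n. It assumes the score code never uses the NOT operator itself, so check the patched file before relying on it.

```sas
/* Hypothetical workaround: rewrite the HPBIN score code so that every */
/* standalone reference to the variable NOT becomes the name literal   */
/* 'not'n. Output file name is an assumption for illustration.         */
data _null_;
  infile '~/BinCode.sas' lrecl=32767 truncover;
  file '~/BinCode_fixed.sas' lrecl=32767;
  length line $32767;
  input line $char32767.;            /* $CHAR keeps leading blanks */
  /* \b word boundaries: BIN_not is left alone, because "_" is a     */
  /* word character, so there is no boundary before "not"            */
  line = prxchange("s/\bnot\b/'not'n/i", -1, line);
  len = length(line);                /* 1 for an all-blank line */
  put line $varying32767. len;       /* write without trailing blanks */
run;

/* Score with the patched code instead of the original file */
data Rank_score;
  set big_group2;
  %include '~/BinCode_fixed.sas';
run;
```

As the note above says, 'not'n works without changing VALIDVARNAME, because "not" is itself a valid V7 name.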
