09-05-2012 10:36 PM
I have a table of data cleaning rules in one Excel sheet, with the variables table_name and statement, like the following:
a if ..... then .....
b if birth_date=. then birth_date=input(substr(id,3,8),yymmdd10.)
The Excel file will be modified by our customer, who will add new data cleaning rules or change existing ones. During the ETL process, all rules in the xls file are loaded, syntax-checked, and then applied by a SAS macro.
The question is about an undeclared variable in a data set: the substr function does not give an error message saying the variable doesn't exist, but creates a new one instead. So the syntax-checking step above will miss some errors.
09-06-2012 07:50 AM
SAS will generate notes to log about uninitialized variables. You need to search for that when checking your SAS log for errors.
26 data _null_;
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
NOTE: Variable y is uninitialized.
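One way to automate that search is to read the saved log back in with a DATA step and keep only the "uninitialized" notes. This is a minimal sketch: the log path and the dataset name uninit_notes are hypothetical, and it assumes the log was written to a file (for example with PROC PRINTTO) and that the note text matches the line shown above.

```sas
/* Hypothetical path: point this at the log file written by the ETL run. */
filename etllog "/path/to/etl_rules.log";

data uninit_notes;
    length varname $ 32;
    infile etllog truncover lrecl=32767;
    input logline $char256.;
    /* Keep only the notes that flag a variable that never received a value. */
    if index(logline, 'is uninitialized') > 0;
    /* In "NOTE: Variable y is uninitialized." the name is the third word. */
    varname = scan(logline, 3, ' ');
run;
```

If uninit_notes ends up with any rows, the rule that produced the log can be rejected or flagged for review.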
09-06-2012 11:02 AM
You have to understand that no variables are created inside the DATA step at run time.
The DATA step is processed in two distinct phases: compile time and run time.
It's at compile time that the layout(s) of the destination dataset(s) are defined, not at runtime.
SAS scans the code and allocates every variable it references in a memory area known as the Program Data Vector (PDV), which is written entirely or partially to the destination dataset(s). In doing so it sometimes makes assumptions about the type and length of new variables that aren't what you expected, hence the use of the LENGTH statement to guide SAS to the right ones.
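A small illustration of those compile-time assumptions, with made-up dataset and variable names: without a LENGTH statement, the first assignment fixes a character variable's length.

```sas
data lengths;
    length code $ 10;   /* declared up front: character, length 10 */
    code  = 'ABCDEF';   /* fits, stored in full */
    other = 'AB';       /* no LENGTH: the first assignment sets length 2 */
    other = 'ABCDEF';   /* truncated to 'AB' when stored */
run;
```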
So when the code runs, everything is pretty well defined in terms of variables, lengths and types.
In your case, SAS allocated a new variable (defaulting to numeric, length 8) before the run-time phase because it saw the reference to it at compile time. Hope the explanation is clear.
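To see this for yourself, here is a minimal sketch with a deliberately misspelled variable. The dataset name demo, the sample id value, and the typo idd are all made up for illustration.

```sas
data demo;
    id = 'AB19800115';   /* hypothetical sample id */
    /* Typo: idd is never assigned. SAS still allocates it at compile time
       as a numeric variable of length 8, so the step runs without an error
       and only writes notes to the log. */
    birth_date = input(substr(idd, 3, 8), yymmdd10.);
run;

/* The variable listing should show idd as a numeric variable of length 8. */
proc contents data=demo;
run;
```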
Cheers from Portugal.
Daniel Santos @ www.cgd.pt