07-25-2016 12:31 PM
Is there a good way (or would it be useful to have a way) to generate an ERROR messages in the log when SAS looks up a value in a format and it does not match any value or value range in the format definition?
I'm imagining something like:
proc format; value yn 0="No" 1="Yes" other=_ERROR_ /*pseudo-code*/ ; run;
With that feature, I would not expect a DATA step that simply assigns a format to a variable which happens to have an invalid value to throw an error. So below step would succeed because it does not actually use the format to look up values:
data have; x=1;output; x=3;output; format x yn.; run;
But if you PROC PRINT the dataset, I would want the second record to throw an error. Or if you used a PUT function with the format in a DATA step, it should also thrown an error:
data want; set have; x_label=put(x,yn.); *should error when x=3; run;
I realize I can come close to what I want with something like:
proc format; value yn 0="No" 1="Yes" other="INVALID" ; run; data want; set have; if put(x,yn.)="INVALID" then put "ERROR: found invalid value " x=; run;
But that's an extra statement to write, every time I expect the values defined in a format to be exhaustive (which is most of the formats I write).
When I write a series of IF THEN/ ELSE IF statements, I almost always make them exhaustive and end with an error handling satement: Else put "ERROR: fell through gaps...". And I like that the SELECT block will generate an error message if an unexpected value falls through the gaps and I don't have an otherwise statement.
Is there a good way to have an unexpected value generate an error message when that value is looked up in a format definition?
07-25-2016 02:07 PM
The closest to doing, at least "easily", to what I think you are attempting involves INVALUE not VALUE statements.
If you have a custom informat (INVALUE) and use a definition similar to the one you want you can generate error messages.
proc format library=work; value test 0="No" 1="Yes" other=_ERROR_ /*pseudo-code*/ ; invalue Intest 0,1=_same_ other=_error_; run; data have; informat x intest. ;
format x test. ; Input x; datalines; 0 1 2 ; run;
So the approach would generally be to clean the data before display.
07-25-2016 02:11 PM
_ERROR_ is a valid informatted-value for INFORMATS. Thus SAS generates an error when invalid values come from the outside world (upon INPUT) but let you handle what you transmit to the outside world (via PUT) without interfering.
07-25-2016 03:25 PM
I suppose somewhere in the back of my head I remembered seeing _ERROR_ on an INVALUE statement / informat. But for this use case, I'm wanting the same for a VALUE statument / format. Would you see this as useful?
There are many cases where the source data is not text (i.e. is already a SAS datasets, or coming from an RDBMS, or whatever); personally I don't end up doing much INPUTing of data. But I often use formats for reporting on data, or aggregating data, or whatever. Would be nice to have automated error detection if there is a problem with my format (or data), in the same way as there can now be automated error detection if there is a problem with an informat (or input data).
As I see it, when SAS applies a format to a value and the value does not fit into any of the defined ranges, effectively the formatted value is undefined. Currently, I assume SAS defaults to a best. format in this case.
But from a defensive programming perspective, it would be nice to able to say "if you look up a value in a format and the formatted value is undefined, throw an error." In my head, I would like that to be able to be set that as both a system option (similar to fmterr/nofmterr, something like FmtValueErr/NoFmtValueErr) and define it at the level of a specific format, using other=_error_.
Thinking more, it feels similar to the setting where SAS throws the note:
NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format.
In that case, a user has asked SAS to apply a specifc format to a value. SAS sees that the value won't fit if that format is used, so it chooses another format (best), and tells the user in the log. In this case, the user has asked SAS to apply a specific format to a value. SAS sees that the value is not defined in the format, so it chooses another format (best) and does NOT tell the user.
07-25-2016 03:36 PM
I personally see this as a data cleaning step unless you use multiple formats to change appearance for different reports.
I'm not exactly sure what you would do with that error message though. Do modify the data for the report or change the format or something else?
One work around that may work for some uses, such as filtering based on that values could be something like:
Proc whatever data=data;
where put(var,customfmt.) ne "_error_";
<other proc statements>
07-25-2016 03:59 PM
Yes, it could be thought of as a data cleaning issue. But really, what I'm looking for is run-time error detection. And if it threw an error (or even note), I would know I had a problem (either mistaken format or invalid data) which needs to be resolved. All I want is an indication of that problem at run time. Since this has been allowed for so many years without a note to the log, I don't suggest that it become an error by default, but I would like a way to make it an error.
SAS throws a lot of notes/errors that could be seen as data cleaning issues, but when SAS encounters the problematic condition it makes some decision, keeps on chugging, and often throws a NOTE, e.g.:
NOTE: Character values have been converted to numeric values at the places given by
NOTE: Mathematical operations could not be performed at the following places. The results of the
operations have been set to missing values....
So for this, could have:
NOTE: Formatted value lookup failed....
As is, I asked SAS to use a format to do a look-up, the look-up failed, but SAS happily chugs along without throwing even a NOTE.
Side note: I love that with hash table methods, e.g. FIND(), I can choose to make a failed FIND() throw an error, simply by omitting the return code variable. I see that as a handy defensive programming technique. It's probably what gave me the idea that it would be nice to have the same for FORMAT lookups.
07-25-2016 03:51 PM
I guess the question is where do you want the error to appear, in the log or in the output (listing or report).If you wanted to trigger SAS error reporting mechanism with say OTHER=_ERROR_ , you would still need something to appear in the report. How would you specify that?
07-25-2016 04:14 PM
I would want the error in the log. I'm fine wih SAS defaulting to use the best. format in output. I just want an error (or at least) NOTE in the log to say that SAS was unable to apply the requested format for some value.
We all know that SAS tends to keep on chugging, regardless of bad notes, warnings, and many errors. So I don't think we need to change that behavior. It's just a case where I think it would be nice to have a note in the log.
Similar to when SAS added VARLENCHK option. It didn't change the result of having a collision in variable lengths, it just added a warning to the log when that occurs. And I think in this case it was actually a new warning, turned on by default at some point. Pretty sure in v6 and maybe v7 and v8, those variable length collisions didn't throw any NOTE: at all. But someone at SAS decided it was risky enough behavior that it might be good to alert users by throwing a log message (and allowing them to choose between VARLENCHK=ERROR|WARN|NOWARN )
07-25-2016 10:19 PM
Maybe a workaround way is checking the value before applying a format on it ? proc format; value yn 0="No" 1="Yes" other='ERROR' ; run; data have; x=1;output; x=3;output; run; data _null_; set have; if put(x,yn.)='ERROR' then do; putlog 'ERROR: Format'; stop; end; run;
07-26-2016 09:41 AM