- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Before creating my own program, I was wondering if someone has already developed a macro or a fonction which would inform the user when a value available in the data is not available in the format? Or is there any options already available I would not be aware of?
Do you think that it would be a useful update in a future SAS version?
Here is a demo program to illustrate the issue:
data demo;
set sashelp.class end=eof;
output;
if eof then
do;
sex='U';
output;
end;
run;
proc format;
value $sex 'M'='Male'
'F'='Female';
run;
data demo;
set demo;
sex_long=put(sex,$sex.);
run;
proc print data=demo;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It isn't hard to program something so that you get a flag when a format doesn't cover a value. I don't know if that's the same as having a WARNING, but this would meet my needs.
data demo;
set demo;
format_flag = (vvalue(sex)=sex);
format sex $sex.;
run;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Paige, Thanks. Just adding a note for people who will read this post later. The new variable with have a length of 200 unless a length statement is specified before the vvalue function.
If you have any suggestion for numeric format, I'm up for it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @xxformat_com,
As you mention the INPUT function in the subject line: In an informat definition you could specify other=_error_ (see documentation) to get an "Invalid argument" note in the log (or "Invalid data" with the INPUT statement) if such an unexpected value is encountered:
proc format; invalue $sex 'M'='Male' 'F'='Female' other=_error_; run; data test; set demo; sex_long=input(sex,$sex.); run;
Together with the system option
options errors=1;
this would even issue a real warning:
398 data test; 399 set demo; 400 sex_long=input(sex,$sex.); 401 run; NOTE: Invalid argument to function INPUT at line 400 column 10. WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed. Name=William Sex=U Age=15 Height=66.5 Weight=112 sex_long= _ERROR_=1 _N_=20
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, I like the idea of a system option which would allow you to detect this as an error. My other thought was to allow a format like:
proc format; value sex 1='F' 2='M' other=_ERROR_ /*pseudo code*/ ; run; data have; input sex ; cards; 1 1 3 ; run;
Where you would get an error if you used the format and a value fell into the OTHER bin. So below PROC step and DATA step would both throw errors.
proc freq data=have;
tables sex;
format sex sex.;
run;
data want;
set have;
sexC=put(sex,sex.);
run;
I had suggested adding this feature to formats in this ballot item:
https://communities.sas.com/t5/SASware-Ballot-Ideas/Add-other-error-for-PROC-FORMAT/idi-p/323639
(and mentioned the system option alternative in the comments)
I would encourage you to upvote and/or comment there, if you agree with the suggestion. But as it's already 5 years old, I'm not holding my breath. : )
Next up: Troy Martin Hughes presents Calling Open-Source Python Functions within SAS PROC FCMP: A Google Maps API Geocoding Adventure on Wednesday April 23.
Register now at https://www.basug.org/events.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Note that other=_error_ would only work for user created formats. If for example I have a missing numeric value, I would still get a dot as character value using a best format (unless there is other format which could deal with missing numeric values or using workarounds like the global option missing=).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Agree @Reeza, it's a data quality problem.
But one of the great things about SAS is you can easily turn on options to catch all kinds of data quality problems (missing values, incorrect variable types, variable collisions, duplicate records etc etc.) I see SAS as a language that is built to detect/handle bad data (much more than SQL, which feels like it is built for clean data). So even in production SAS jobs, I try to have my SAS code detect as many data problems as it can while it is running. Basically, no matter what cleaning is done in advance, I don't want my SAS code to trust that the data are clean. Because I don't believe in clean data.
Since we already have other=_error_ for informats, seems like asking for other=_error_ for formats is not a stretch. (Well, I would hope it's not a stretch, but of course who knows how this stuff works under the hood. : )
Next up: Troy Martin Hughes presents Calling Open-Source Python Functions within SAS PROC FCMP: A Google Maps API Geocoding Adventure on Wednesday April 23.
Register now at https://www.basug.org/events.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Quentin wrote:
...
Since we already have other=_error_ for informats, seems like asking for other=_error_ for formats is not a stretch. (Well, I would hope it's not a stretch, but of course who knows how this stuff works under the hood. : )
It probably is more a stretch. With an INFORMAT you are already in a situation where you are trying to assign the result to a variable. But FORMATs can just be used in reporting step. Where/who is it going to give that error message to for a value that is not defined for the format?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Tom wrote:
@Quentin wrote:
...
Since we already have other=_error_ for informats, seems like asking for other=_error_ for formats is not a stretch. (Well, I would hope it's not a stretch, but of course who knows how this stuff works under the hood. : )
It probably is more a stretch. With an INFORMAT you are already in a situation where you are trying to assign the result to a variable. But FORMATs can just be used in reporting step. Where/who is it going to give that error message to for a value that is not defined for the format?
To the log. Similar to the note you get in the log if you use a format that is not wide enough to accommodate a value:
1 proc print data=sashelp.class ; 2 format height 1. ; 3 run ; NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format.
I see that note as SAS detecting while the code executes that a value is not appropriate for the format applied, and SAS adjusts the format accordingly. The log note tells you that SAS changed the format. I treat that note as an error.
So if there were system option FORMATGAP=NONOTE|NOTE|WARN|ERROR and it was set to NOTE, when you run:
proc freq data=have;
tables sex;
format sex sex.;
run;
data want;
set have;
sexC=put(sex,sex.);run;
The results would not change. But both steps would throw a NOTE to the log: "NOTE: At least one value not found in format lookup. BEST format used instead." or some-such thing.
It's a similar case. I have told SAS to use the SEX format for a value. SAS determines that it cannot be used (because the value 3 is missing from the definition) and instead uses the BEST format. That's fine, I just want SAS to tell me when it happens. I don't want SAS to silently replace the format I asked for with the BEST format.
Next up: Troy Martin Hughes presents Calling Open-Source Python Functions within SAS PROC FCMP: A Google Maps API Geocoding Adventure on Wednesday April 23.
Register now at https://www.basug.org/events.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this? But only when the format is actually needed. right now it stops when it loads the format and before it even tries to use it.
1894 proc format ;
1895 value $males 'M'='Male' other=[error1.] ;
NOTE: Format $MALES is already on the library WORK.FORMATS.
NOTE: Format $MALES has been output.
1896 run;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
1897
1898 proc freq data=sashelp.class;
1899 tables sex;
1900 format sex $males.;
ERROR: The format $MALES has a label that defines another format to be loaded (named $ERROR), but this format could not be
successfully loaded (possibly for the same reason).
1901 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, like that. I was very excited for a second, before reading your text.
Yes, would want it as an execution-time error that only happened when an unformatted value was encountered. Not as a compile-time error (or maybe it's "interpretation-time" for PROCs).
Next up: Troy Martin Hughes presents Calling Open-Source Python Functions within SAS PROC FCMP: A Google Maps API Geocoding Adventure on Wednesday April 23.
Register now at https://www.basug.org/events.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Perhaps something with PROC FCMP?
proc fcmp outlib=work.functions.formats;
function cfmterr(c $) $;
length x 8;
x=1/0;
return('Invalid Value');
endsub;
run;
options cmplib=(work.functions);
proc format ;
value $gender (default=6) 'M'='Male' 'F'='Female' other=[cfmterr()] ;
run;
data test;
do sex='M','F','x';
output;
end;
format sex $gender.;
run;
data _null_;
set test;
put sex= ;
run;
proc print;
run;
proc print;
format sex;
run;