BookmarkSubscribeRSS Feed
xxformat_com
Barite | Level 11

Hi,

 

Before creating my own program, I was wondering if someone has already developed a macro or a fonction which would inform the user when a value available in the data is not available in the format? Or is there any options already available I would not be aware of?

 

Do you think that it would be a useful update in a future SAS version?

 

Here is a demo program to illustrate the issue:

data demo;
    set sashelp.class end=eof;
    output;
    if eof then
       do;
           sex='U';
           output;
       end;
run;

proc format;
   value $sex 'M'='Male'
              'F'='Female';
run;

data demo;
    set demo;
    sex_long=put(sex,$sex.);
run;

proc print data=demo;
run;

 

15 REPLIES 15
PaigeMiller
Diamond | Level 26

It isn't hard to program something so that you get a flag when a format doesn't cover a value. I don't know if that's the same as having a WARNING, but this would meet my needs.

 

data demo;
    set demo;
    format_flag = (vvalue(sex)=sex);
    format sex $sex.;
run;

 

--
Paige Miller
xxformat_com
Barite | Level 11

Hi Paige, Thanks. Just adding a note for people who will read this post later. The new variable with have a length of 200 unless a length statement is specified before the vvalue function.

 

If you have any suggestion for numeric format, I'm up for it.

xxformat_com
Barite | Level 11
Using a ifn fonction may be easier for people to interpret the code. For original numeric variables, the code would have to be updated to avoid messages in the log e.g. chk=ifn(vvalue(sex)=put(sex,best12.),1,0);
FreelanceReinh
Jade | Level 19

Hi @xxformat_com,

 

As you mention the INPUT function in the subject line: In an informat definition you could specify other=_error_ (see documentation) to get an "Invalid argument" note in the log (or "Invalid data" with the INPUT statement) if such an unexpected value is encountered:

proc format;
invalue $sex 'M'='Male'
             'F'='Female'
           other=_error_;
run;

data test;
set demo;
sex_long=input(sex,$sex.);
run;

Together with the system option

options errors=1;

this would even issue a real warning:

398   data test;
399   set demo;
400   sex_long=input(sex,$sex.);
401   run;

NOTE: Invalid argument to function INPUT at line 400 column 10.
WARNING: Limit set by ERRORS= option reached.  Further errors of this type will not be printed.
Name=William Sex=U Age=15 Height=66.5 Weight=112 sex_long=  _ERROR_=1 _N_=20

 

xxformat_com
Barite | Level 11
Thanks. Useful when going from character to character. Another interesting use of the character informat (beside removing leading and trailing blanks). In case the original value is numeric, it would means that it would first have to be converted to character before using input/inputc.
Quentin
Super User

Yes, I like the idea of a system option which would allow you to detect this as an error.  My other thought was to allow a format like:

proc format;
  value sex
  1='F'
  2='M'
  other=_ERROR_  /*pseudo code*/
;
run;

data have;
  input sex ;
  cards;
1
1
3
;
run;

Where you would get an error if you used the format and a value fell into the OTHER bin.  So below PROC step and DATA step would both throw errors.

 

proc freq data=have;
  tables sex;
  format sex sex.;
run;

data want;
  set have;
  sexC=put(sex,sex.);
run;

 

I had suggested adding this feature to formats in this ballot item:

https://communities.sas.com/t5/SASware-Ballot-Ideas/Add-other-error-for-PROC-FORMAT/idi-p/323639

(and mentioned the system option alternative in the comments)

 

I would encourage you to upvote and/or comment there, if you agree with the suggestion.  But as it's already 5 years old, I'm not holding my breath.  : )

BASUG is hosting free webinars Next up: Mike Raithel presenting on validating data files on Wednesday July 17. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
xxformat_com
Barite | Level 11

Note that other=_error_ would only work for user created formats. If for example I have a missing numeric value, I would still get a dot as character value using a best format (unless there is other format which could deal with missing numeric values or using workarounds like the global option missing=).

Reeza
Super User
Seems like a Data Quality problem, which SAS Data Quality Studio is designed for. Some users have written macros to implement these types of data quality checks, since your formats are essentially a list of acceptable values. Search LexJansen for examples of implementing data quality checks.

Quentin
Super User

Agree @Reeza, it's a data quality problem.

 

But one of the great things about SAS is you can easily turn on options to catch all kinds of data quality problems (missing values, incorrect variable types, variable collisions, duplicate records etc etc.)  I see SAS as a language that is built to detect/handle bad data (much more than SQL, which feels like it is built for clean data). So even in production SAS jobs, I try to have my SAS code detect as many data problems as it can while it is running.  Basically, no matter what cleaning is done in advance, I don't want my SAS code to trust that the data are clean.  Because I don't believe in clean data.

 

Since we already have other=_error_ for informats, seems like asking for other=_error_ for formats is not a stretch.  (Well, I would hope it's not a stretch, but of course who knows how this stuff works under the hood. : )

BASUG is hosting free webinars Next up: Mike Raithel presenting on validating data files on Wednesday July 17. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Tom
Super User Tom
Super User

@Quentin wrote:

...

 

Since we already have other=_error_ for informats, seems like asking for other=_error_ for formats is not a stretch.  (Well, I would hope it's not a stretch, but of course who knows how this stuff works under the hood. : )


It probably is more a stretch.  With an INFORMAT you are already in a situation where you are trying to assign the result to a variable.  But FORMATs can just be used in reporting step.   Where/who is it going to give that error message to for a value that is not defined for the format?

Quentin
Super User

@Tom wrote:

@Quentin wrote:

...

 

Since we already have other=_error_ for informats, seems like asking for other=_error_ for formats is not a stretch.  (Well, I would hope it's not a stretch, but of course who knows how this stuff works under the hood. : )


It probably is more a stretch.  With an INFORMAT you are already in a situation where you are trying to assign the result to a variable.  But FORMATs can just be used in reporting step.   Where/who is it going to give that error message to for a value that is not defined for the format?


To the log.  Similar to the note you get in the log if you use a format that is not wide enough to accommodate a value:

 

 

1    proc print data=sashelp.class ;
2     format height 1. ;
3    run ;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted
      by the "BEST" format.

 I see that note as SAS detecting while the code executes that a value is not appropriate for the format applied, and SAS adjusts the format accordingly. The log note tells you that SAS changed the format.  I treat that note as an error.

 

 

So if there were system option FORMATGAP=NONOTE|NOTE|WARN|ERROR and it was set to NOTE, when you run:

 

 

proc freq data=have;
  tables sex;
  format sex sex.;
run;

data want;
  set have;
  sexC=put(sex,sex.);
run;

The results would not change. But both steps would throw a NOTE to the log: "NOTE: At least one value not found in format lookup.  BEST format used instead." or some-such thing.

 

 

It's a similar case.  I have told SAS to use the SEX format for a value.  SAS determines that it cannot be used (because the value 3 is missing from the definition) and instead uses the BEST format.  That's fine, I just want SAS to tell me when it happens.  I don't want SAS to silently replace the format I asked for with the BEST format.

BASUG is hosting free webinars Next up: Mike Raithel presenting on validating data files on Wednesday July 17. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Tom
Super User Tom
Super User

Like this?  But only when the format is actually needed.  right now it stops when it loads the format and before it even tries to use it.

1894  proc format ;
1895  value $males 'M'='Male' other=[error1.] ;
NOTE: Format $MALES is already on the library WORK.FORMATS.
NOTE: Format $MALES has been output.
1896  run;

NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


1897
1898  proc freq data=sashelp.class;
1899    tables sex;
1900    format sex $males.;
ERROR: The format $MALES has a label that defines another format to be loaded (named $ERROR), but this format could not be
       successfully loaded (possibly for the same reason).
1901  run;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

 

Quentin
Super User

Yes, like that.  I was very excited for a second, before reading your text.

 

Yes, would want it as an execution-time error that only happened when an unformatted value was encountered.  Not as a compile-time error (or maybe it's "interpretation-time" for PROCs).

BASUG is hosting free webinars Next up: Mike Raithel presenting on validating data files on Wednesday July 17. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Tom
Super User Tom
Super User

Perhaps something with PROC FCMP?

proc fcmp outlib=work.functions.formats;
   function cfmterr(c $) $;
      length x 8;
      x=1/0;
      return('Invalid Value');
   endsub;

run;
options cmplib=(work.functions);

proc format ;
value $gender (default=6) 'M'='Male' 'F'='Female'  other=[cfmterr()] ;
run;

data test;
  do sex='M','F','x';
    output;
  end;
  format sex $gender.;
run;

data _null_;
 set test;
 put sex= ;
run;


proc print;
run;

proc print;
  format sex;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1891 views
  • 6 likes
  • 6 in conversation