Help using Base SAS procedures

Handle unexpected characters with Proc Format

Accepted Solution Solved
Reply
Contributor
Posts: 47
Accepted Solution

Handle unexpected characters with Proc Format

Hi,

I'm using the following format:

proc format;

    invalue month_pncf (default=2)

     0 =        88

    99 =        .U

     . =        .U

    other =     _same_;

run;

expecting the raw data to be in numeric form.   However, when the incoming data is a dash '-' in the .dat file, I get a value of dot '.' for the formatted data.  Is there some way to make the proc format more robust so that dashes or other characters return .U?  It seems the other=_same_ isn't handling non-numeric characters.

The log has an error of "Invalid data for pnc_mth in line xxx"

Ideally I'd like to do this solely with proc format - changing from an old way in which we scanned the data for 'illegal' vars prior to formatting.

Thank you,

Anjali


Accepted Solutions
Solution
‎09-07-2011 02:25 PM
Super User
Super User
Posts: 7,039

Handle unexpected characters with Proc Format

If your macro was generating one INPUT statement you can split it up by using the trailing @.

Instead of :

   #1 @003 certnum 6.

   #1 @009 mm_dob monthf.

   #1 @011 dd_dob 2.

Generate a complete INPUT statement with a trailing @.

input #1 @003 certnum ?6. @;

if _error_ then put 'Invalid data for variable ' "certnum" ;

_error_=0;

input #1 @009 mm_dob ?monthf. @;

if _error_ then put 'Invalid data for variable ' "mm_dob" ;

_error_=0;

View solution in original post


All Replies
Super User
Super User
Posts: 7,039

Handle unexpected characters with Proc Format

Just add it to your INFORMAT definition. You will need to add the quotes that you have omitted for the other values.

proc format;

    invalue month_pncf (default=2)

    '0'          =  88

    '99','.','-'  = .U

    other =     _same_;

run;

Trusted Advisor
Posts: 1,301

Handle unexpected characters with Proc Format

proc format;

invalue numfmt (default=2)

  0  = 88

  99 = .U

  .  = .U

  other = _same_;

run;

data input;

input num numfmt.;

if _ERROR_=1 then do; num=.U; _ERROR_=0; end;

datalines;

0

1

2

3

99

.

-

!

3

T

Y

;

run;

You will still get the note about invalid data, but not an error and the value will be equal to .U when an invalid value is encountered.

Respected Advisor
Posts: 3,799

Handle unexpected characters with Proc Format

Seems like you have two choices.  Define the range for _SAME_ to include values from 1-98 and let OTHER=.U or use the INVALIDDATA option.

options invaliddata=U;

Contributor
Posts: 47

Handle unexpected characters with Proc Format

Posted in reply to data_null__

I assume the OPTIONS way will affect the entire program.    And the error handling way is particular to each variable.   I have a lot of vars to read in for a particular infile - is there a middle way to have the error/invaliddata handled within 1 datastep for all incoming vars?

Thank you

Super User
Super User
Posts: 7,039

Handle unexpected characters with Proc Format

You could fully specify the valid range.

  '1','2','3','4','5','6','7','8','9'

,'10' - '19'

,'20' - '29'

,'30' - '39'

....

= _same_

Contributor
Posts: 47

Handle unexpected characters with Proc Format

I could do that for month - which is a small range.  But I also have items like birthweight in grams and then the range we accept is infinite.  The

options invaliddata=U; didn't accept .U which is the desired formatted value.  Still trying.

Respected Advisor
Posts: 3,799

Handle unexpected characters with Proc Format

In the INVALIDDATA option you specify the missing value without the dot. Do you have access to online help?

Syntax

INVALIDDATA='character'

Syntax Description

'character'

specifies the value to be assigned, which can be a letter (A through Z, a through z), a period (.), or an underscore (_). The default value is a period.

Contributor
Posts: 47

Handle unexpected characters with Proc Format

Posted in reply to data_null__

I'll try just 'U' - but I need it to indicate the missing value of .U (numeric) and not the character 'U'.

Thank you.

Super User
Super User
Posts: 7,039

Handle unexpected characters with Proc Format

You could use the automatic variable _ERROR_  and the ?? operator to suppress the error messages.

But I think you would need to read each variable separately.

input num ??numfmt. @ ;

if _error_ then numfmt=.U ;

_error_=0;

...

Super User
Super User
Posts: 7,039

Handle unexpected characters with Proc Format

Actuall better still just use ?? and leave the value missing?

Trusted Advisor
Posts: 1,301

Handle unexpected characters with Proc Format

proc format;

invalue myfmt (default=2)

  0  = 88

  99 = .U

  other = _same_;

run;

data want;

input mynum :??myfmt. @@; /* nice way to suppress the unnecessary notes from my previous example, thanks Tom */

if mynum=. then mynum=.U;

putlog mynum=;

cards4;

0 1 2

Y T D

! . (

99 0 ;

;;;;

run;

Contributor
Posts: 47

Handle unexpected characters with Proc Format

Hi,

I'm trying to find some documentation on ? and ?? - new to me.  Seems I can add ? or ?? to my invalue statements for all or some variables that might result in errors.  FriedEgg uses a colon ":??"  but Tom did not. Is one 'correct'?

  If you have any links to help me understand this operator better - great.  Still searching.

PROC Star
Posts: 7,468

Handle unexpected characters with Proc Format

Contributor
Posts: 47

Handle unexpected characters with Proc Format

Thank you - found it between my post and yours.

Am I correct that if my current macro reads data as:

628        %create_input_string_D3(statevars, birth)

MPRINT(CREATE_INPUT_STRING_D3):   input

MPRINT(CREATE_INPUT_STRING_D3):   #1 @003 certnum 6.

MPRINT(CREATE_INPUT_STRING_D3):   #1 @009 mm_dob monthf.

MPRINT(CREATE_INPUT_STRING_D3):   #1 @011 dd_dob 2.

MPRINT(CREATE_INPUT_STRING_D3):   #1 @015 sex u_nine.

MPRINT(CREATE_INPUT_STRING_D3):   #1 @016 b_state 2.

MPRINT(CREATE_INPUT_STRING_D3):   #1 @031 m_state 2.

MPRINT(CREATE_INPUT_STRING_D3):   #1 @018 b_county $char3.

MPRINT(CREATE_INPUT_STRING_D3):   #1 @021 pob u_nine.

MPRINT(CREATE_INPUT_STRING_D3):   #1 @023 mm_mdob monthf.

MPRINT(CREATE_INPUT_STRING_D3):   #1 @025 dd_mdob 2.

MPRINT(CREATE_INPUT_STRING_D3):   #1 @033 mom_res $char3.

MPRINT(CREATE_INPUT_STRING_D3):   #1 @036 mm_pdob monthf.

MPRINT(CREATE_INPUT_STRING_D3):   #1 @038 dd_pdob 2.

MPRINT(CREATE_INPUT_STRING_D3):   #1 @042 hisp_bc hisp_bc_xx.

etc ...

I would need a separate input statement for each variable in order to handle the errors?  That's a change - but probably possible.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 23 replies
  • 246 views
  • 6 likes
  • 5 in conversation