Hi,
I'm using the following format:
proc format;
invalue month_pncf (default=2)
0 = 88
99 = .U
. = .U
other = _same_;
run;
expecting the raw data to be in numeric form. However, when the incoming data is a dash '-' in the .dat file, I get a value of dot '.' for the formatted data. Is there some way to make the proc format more robust so that dashes or other characters return .U? It seems the other=_same_ isn't handling non-numeric characters.
The log has an error of "Invalid data for pnc_mth in line xxx"
Ideally I'd like to do this solely with proc format - changing from an old way in which we scanned the data for 'illegal' vars prior to formatting.
Thank you,
Anjali
If your macro was generating one INPUT statement you can split it up by using the trailing @.
Instead of :
#1 @003 certnum 6.
#1 @009 mm_dob monthf.
#1 @011 dd_dob 2.
Generate a complete INPUT statement with a trailing @.
input #1 @003 certnum ?6. @;
if _error_ then put 'Invalid data for variable ' "certnum" ;
_error_=0;
input #1 @009 mm_dob ?monthf. @;
if _error_ then put 'Invalid data for variable ' "mm_dob" ;
_error_=0;
Just add it to your INFORMAT definition. You will need to add the quotes that you have omitted for the other values.
proc format;
invalue month_pncf (default=2)
'0' = 88
'99','.','-' = .U
other = _same_;
run;
proc format;
invalue numfmt (default=2)
0 = 88
99 = .U
. = .U
other = _same_;
run;
data input;
input num numfmt.;
if _ERROR_=1 then do; num=.U; _ERROR_=0; end;
datalines;
0
1
2
3
99
.
-
!
3
T
Y
;
run;
You will still get the note about invalid data, but not an error and the value will be equal to .U when an invalid value is encountered.
Seems like you have two choices. Define the range for _SAME_ to include values from 1-98 and let OTHER=.U or use the INVALIDDATA option.
options invaliddata=U;
I assume the OPTIONS way will affect the entire program. And the error handling way is particular to each variable. I have a lot of vars to read in for a particular infile - is there a middle way to have the error/invaliddata handled within 1 datastep for all incoming vars?
Thank you
You could fully specify the valid range.
'1','2','3','4','5','6','7','8','9'
,'10' - '19'
,'20' - '29'
,'30' - '39'
....
= _same_
I could do that for month - which is a small range. But I also have items like birthweight in grams and then the range we accept is infinite. The
options invaliddata=U; didn't accept .U which is the desired formatted value. Still trying.
In the INVALIDDATA option you specify the missing value without the dot. Do you have access to online help?
Syntax |
INVALIDDATA='character' |
Syntax Description
'character'
specifies the value to be assigned, which can be a letter (A through Z, a through z), a period (.), or an underscore (_). The default value is a period.
I'll try just 'U' - but I need it to indicate the missing value of .U (numeric) and not the character 'U'.
Thank you.
You could use the automatic variable _ERROR_ and the ?? operator to suppress the error messages.
But I think you would need to read each variable separately.
input num ??numfmt. @ ;
if _error_ then numfmt=.U ;
_error_=0;
...
Actuall better still just use ?? and leave the value missing?
proc format;
invalue myfmt (default=2)
0 = 88
99 = .U
other = _same_;
run;
data want;
input mynum :??myfmt. @@; /* nice way to suppress the unnecessary notes from my previous example, thanks Tom */
if mynum=. then mynum=.U;
putlog mynum=;
cards4;
0 1 2
Y T D
! . (
99 0 ;
;;;;
run;
Hi,
I'm trying to find some documentation on ? and ?? - new to me. Seems I can add ? or ?? to my invalue statements for all or some variables that might result in errors. FriedEgg uses a colon ":??" but Tom did not. Is one 'correct'?
If you have any links to help me understand this operator better - great. Still searching.
Thank you - found it between my post and yours.
Am I correct that if my current macro reads data as:
628 %create_input_string_D3(statevars, birth)
MPRINT(CREATE_INPUT_STRING_D3): input
MPRINT(CREATE_INPUT_STRING_D3): #1 @003 certnum 6.
MPRINT(CREATE_INPUT_STRING_D3): #1 @009 mm_dob monthf.
MPRINT(CREATE_INPUT_STRING_D3): #1 @011 dd_dob 2.
MPRINT(CREATE_INPUT_STRING_D3): #1 @015 sex u_nine.
MPRINT(CREATE_INPUT_STRING_D3): #1 @016 b_state 2.
MPRINT(CREATE_INPUT_STRING_D3): #1 @031 m_state 2.
MPRINT(CREATE_INPUT_STRING_D3): #1 @018 b_county $char3.
MPRINT(CREATE_INPUT_STRING_D3): #1 @021 pob u_nine.
MPRINT(CREATE_INPUT_STRING_D3): #1 @023 mm_mdob monthf.
MPRINT(CREATE_INPUT_STRING_D3): #1 @025 dd_mdob 2.
MPRINT(CREATE_INPUT_STRING_D3): #1 @033 mom_res $char3.
MPRINT(CREATE_INPUT_STRING_D3): #1 @036 mm_pdob monthf.
MPRINT(CREATE_INPUT_STRING_D3): #1 @038 dd_pdob 2.
MPRINT(CREATE_INPUT_STRING_D3): #1 @042 hisp_bc hisp_bc_xx.
etc ...
I would need a separate input statement for each variable in order to handle the errors? That's a change - but probably possible.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.