05-10-2013 11:46 AM
I have been using PROC FORMAT to define both formats and informats for use with ICD-9-CM diagnosis codes. The diagnosis codes that I am using are left aligned in 6 character variables with implied decimal points. Since billable codes can range from 3 digits to 5 digits, there is always at least one trailing blank.
(1) How does PROC FORMAT handle trailing blanks? When I use a VALUE or INVALUE statement to define a format or informat, must I include the trailing blanks? The following code, for example, shows a format definition that uses 5 characters to map one diagnostic category:
"4280 "-"4289 " = "CHF" /* Congestive heart failure */
Can I assume that this SAS format will match my data where, say, dx1="39891 "? How about where dx1="4281 "?
Here is another VALUE statement using a range:
"2950 "-"29595 " = "Schizophrenia"
Can I assume that this SAS format will match my data where, say, dx1="29500 "?
(2) Is it more efficient to define a format (or informat) with a range or a series of values equaling an output value such as:
invalue ACCPD2D /* acute bronchitis */
'4660 ', /* ACUTE BRONCHITIS */
'490 ' /* BRONCHITIS NOS */ = 1
other = 0 ;
Or is this more efficient? (Yes, here it's a format, but in a different program I turned it into an informat because of the way I wanted to use it.)
'4660 ' = 1 /* ACUTE BRONCHITIS */
'490 ' = 1 /* BRONCHITIS NOS */
other = 0 ;
Please note that while my examples have only a few codes per category, my formats/informats can have many codes per category and there are many categories. I'm using the formats or informats as table lookups.
Thank you in advance for your ideas! I have spent much too much time hunting through documentation to no avail.
05-10-2013 04:29 PM
In my experience with custom formats, the values are compared so that, for format purposes, 'abcd ' is treated the same as 'abcd',
unless the blank is a more esoteric non-printing character than a simple space or two.
I suspect there is no difference in efficiency. If you use the CNTLOUT= option to see what is generated, there isn't any difference between the two forms, unlike ranges of numerics.
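A minimal sketch of that CNTLOUT= check, in case it helps. The informat names accpd_a and accpd_b and the data set work.fmtdump are made up for illustration; the codes are the ones from the question. It defines the same lookup both ways and dumps what PROC FORMAT stored, so the two definitions can be compared row by row.

```sas
/* Two hypothetical informats with the same mapping, written both ways */
proc format;
  invalue accpd_a                /* list of values sharing one result */
    '4660 ', '490 ' = 1
    other           = 0;
  invalue accpd_b                /* one value per assignment */
    '4660 ' = 1
    '490 '  = 1
    other   = 0;
run;

/* Write the stored definitions to a control data set                */
/* (the @ prefix in SELECT marks informat entries)                   */
proc format cntlout=work.fmtdump;
  select @accpd_a @accpd_b;
run;

/* Compare the rows generated for each informat */
proc print data=work.fmtdump;
  var fmtname type start end label hlo;
run;
```

If the START, END and LABEL rows are the same for both names, the choice between the two styles is about readability rather than how the lookup is stored.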
10-20-2014 04:30 PM
This paper is for the R community
Introducing icd9: working with ICD-9 codes and comorbidities in R.
Jack O. Wasey.
July 21, 2014
here is the list of codes:
I agree with ballardw:
trailing blanks are not going to change the way the format chooses.
Take a look at the list and search for a $char4. value like 4280.
Your question is whether '4280 ' - '4290 ', apparently a $char5. value,
is going to return '42810' - '42899'.
I do not think so,
but that is an opinion.
Testing ought to confirm that idea.
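One way to run that test, as a rough sketch only. The format name $dxgrp and the data set work.dxtest are invented for this example; the two ranges and most of the test values are taken from the original question. Character ranges are compared by collating sequence rather than numerically, so 5-digit codes near an end point such as '4289 ' are the ones worth checking.

```sas
/* Hypothetical test format built from the ranges in the question */
proc format;
  value $dxgrp
    '4280 ' - '4289 '  = 'CHF'
    '2950 ' - '29595 ' = 'Schizophrenia'
    other              = 'No match';
run;

/* Left-aligned codes in a 6-character variable, as in the question */
data work.dxtest;
  length dx1 $6 grp $13;
  input dx1 $;
  grp = put(dx1, $dxgrp.);   /* table lookup through the format */
  datalines;
4281
42810
42899
29500
39891
;
run;

/* Shows which label each code received */
proc print data=work.dxtest;
run;
```

Swapping in other end points, for example '4290 ' or '42899', and rerunning shows directly how each choice changes which codes fall in the range.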
Ron Fehd macro maven