tish
Calcite | Level 5

I have been using PROC FORMAT to define both formats and informats for use with ICD-9-CM diagnosis codes. The diagnosis codes that I am using are left-aligned in 6-character variables with implied decimal points. Since billable codes range from 3 to 5 digits, there is always at least one trailing blank.

(1) How does PROC FORMAT handle trailing blanks? When I use a VALUE or INVALUE statement to define a format or informat, must I include the trailing blanks? The following code, for example, shows a format definition that uses 5 characters to map one diagnostic category:

   value $rcomfmt
      "39891",
      "4280 "-"4289 " = "CHF"       /* Congestive heart failure */
      ;

Can I assume that this SAS format will match my data where, say, dx1="39891 "? How about where dx1="4281  "?
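One way to answer this empirically is to push a few 6-character values through PUT() and check the label. The sketch below is a minimal, self-contained test under stated assumptions: it redefines $rcomfmt with only the CHF entries shown above, and the data set work.dxtest and the flag is_chf are made-up names for illustration.

```sas
/* Sketch: does $rcomfmt match 6-character codes that carry trailing blanks? */
proc format;
   value $rcomfmt
      "39891",
      "4280 "-"4289 " = "CHF"       /* Congestive heart failure */
      ;
run;

data work.dxtest;                    /* hypothetical test data set */
   length dx1 $6;
   input dx1 $char6.;
   is_chf = (put(dx1, $rcomfmt.) = "CHF");   /* 1 when the format assigns "CHF" */
   datalines;
39891
4281
4280
;
run;

proc print data=work.dxtest noobs;
run;
```

The $CHAR6. informat reads each code as-is into a $6 variable, so each test value carries the same trailing blanks as dx1 in the real data.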

Here is another VALUE statement using a range:

   value $psychfmt
      "2950  "-"29595 " = "Schizophrenia";

Can I assume that this SAS format will match my data where, say, dx1="29500 "?
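The same kind of check works for the range. As a hedged sketch, assuming $psychfmt contains only the single range shown, this step assigns "29500" to a $6 variable (so it is stored with one trailing blank) and writes the formatted result to the log; the variable name dx_label is illustrative.

```sas
/* Sketch: does the "2950  "-"29595 " range pick up a 5-digit code? */
proc format;
   value $psychfmt
      "2950  "-"29595 " = "Schizophrenia";
run;

data _null_;
   length dx1 $6 dx_label $13;
   dx1 = "29500";                      /* stored as "29500 " in a $6 variable */
   dx_label = put(dx1, $psychfmt.);    /* label if matched, else the raw code */
   putlog dx1= dx_label=;
run;
```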

(2) Is it more efficient to define a format (or informat) with a comma-separated list of values mapped to a single output value, such as:

   invalue ACCPD2D                                        /* acute bronchitis */
      '4660  ',   /* ACUTE BRONCHITIS           */
      '490   '    /* BRONCHITIS NOS             */ = 1
      other    = 0 ;

Or is this more efficient? (Yes, here it's a format, but in a different program I turned it into an informat because of the way I wanted to use it.)

   value $ACCPD2D
      '4660  ' = 1  /* ACUTE BRONCHITIS            */
      '490   ' = 1  /* BRONCHITIS NOS              */
      other    = 0 ;

Please note that while my examples have only a few codes per category, my formats/informats can have many codes per category and there are many categories. I'm using the formats or informats as table lookups.
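For the table-lookup use, the informat and the format return different types: INPUT() with the ACCPD2D informat yields a numeric 1/0, while PUT() with $ACCPD2D yields the character label "1"/"0". This is a minimal sketch, assuming both definitions above live in the same PROC FORMAT step; work.claims, flag_num and flag_chr are hypothetical names, and the explicit informat width of 6 is an assumption chosen to match the 6-character dx1 variable.

```sas
/* Sketch: ACCPD2D as a lookup table, once as a numeric informat, once as a character format */
proc format;
   invalue accpd2d                 /* acute bronchitis */
      '4660  ',   /* ACUTE BRONCHITIS */
      '490   '    /* BRONCHITIS NOS   */ = 1
      other    = 0;
   value $accpd2d
      '4660  ' = 1  /* ACUTE BRONCHITIS */
      '490   ' = 1  /* BRONCHITIS NOS   */
      other    = 0;
run;

data work.claims;                          /* hypothetical data set */
   length dx1 $6;
   input dx1 $char6.;
   flag_num = input(dx1, accpd2d6.);       /* numeric 1 or 0 from the informat */
   flag_chr = put(dx1, $accpd2d.);         /* character "1" or "0" from the format */
   datalines;
4660
490
4011
;
run;
```

The numeric result from the informat can be summed or averaged directly, which is one reason to prefer INVALUE when the lookup feeds counts.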

Thank you in advance for your ideas! I have spent much too much time hunting through documentation to no avail.

ballardw
Super User

My experience with custom formats is that, when values are compared for format purposes, 'abcd ' is treated the same as 'abcd', unless the "blank" is a more esoteric non-printing character than a simple space or two.
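To see the non-printing-character caveat in action, here is a hedged sketch that appends a tab ('09'x in ASCII) instead of a space to a code, then repairs the value with TRANSLATE(). The variable names are illustrative, and the format is reduced to the CHF entries from the question so the step runs on its own.

```sas
/* Sketch: a tab is not a blank, so a code ending in '09'x misses the format until it is replaced */
proc format;
   value $rcomfmt
      "39891",
      "4280 "-"4289 " = "CHF";
run;

data _null_;
   length dx_raw dx_clean $6;
   dx_raw    = "39891" || '09'x;                /* code followed by a tab, not a space */
   dx_clean  = translate(dx_raw, ' ', '09'x);   /* turn each tab into a blank */
   raw_lbl   = put(dx_raw,   $rcomfmt.);
   clean_lbl = put(dx_clean, $rcomfmt.);
   putlog raw_lbl= clean_lbl=;
run;
```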

I suspect there is no difference in efficiency. If you use the CNTLOUT= option to see what is generated, there isn't any difference, unlike with ranges of numerics.
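A sketch of the CNTLOUT= check: define the two styles side by side under hypothetical names ($accpa for the comma list, $accpb for one value per line), dump just those two entries with a SELECT statement, and print the stored ranges. The names and the data set work.fmtctl are illustrative assumptions.

```sas
/* Sketch: store both styles of definition, then compare what PROC FORMAT actually saved */
proc format;
   value $accpa                 /* comma-separated list sharing one label */
      '4660  ',
      '490   ' = 1
      other    = 0;
   value $accpb                 /* one value = label pair per line */
      '4660  ' = 1
      '490   ' = 1
      other    = 0;
run;

/* Dump only the two entries of interest to a control data set */
proc format library=work cntlout=work.fmtctl;
   select $accpa $accpb;
run;

proc print data=work.fmtctl noobs;
   var fmtname type start end label hlo;
run;
```

If the two styles are stored the same way, the START, END, LABEL and HLO columns should line up row for row between the two formats.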

Ron_MacroMaven
Lapis Lazuli | Level 10

This paper is for the R community

Introducing icd9: working with ICD-9 codes and comorbidities in R.

Jack O. Wasey.

July 21, 2014

http://www.cran.r-project.org/web/packages/icd9/vignettes/icd9.pdf

Here is the list of codes:

https://www.section111.cms.hhs.gov/MRA/help/ICD9_DX_Codes.txt

I agree with ballardw: trailing blanks are not going to change the label the format chooses.

Take a look at the list and search for a $char4. value like 4280


Your question is whether '4280 ' - '4289 ', apparently a $char5. value, is going to return '42810' - '42899'.

I do not think so, but that is an opinion. Testing ought to confirm that idea.
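A minimal test along the lines Ron suggests might look like the following. It assumes a stand-alone range under the hypothetical name $chfrng, with a hypothetical data set work.rangetest; the sample codes are chosen only to probe the ends of the range and are not all valid ICD-9 codes. SAS compares character values one character at a time in the collating sequence, padding the shorter value with blanks, so it is worth looking closely at how 5-digit codes beginning with 4289 come out.

```sas
/* Sketch: run candidate codes through the range "4280 "-"4289 " and flag the matches */
proc format;
   value $chfrng
      "4280 "-"4289 " = "CHF";
run;

data work.rangetest;                 /* hypothetical test data set */
   length dx1 $6;
   input dx1 $char6.;
   in_range = (put(dx1, $chfrng.) = "CHF");   /* 1 when the range matched */
   datalines;
4280
4281
42810
42843
4289
42890
42899
;
run;

proc print data=work.rangetest noobs;
run;
```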

Ron Fehd  macro maven
