tish
Calcite | Level 5

I have been using PROC FORMAT to define both formats and informats for use with ICD-9-CM diagnosis codes. The diagnosis codes that I am using are left-aligned in 6-character variables with implied decimal points. Since billable codes range from 3 to 5 digits, there is always at least one trailing blank.

(1) How does PROC FORMAT handle trailing blanks? When I use a VALUE or INVALUE statement to define a format or informat, must I include the trailing blanks? The following code, for example, shows a format definition that uses 5 characters to map one diagnostic category:

   value $rcomfmt
      "39891",
      "4280 "-"4289 " = "CHF"       /* Congestive heart failure */

Can I assume that this SAS format will match my data where, say, dx1="39891 "? How about where dx1="4281  "?

Here is another VALUE statement using a range:

   value $psychfmt
      "2950  "-"29595 " = "Schizophrenia"

Can I assume that this SAS format will match my data where, say, dx1="29500 "?

(2) Is it more efficient to define a format (or informat) with a range or a series of values equaling an output value such as:

   invalue ACCPD2D                                        /* acute bronchitis */
      '4660  ',   /* ACUTE BRONCHITIS           */
      '490   '    /* BRONCHITIS NOS             */ = 1
      other    = 0 ;

Or is this more efficient? (Yes, here it's a format, but in a different program I turned it into an informat because of the way I wanted to use it.)

   value $ACCPD2D
      '4660  ' = 1  /* ACUTE BRONCHITIS            */
      '490   ' = 1  /* BRONCHITIS NOS              */
      other    = 0 ;

Please note that while my examples have only a few codes per category, my formats/informats can have many codes per category and there are many categories. I'm using the formats or informats as table lookups.

Thank you in advance for your ideas! I have spent much too much time hunting through documentation to no avail.

3 REPLIES 3
ballardw
Super User

My experience with custom formats is that the values are compared and, for format purposes, 'abcd ' is treated the same as 'abcd', unless the blank is a more esoteric non-printing character than a simple space or two.
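One way to check this is a small test step. The sketch below reuses the $rcomfmt fragment from the question; the OTHER label and the list of test values are assumptions added for illustration only. It writes each code and its formatted result to the log so the matches can be inspected directly.

```sas
/* Sketch: does a character format match values padded with trailing blanks? */
proc format;
   value $rcomfmt
      "39891",
      "4280 "-"4289 " = "CHF"     /* range copied from the question      */
      other             = "Other" /* assumed catch-all, for visibility   */
   ;
run;

data _null_;
   length dx1 $6 result $5;
   /* dx1 is $6, so every test value is stored with trailing blanks */
   do dx1 = "39891", "4280", "4281", "4289", "4290";
      result = put(dx1, $rcomfmt.);
      put dx1= result=;           /* inspect the log for each lookup */
   end;
run;
```

Because dx1 is declared with a length of 6, each value is blank-padded in the same way as the diagnosis variables described in the question.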

I suspect there is no difference in efficiency. If you use the CNTLOUT= option to see what is generated, there is no difference between the two definitions, unlike ranges of numerics.
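A minimal sketch of that check, assuming two hypothetical format names ($listfmt and $sepfmt) that hold the same two codes, once as a comma-separated list and once as separate assignments. CNTLOUT= writes the stored definitions to a data set, so the rows for the two formats can be compared side by side.

```sas
/* Sketch: compare how two equivalent definitions are stored. */
proc format cntlout=work.fmtdef;
   /* one label assigned to a comma-separated list of values */
   value $listfmt
      '4660', '490' = '1'
      other         = '0';
   /* the same label assigned to each value separately */
   value $sepfmt
      '4660' = '1'
      '490'  = '1'
      other  = '0';
run;

/* START, END, LABEL and HLO describe each stored value or range */
proc print data=work.fmtdef noobs;
   var fmtname start end label hlo;
run;
```

Note that CNTLOUT= writes every format in the catalog, so a WHERE clause on FMTNAME may be useful if the WORK catalog already holds other formats.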

Ron_MacroMaven
Lapis Lazuli | Level 10

This paper is for the R community

Introducing icd9: working with ICD-9 codes and comorbidities in R.

Jack O. Wasey.

July 21, 2014

http://www.cran.r-project.org/web/packages/icd9/vignettes/icd9.pdf

here is the list of codes:

https://www.section111.cms.hhs.gov/MRA/help/ICD9_DX_Codes.txt

I agree with ballardw:

trailing blanks are not going to change the way the format chooses.

Take a look at the list and search for a $char4. value like 4280

https://www.section111.cms.hhs.gov/MRA/help/ICD9_DX_Codes.txt

Your question is whether '4280 ' - '4290 ', apparently a $char5. value, is going to return '42810' - '42899'.

I do not think so, but that is an opinion.

Testing ought to confirm that idea.
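A test along these lines could settle it. This is only a sketch: the format name $rngtest, the labels, and the test values are made up for illustration. It formats 4-digit and 5-digit codes against the range '4280 '-'4290 ' and writes the results to the log.

```sas
/* Sketch: which 5-character codes fall inside a range with 4-digit endpoints? */
proc format;
   value $rngtest
      '4280 '-'4290 ' = 'in range'
      other           = 'not in range';
run;

data _null_;
   length dx $6 result $12;
   /* mix of 4-digit and 5-digit codes around both endpoints */
   do dx = '4279', '4280', '42810', '42899', '4290', '42900';
      result = put(dx, $rngtest.);
      put dx= result=;            /* inspect the log for each code */
   end;
run;
```

The values '42810' and '42899' are the ones in question; '4279' and '42900' are included to show what happens just outside the endpoints.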

Ron Fehd  macro maven

