BookmarkSubscribeRSS Feed
Wolverine
Quartz | Level 8

I'm working with CPT codes, which are used for medical procedures.  There are about 23,000 codes that I'm collapsing into about 160 categories.  Because some of the codes use letters, they are stored in a string variable.  I'm using ranges of codes in Proc Format to create the categories, such as

 

'00100'-'00222' = "00100-00222 head"

'10000'-'10022' = "10000-10022 general"

 

The problem is that other codes that belong in other categories use the same number prefixes, but with a letter at the end:

 

'0001F'-'0015F' = "0001F-0015F Composite measures"

'1000F'-'1220F' = "1000F-1220F Patient history"

 

When i run the syntax, I get these types of errors:

 

ERROR: These two ranges overlap: 0001F-0015F and 00100-00222 (fuzz=0).
ERROR: These two ranges overlap: 10000-10022 and 1000F-1220F (fuzz=0).

 

How can I avoid this overlap without having to specify every value in the category?

10 REPLIES 10
PaigeMiller
Diamond | Level 26

Do you have the codes and their text equivalent in some SAS dataset (or text file or Excel File)?

--
Paige Miller
Wolverine
Quartz | Level 8

I think so, if I'm understanding your question correctly.  The format syntax file was created from an Excel file that has columns like this:

 

Procedure Code Category Short Description
00100 SC - Standard Code ANESTH SALIVARY GLAND
PaigeMiller
Diamond | Level 26

So then it should be relatively easy to use that data set to set up your SAS Formats. In this example code, you would of course have to replace the variable names I am using with the actual variable names.

 

UNTESTED CODE

data _null_;
    set codes_data_set end=eof;
    if _n_=1 then call execute('proc format; value $codef ');
    call execute(cats(procedurecode,'=',shortdescription));
    if eof then call execute('; run;');
run;
--
Paige Miller
Reeza
Super User

Or use a CNTLIN data set. You can create a format from a data set using CNTLIN.

 

data myformat;
set yourdata;

start = procedurecode;
label=shortDescription;
fmtname='proccode_fmt';
type='J';

run;

proc format cntlin=myformat;
run;

http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n1e19y6lrektafn1kj6nb...

Wolverine
Quartz | Level 8

Those examples of syntax would apply the labels to the CPT codes.  But I've already set that up.  The problem I'm having is with collapsing a specified range of codes into categories without causing an overlap error.

Reeza
Super User

Rather than type out the groups then, add them to your Excel file. Then import that as your format.

Either way you still need to list it once manually/semi-manually 😞

 

 

ballardw
Super User

@Wolverine wrote:

Those examples of syntax would apply the labels to the CPT codes.  But I've already set that up.  The problem I'm having is with collapsing a specified range of codes into categories without causing an overlap error.


You are somewhat misunderstanding ranges and values, range is basically a short cut for listing values, with the added benefit that with numeric values you get what you want. Character values do not work that way. Please see this short example using one line per value instead of ranges. Note that the format does not generate any errors as works as expected.

proc format library=work;
value $ grouper
'A' = 'Group 1'
'B' = 'Group 2'
'ABA' = 'Group 1'
'CDC' = 'Group 2'
;
run;

data junk;
   input x $;
datalines;
A
A
A
B
ABA 
ABA
CDC
;
run;

proc freq data=junk;
   tables x;
   format x $grouper.;
run;

So having the code and a value means you can use a cntlin data set based on that to create the format(s) you want and will not have and "range overlap errors". Though it may take some time/effort.

 

 

And here's an example of why "ranges" with character values seldom work as desired.

Note that all the values of the variable x are within the specified range:

data junk;
   input x $;
   inrange = '0001F' le x le '0015F' ;
datalines;
0001F
0014F
0014
0001G
0015A
0002
;
run;

Except for the first two values I doubt that is the way you want them compared.

 

 

Of course even more work could be generated by using a custom informat to assign numeric values that would allow uses of ranges but the correspondence to the original codes would be obfuscated and likely difficult to maintain.

Wolverine
Quartz | Level 8
It's frustrating because only about 1000 of the 28000 codes contain letters. If it wasn't for those 1000 codes, I could use ranges and it would be a lot easier. Another idea would be to recode into a new variable so that multiple zeroes would replace the letters. The new variable could be numeric, and ranges would work because the multiple zeroes would push the alphanumeric codes out of range. From that point, it would still take some work to convert those codes back to their original form, but at least I'd only have 1000 to deal with instead of 28,000.
Reeza
Super User

I would still use the excel approach, adding the categories into the file.

It's easy to filter the data in Excel and recode them quickly with the fill option.

ballardw
Super User

@Wolverine wrote:
It's frustrating because only about 1000 of the 28000 codes contain letters. If it wasn't for those 1000 codes, I could use ranges and it would be a lot easier. Another idea would be to recode into a new variable so that multiple zeroes would replace the letters. The new variable could be numeric, and ranges would work because the multiple zeroes would push the alphanumeric codes out of range. From that point, it would still take some work to convert those codes back to their original form, but at least I'd only have 1000 to deal with instead of 28,000.

Is every single code exactly 5 characters? Then possibly that is a true statement.

I agree with @Reeza about ways to build lists that will work with proc format. One of the things I routinely do is convert documentation of "acceptable values" for data fields that are supplied for a project and use Excel to help write format strings, labels and sometimes input  and informat statements.

 

Wait until you get to work with ICD-10 codes where the lengths vary a lot as they keep adding "decimal point values" to indicate submembership: foot=> toe> joint on toe and so on.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 3138 views
  • 2 likes
  • 4 in conversation