DATA Step, Macro, Functions and more

Proc Format: ranges with alphanumeric values

Reply
Contributor
Posts: 57

Proc Format: ranges with alphanumeric values

I'm working with CPT codes, which are used for medical procedures.  There are about 23,000 codes that I'm collapsing into about 160 categories.  Because some of the codes use letters, they are stored in a string variable.  I'm using ranges of codes in Proc Format to create the categories, such as

 

'00100'-'00222' = "00100-00222 head"

'10000'-'10022' = "10000-10022 general"

 

The problem is that other codes that belong in other categories use the same number prefixes, but with a letter at the end:

 

'0001F'-'0015F' = "0001F-0015F Composite measures"

'1000F'-'1220F' = "1000F-1220F Patient history"

 

When i run the syntax, I get these types of errors:

 

ERROR: These two ranges overlap: 0001F-0015F and 00100-00222 (fuzz=0).
ERROR: These two ranges overlap: 10000-10022 and 1000F-1220F (fuzz=0).

 

How can I avoid this overlap without having to specify every value in the category?

Respected Advisor
Posts: 2,812

Re: Proc Format: ranges with alphanumeric values

Posted in reply to Wolverine

Do you have the codes and their text equivalent in some SAS dataset (or text file or Excel File)?

--
Paige Miller
Contributor
Posts: 57

Re: Proc Format: ranges with alphanumeric values

Posted in reply to PaigeMiller

I think so, if I'm understanding your question correctly.  The format syntax file was created from an Excel file that has columns like this:

 

Procedure Code Category Short Description
00100 SC - Standard Code ANESTH SALIVARY GLAND
Respected Advisor
Posts: 2,812

Re: Proc Format: ranges with alphanumeric values

Posted in reply to Wolverine

So then it should be relatively easy to use that data set to set up your SAS Formats. In this example code, you would of course have to replace the variable names I am using with the actual variable names.

 

UNTESTED CODE

data _null_;
    set codes_data_set end=eof;
    if _n_=1 then call execute('proc format; value $codef ');
    call execute(cats(procedurecode,'=',shortdescription));
    if eof then call execute('; run;');
run;
--
Paige Miller
Super User
Posts: 23,262

Re: Proc Format: ranges with alphanumeric values

Posted in reply to Wolverine

Or use a CNTLIN data set. You can create a format from a data set using CNTLIN.

 

data myformat;
set yourdata;

start = procedurecode;
label=shortDescription;
fmtname='proccode_fmt';
type='J';

run;

proc format cntlin=myformat;
run;

http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n1e19y6lrektafn1kj6nb...

Contributor
Posts: 57

Re: Proc Format: ranges with alphanumeric values

Those examples of syntax would apply the labels to the CPT codes.  But I've already set that up.  The problem I'm having is with collapsing a specified range of codes into categories without causing an overlap error.

Super User
Posts: 23,262

Re: Proc Format: ranges with alphanumeric values

Posted in reply to Wolverine

Rather than type out the groups then, add them to your Excel file. Then import that as your format.

Either way you still need to list it once manually/semi-manually Smiley Sad

 

 

Super User
Posts: 13,304

Re: Proc Format: ranges with alphanumeric values

Posted in reply to Wolverine

Wolverine wrote:

Those examples of syntax would apply the labels to the CPT codes.  But I've already set that up.  The problem I'm having is with collapsing a specified range of codes into categories without causing an overlap error.


You are somewhat misunderstanding ranges and values, range is basically a short cut for listing values, with the added benefit that with numeric values you get what you want. Character values do not work that way. Please see this short example using one line per value instead of ranges. Note that the format does not generate any errors as works as expected.

proc format library=work;
value $ grouper
'A' = 'Group 1'
'B' = 'Group 2'
'ABA' = 'Group 1'
'CDC' = 'Group 2'
;
run;

data junk;
   input x $;
datalines;
A
A
A
B
ABA 
ABA
CDC
;
run;

proc freq data=junk;
   tables x;
   format x $grouper.;
run;

So having the code and a value means you can use a cntlin data set based on that to create the format(s) you want and will not have and "range overlap errors". Though it may take some time/effort.

 

 

And here's an example of why "ranges" with character values seldom work as desired.

Note that all the values of the variable x are within the specified range:

data junk;
   input x $;
   inrange = '0001F' le x le '0015F' ;
datalines;
0001F
0014F
0014
0001G
0015A
0002
;
run;

Except for the first two values I doubt that is the way you want them compared.

 

 

Of course even more work could be generated by using a custom informat to assign numeric values that would allow uses of ranges but the correspondence to the original codes would be obfuscated and likely difficult to maintain.

Contributor
Posts: 57

Re: Proc Format: ranges with alphanumeric values

It's frustrating because only about 1000 of the 28000 codes contain letters. If it wasn't for those 1000 codes, I could use ranges and it would be a lot easier. Another idea would be to recode into a new variable so that multiple zeroes would replace the letters. The new variable could be numeric, and ranges would work because the multiple zeroes would push the alphanumeric codes out of range. From that point, it would still take some work to convert those codes back to their original form, but at least I'd only have 1000 to deal with instead of 28,000.
Super User
Posts: 23,262

Re: Proc Format: ranges with alphanumeric values

Posted in reply to Wolverine

I would still use the excel approach, adding the categories into the file.

It's easy to filter the data in Excel and recode them quickly with the fill option.

Super User
Posts: 13,304

Re: Proc Format: ranges with alphanumeric values

Posted in reply to Wolverine

Wolverine wrote:
It's frustrating because only about 1000 of the 28000 codes contain letters. If it wasn't for those 1000 codes, I could use ranges and it would be a lot easier. Another idea would be to recode into a new variable so that multiple zeroes would replace the letters. The new variable could be numeric, and ranges would work because the multiple zeroes would push the alphanumeric codes out of range. From that point, it would still take some work to convert those codes back to their original form, but at least I'd only have 1000 to deal with instead of 28,000.

Is every single code exactly 5 characters? Then possibly that is a true statement.

I agree with @Reeza about ways to build lists that will work with proc format. One of the things I routinely do is convert documentation of "acceptable values" for data fields that are supplied for a project and use Excel to help write format strings, labels and sometimes input  and informat statements.

 

Wait until you get to work with ICD-10 codes where the lengths vary a lot as they keep adding "decimal point values" to indicate submembership: foot=> toe> joint on toe and so on.

Ask a Question
Discussion stats
  • 10 replies
  • 159 views
  • 2 likes
  • 4 in conversation