I'm working with CPT codes, which are used for medical procedures. There are about 23,000 codes that I'm collapsing into about 160 categories. Because some of the codes use letters, they are stored in a string variable. I'm using ranges of codes in Proc Format to create the categories, such as
'00100'-'00222' = "00100-00222 head"
'10000'-'10022' = "10000-10022 general"
The problem is that other codes that belong in other categories use the same number prefixes, but with a letter at the end:
'0001F'-'0015F' = "0001F-0015F Composite measures"
'1000F'-'1220F' = "1000F-1220F Patient history"
When i run the syntax, I get these types of errors:
ERROR: These two ranges overlap: 0001F-0015F and 00100-00222 (fuzz=0).
ERROR: These two ranges overlap: 10000-10022 and 1000F-1220F (fuzz=0).
How can I avoid this overlap without having to specify every value in the category?
Do you have the codes and their text equivalent in some SAS dataset (or text file or Excel File)?
I think so, if I'm understanding your question correctly. The format syntax file was created from an Excel file that has columns like this:
| Procedure Code | Category | Short Description | 
| 00100 | SC - Standard Code | ANESTH SALIVARY GLAND | 
So then it should be relatively easy to use that data set to set up your SAS Formats. In this example code, you would of course have to replace the variable names I am using with the actual variable names.
UNTESTED CODE
data _null_;
    set codes_data_set end=eof;
    if _n_=1 then call execute('proc format; value $codef ');
    call execute(cats(procedurecode,'=',shortdescription));
    if eof then call execute('; run;');
run;
					
				
			
			
				Or use a CNTLIN data set. You can create a format from a data set using CNTLIN.
data myformat;
set yourdata;
start = procedurecode;
label=shortDescription;
fmtname='proccode_fmt';
type='J';
run;
proc format cntlin=myformat;
run;
Those examples of syntax would apply the labels to the CPT codes. But I've already set that up. The problem I'm having is with collapsing a specified range of codes into categories without causing an overlap error.
Rather than type out the groups then, add them to your Excel file. Then import that as your format.
Either way you still need to list it once manually/semi-manually 😞
@Wolverine wrote:
Those examples of syntax would apply the labels to the CPT codes. But I've already set that up. The problem I'm having is with collapsing a specified range of codes into categories without causing an overlap error.
You are somewhat misunderstanding ranges and values, range is basically a short cut for listing values, with the added benefit that with numeric values you get what you want. Character values do not work that way. Please see this short example using one line per value instead of ranges. Note that the format does not generate any errors as works as expected.
proc format library=work; value $ grouper 'A' = 'Group 1' 'B' = 'Group 2' 'ABA' = 'Group 1' 'CDC' = 'Group 2' ; run; data junk; input x $; datalines; A A A B ABA ABA CDC ; run; proc freq data=junk; tables x; format x $grouper.; run;
So having the code and a value means you can use a cntlin data set based on that to create the format(s) you want and will not have and "range overlap errors". Though it may take some time/effort.
And here's an example of why "ranges" with character values seldom work as desired.
Note that all the values of the variable x are within the specified range:
data junk; input x $; inrange = '0001F' le x le '0015F' ; datalines; 0001F 0014F 0014 0001G 0015A 0002 ; run;
Except for the first two values I doubt that is the way you want them compared.
Of course even more work could be generated by using a custom informat to assign numeric values that would allow uses of ranges but the correspondence to the original codes would be obfuscated and likely difficult to maintain.
I would still use the excel approach, adding the categories into the file.
It's easy to filter the data in Excel and recode them quickly with the fill option.
@Wolverine wrote:
It's frustrating because only about 1000 of the 28000 codes contain letters. If it wasn't for those 1000 codes, I could use ranges and it would be a lot easier. Another idea would be to recode into a new variable so that multiple zeroes would replace the letters. The new variable could be numeric, and ranges would work because the multiple zeroes would push the alphanumeric codes out of range. From that point, it would still take some work to convert those codes back to their original form, but at least I'd only have 1000 to deal with instead of 28,000.
Is every single code exactly 5 characters? Then possibly that is a true statement.
I agree with @Reeza about ways to build lists that will work with proc format. One of the things I routinely do is convert documentation of "acceptable values" for data fields that are supplied for a project and use Excel to help write format strings, labels and sometimes input and informat statements.
Wait until you get to work with ICD-10 codes where the lengths vary a lot as they keep adding "decimal point values" to indicate submembership: foot=> toe> joint on toe and so on.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
