I am trying to use a wildcard for character values (in this case, ICD-9 codes) within PROC FORMAT.
To keep things simple (many ICD codes carry several suffixes, and we only care about the primary prefix), I am using the code for uncomplicated hypertension (ICD-9 code 401), which has 2 subcodes (4011, 4019). I do not want to add a separate format definition for 4011 and 4019, because that becomes much more arduous when dealing with many other ICD-9 codes. I simply want to define the format by prefix, so that every code starting with that prefix is assigned the same label.
The prefix code I am interested in, in this case, is 401 (as a character).
I have tried using a colon
"401:" = "Hypertension Uncomplicated - ICD-9"
in proc format (in an attempt to include 4011 and 4019), to no avail.
Here is my code:
proc format;
value $test
"401" = "Hypertension Uncomplicated - ICD-9"
;
run;
quit;
data test_form;
input PatID ICD9$;
datalines;
1 401
2 4011
3 4019
;
data test_form;
set test_form;
format ICD9 $test.;
run;
proc print data = test_form NOOBS;
title "Testing format wildcards";
run;
title;
As for outputs:
Current output:
PatID | ICD9
1 | Hypertension Uncomplicated - ICD-9
2 | 4011
3 | 4019
Desired output:
PatID | ICD9
1 | Hypertension Uncomplicated - ICD-9
2 | Hypertension Uncomplicated - ICD-9
3 | Hypertension Uncomplicated - ICD-9
You can try ranges.
'401' - '4019' = '....'
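That will match any string that sorts between the two values (inclusive), such as '4010' or '4012345678'. Note that a string such as '40191' sorts after '4019', so it would fall outside this range.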
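As a minimal sketch of the range approach, here is the format from the question rewritten with a character range. It assumes the test_form data set from the question already exists and that every code of interest sorts between '401' and '4019'.

```sas
proc format;
  value $test
    /* Character range: matches any value that sorts from '401' through '4019' */
    '401' - '4019' = "Hypertension Uncomplicated - ICD-9"
  ;
run;

proc print data = test_form noobs;
  /* Apply the format at print time; the stored values are unchanged */
  format ICD9 $test.;
  title "Testing format ranges";
run;
title;
```

With the three test rows (401, 4011, 4019), all of them fall inside the range, so each should print with the hypertension label.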
Or you could try regular expressions, but I think you would need to switch to an INFORMAT instead of a FORMAT. At least that is what the documentation page suggests.
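A rough sketch of the regular-expression route, going by my reading of the INVALUE documentation: the informat name $icdgrp, the output data set want, and the new variable ICD9_label are made up for illustration, and I have not run this against your data, so treat it as a starting point.

```sas
proc format;
  /* Character informat; DEFAULT= sets the width so the label is not truncated */
  invalue $icdgrp (default=40)
    /* (REGEXP) tells PROC FORMAT to treat the quoted string as a Perl regular expression */
    '/^401/' (regexp) = "Hypertension Uncomplicated - ICD-9"
    /* Anything that does not match keeps its original value */
    other = _same_
  ;
run;

data want;
  set test_form;
  /* INPUT applies the informat and stores the label in a new character variable */
  ICD9_label = input(ICD9, $icdgrp.);
run;
```

Unlike a format, this creates a new variable holding the label rather than changing how ICD9 is displayed, so the original code stays available alongside it.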