Hi guys,
is there a way to replace, in a set of character variables (totally 120) all the letters with a number?
Let's say:
Suppose to have the following:
data have;
input Index1 $ Index2 $ ;
cards;
765 N543
1 56
A654 65498
F567 8765
V51 V55
66 .
065 .
445 .
;
run;
The desired output should be:
Index1 | Index2 |
765 | 99999543 |
1 | 56 |
99999654 | 65498 |
99999567 | 8765 |
9999951 | 9999955 |
66 | |
065 | |
445 |
Thank you in advance!
Still experimenting with ICD9 encodings?
Make a lookup table of all possible ICD9 codes (text variable) and then just number them.
There are source files from CMS online of official ICD9 CM codes.
But the reality is that if you are using actual medical coding results from a hospital or other real world source they will have made their own extensions to what are allowed, so be prepared to extend the list or make decisions about what to do with the extra codes you will see. For example there might not be an XXX.0 code in the official list, but some hospital might have decided to use that a generic catch all for the XXX code where the sub category was specified. (Or the .0 suffix might be missing as your other questions on ICD9 codes have shown.)
Once you have the lookup table matching a number to every possible ICD9 code then you can do a merge and/or join. Or just create an INFORMAT to convert the text actual ICD9 code into the number. (And also make a FORMAT to convert your invented number back into the official ICD9 code as humans would like to see it displayed.)
Let's assume ICD9 codes are all 7 characters long (including the period). Then we can read in the ICD9 code from our list of valid codes and generate a number for the ones that do not have any letters. And then just use a retained NEXTCODE variable to generate a code for those. Remember where you stopped numbering for when you find new ICD9 codes you need to support.
Demo. First make the list of actual ICD9 codes (either from CMS or derived from your real data).
data icd9_raw;
input icd9 $7. ;
cards;
765.
N54.3
001.
056.
654.98
876.5
V51.
V59.
;
proc sort;
by icd9;
run;
Then make a dataset to store the next code to use for ICD9 codes that cannot be converted into a number.
data nextcode;
nextcode=10**6+1;
run;
Now combine the two and make your lookup table.
data icd9(drop=nextcode) nextcode(keep=nextcode);
if _n_=1 then set nextcode;
if eof then output nextcode;
set icd9_raw end=eof;
length icd9nod $6. ;
icd9nod = compress(icd9,'.');
code = input(icd9nod,??bz6.);
if missing(code) then do;
code=nextcode;
nextcode+1;
end;
output icd9;
run;
Results:
Now you can make an informat that converts ICD9 codes into your numeric code.
data formats;
fmtname='ICD9IF';
type='I';
set icd9 (keep=icd9 code);
rename icd9=start code=label;
run;
proc format cntlin=formats;
run;
data test;
set icd9_raw;
code = input(icd9,icd9if.);
run;
Or a FORMAT that converts CODE into the pretty ICD9 string.
With this test data:
data have;
input Index1 $ Index2 $ ;
cards;
765 N543
1 56
A654 65498
F567 8765
V51 V55
66 .
065 .
445 .
ABC EFG
;
run;
proc print data=have;
run;
something like this:
data want;
length Index1 Index2 /* -Index120 */ $ 50; /* if you want to repalace a letter with 99999 make the variables "wide" enough */
set have;
array I Index:;
do over I;
I = prxchange('s/[[:alpha:]]{1}?/99999/i', -1, I);
end;
run;
proc print data=want;
run;
should do the job.
Remember to replace:
array I Index:;
with:
array I <your-list-of-variables>;
Bart
Still experimenting with ICD9 encodings?
Make a lookup table of all possible ICD9 codes (text variable) and then just number them.
There are source files from CMS online of official ICD9 CM codes.
But the reality is that if you are using actual medical coding results from a hospital or other real world source they will have made their own extensions to what are allowed, so be prepared to extend the list or make decisions about what to do with the extra codes you will see. For example there might not be an XXX.0 code in the official list, but some hospital might have decided to use that a generic catch all for the XXX code where the sub category was specified. (Or the .0 suffix might be missing as your other questions on ICD9 codes have shown.)
Once you have the lookup table matching a number to every possible ICD9 code then you can do a merge and/or join. Or just create an INFORMAT to convert the text actual ICD9 code into the number. (And also make a FORMAT to convert your invented number back into the official ICD9 code as humans would like to see it displayed.)
Let's assume ICD9 codes are all 7 characters long (including the period). Then we can read in the ICD9 code from our list of valid codes and generate a number for the ones that do not have any letters. And then just use a retained NEXTCODE variable to generate a code for those. Remember where you stopped numbering for when you find new ICD9 codes you need to support.
Demo. First make the list of actual ICD9 codes (either from CMS or derived from your real data).
data icd9_raw;
input icd9 $7. ;
cards;
765.
N54.3
001.
056.
654.98
876.5
V51.
V59.
;
proc sort;
by icd9;
run;
Then make a dataset to store the next code to use for ICD9 codes that cannot be converted into a number.
data nextcode;
nextcode=10**6+1;
run;
Now combine the two and make your lookup table.
data icd9(drop=nextcode) nextcode(keep=nextcode);
if _n_=1 then set nextcode;
if eof then output nextcode;
set icd9_raw end=eof;
length icd9nod $6. ;
icd9nod = compress(icd9,'.');
code = input(icd9nod,??bz6.);
if missing(code) then do;
code=nextcode;
nextcode+1;
end;
output icd9;
run;
Results:
Now you can make an informat that converts ICD9 codes into your numeric code.
data formats;
fmtname='ICD9IF';
type='I';
set icd9 (keep=icd9 code);
rename icd9=start code=label;
run;
proc format cntlin=formats;
run;
data test;
set icd9_raw;
code = input(icd9,icd9if.);
run;
Or a FORMAT that converts CODE into the pretty ICD9 string.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.