BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Pyrite | Level 9

Hi guys, 

is there a way to replace, in a set of character variables (totally 120) all the letters with a number? 

Let's say: 

Suppose to have the following: 

 

 

 

 

data have;
input Index1 $ Index2 $ ;
cards;
765 N543
1 56
A654 65498
F567 8765
V51 V55
66 .
065 .
445 .
;
run;

 

 

 

 The desired output should be: 

 

Index1 Index2
765 99999543
1 56
99999654 65498
99999567 8765
9999951 9999955
66  
065  
445  

 

Thank you in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Still experimenting with ICD9 encodings?

 

Make a lookup table of all possible ICD9 codes (text variable) and then just number them.

 

There are source files from CMS online of official ICD9 CM codes.

But the reality is that if you are using actual medical coding results from a hospital or other real world source they will have made their own extensions to what are allowed, so be prepared to extend the list or make decisions about what to do with the extra codes you will see.  For example there might not be an XXX.0 code in the official list, but some hospital might have decided to use that a generic catch all for the XXX code where the sub category was specified.  (Or the .0 suffix might be missing as your other questions on ICD9 codes have shown.)

 

Once you have the lookup table matching a number to every possible ICD9 code then you can do a merge and/or join.  Or just create an INFORMAT to convert the text actual ICD9 code into the number.  (And also make a FORMAT to convert your invented number back into the official ICD9 code as humans would like to see it displayed.)

 

Let's assume ICD9 codes are all 7 characters long (including the period).  Then we can read in the ICD9 code from our list of valid codes and generate a number for the ones that do not have any letters.  And then just use a retained NEXTCODE variable to generate a code for those.  Remember where you stopped numbering for when you find new ICD9 codes you need to support.

 

Demo.  First make the list of actual ICD9 codes (either from CMS or derived from your real data).

data icd9_raw;
  input icd9 $7. ;
cards;
765.   
N54.3  
001.   
056.   
654.98 
876.5  
V51.   
V59.   
;

proc sort;
  by icd9;
run;

Then make a dataset to store the next code to use for ICD9 codes that cannot be converted into a number.

data nextcode;
  nextcode=10**6+1;
run;

Now combine the two and make your lookup table.

data icd9(drop=nextcode) nextcode(keep=nextcode);
  if _n_=1 then set nextcode;
  if eof then output nextcode;
  set icd9_raw end=eof;
  length icd9nod $6. ;
  icd9nod = compress(icd9,'.');
  code = input(icd9nod,??bz6.);
  if missing(code) then do;
    code=nextcode;
    nextcode+1;
  end;
  output icd9;
run;

Results:

Tom_0-1709820325353.png

Now you can make an informat that converts ICD9 codes into your numeric code.

data formats;
  fmtname='ICD9IF';
  type='I';
  set icd9 (keep=icd9 code);
  rename icd9=start code=label;
run;

proc format cntlin=formats; 
run;

data test;
  set icd9_raw;
  code = input(icd9,icd9if.);
run;

Or a FORMAT that converts CODE into the pretty ICD9 string.

 

 

 

 

View solution in original post

3 REPLIES 3
yabwon
Onyx | Level 15

With this test data:

data have;
input Index1 $ Index2 $ ;
cards;
765 N543
1 56
A654 65498
F567 8765
V51 V55
66 .
065 .
445 .
ABC EFG
;
run;
proc print data=have;
run;

something like this:

data want;
  length Index1 Index2 /* -Index120 */ $ 50; /* if you want to repalace a letter with 99999 make the variables "wide" enough */

  set have;
  array I Index:;

  do over I;
    I = prxchange('s/[[:alpha:]]{1}?/99999/i', -1, I);
  end;
run;
proc print data=want;
run;

should do the job.

 

Remember to replace:

array I Index:;

with:

array I <your-list-of-variables>;

 

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Tom
Super User Tom
Super User

Still experimenting with ICD9 encodings?

 

Make a lookup table of all possible ICD9 codes (text variable) and then just number them.

 

There are source files from CMS online of official ICD9 CM codes.

But the reality is that if you are using actual medical coding results from a hospital or other real world source they will have made their own extensions to what are allowed, so be prepared to extend the list or make decisions about what to do with the extra codes you will see.  For example there might not be an XXX.0 code in the official list, but some hospital might have decided to use that a generic catch all for the XXX code where the sub category was specified.  (Or the .0 suffix might be missing as your other questions on ICD9 codes have shown.)

 

Once you have the lookup table matching a number to every possible ICD9 code then you can do a merge and/or join.  Or just create an INFORMAT to convert the text actual ICD9 code into the number.  (And also make a FORMAT to convert your invented number back into the official ICD9 code as humans would like to see it displayed.)

 

Let's assume ICD9 codes are all 7 characters long (including the period).  Then we can read in the ICD9 code from our list of valid codes and generate a number for the ones that do not have any letters.  And then just use a retained NEXTCODE variable to generate a code for those.  Remember where you stopped numbering for when you find new ICD9 codes you need to support.

 

Demo.  First make the list of actual ICD9 codes (either from CMS or derived from your real data).

data icd9_raw;
  input icd9 $7. ;
cards;
765.   
N54.3  
001.   
056.   
654.98 
876.5  
V51.   
V59.   
;

proc sort;
  by icd9;
run;

Then make a dataset to store the next code to use for ICD9 codes that cannot be converted into a number.

data nextcode;
  nextcode=10**6+1;
run;

Now combine the two and make your lookup table.

data icd9(drop=nextcode) nextcode(keep=nextcode);
  if _n_=1 then set nextcode;
  if eof then output nextcode;
  set icd9_raw end=eof;
  length icd9nod $6. ;
  icd9nod = compress(icd9,'.');
  code = input(icd9nod,??bz6.);
  if missing(code) then do;
    code=nextcode;
    nextcode+1;
  end;
  output icd9;
run;

Results:

Tom_0-1709820325353.png

Now you can make an informat that converts ICD9 codes into your numeric code.

data formats;
  fmtname='ICD9IF';
  type='I';
  set icd9 (keep=icd9 code);
  rename icd9=start code=label;
run;

proc format cntlin=formats; 
run;

data test;
  set icd9_raw;
  code = input(icd9,icd9if.);
run;

Or a FORMAT that converts CODE into the pretty ICD9 string.

 

 

 

 

NewUsrStat
Pyrite | Level 9
Yes, still dealing with devil. Unfortunately my project to not to convert codes has been rejected. I will try with what you suggest and really really many thanks for that.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 730 views
  • 1 like
  • 3 in conversation