DATA Step, Macro, Functions and more

Replacing Alpha Numeric with Numeric in a string

Reply
Highlighted
Occasional Contributor
Posts: 12

Replacing Alpha Numeric with Numeric in a string

Hi,

I have a field ID coming from client that has a mix of character and numeric.  I need to convert those IDs to all numeric by replacing say A with 01, B with 02 etc...  How do I go about doing that?

 

Sample data received

012345A

25647

88764B

990123C

883516S

996324

 

I want to replace A in the first ob with 01, and B in the third ob with 02 etc.

 

I don't want to totally create a new ID because I need to join this table to other table using this ID and I am afraid if I create random ID, I will mess up the IDs across table.

 

Any help/suggestions would be much appreciated.

Thank you

Super User
Posts: 23,776

Re: Replacing Alpha Numeric with Numeric in a string

You want the TRANSLATE function. Check the documentation but it's exactly what you're looking for here.

 


@devon59 wrote:

Hi,

I have a field ID coming from client that has a mix of character and numeric.  I need to convert those IDs to all numeric by replacing say A with 01, B with 02 etc...  How do I go about doing that?

 

Sample data received

012345A

25647

88764B

990123C

883516S

996324

 

I want to replace A in the first ob with 01, and B in the third ob with 02 etc.

 

I don't want to totally create a new ID because I need to join this table to other table using this ID and I am afraid if I create random ID, I will mess up the IDs across table.

 

Any help/suggestions would be much appreciated.

Thank you


 

Super User
Posts: 23,776

Re: Replacing Alpha Numeric with Numeric in a string

This may also be helpful, I use this approach to create a random key, which uses a format, so it's easy to recode in all tables if needed.
https://gist.github.com/statgeek/fd94b0b6e78815430c1340e8c19f8644
PROC Star
Posts: 1,836

Re: Replacing Alpha Numeric with Numeric in a string

data have;
input sample $10.;
cards;
012345A
25647
88764B
990123C
883516S
996324
;
proc format;
value $l
'A' = 01
'B' = 02
'C' = 03
'S' =04
;

data want;
set have;
if anyalpha(sample)>0 then substr(sample,length(sample))=put(compress(sample,,'ka'),$l.);
run;
PROC Star
Posts: 1,836

Re: Replacing Alpha Numeric with Numeric in a string

Posted in reply to novinosrin

The above assumes your A, B ....Z occur as last value in the string as your sample suggests

Respected Advisor
Posts: 4,736

Re: Replacing Alpha Numeric with Numeric in a string

@devon59

Here you go. I'd be very careful when storing the translated string into a numerical variable unless you're 100% sure that the translated string will never have more than 15 digits (see what happens with the last to rows in below sample code).

data test;
  infile datalines truncover;
  input have $10.;
  length wantC $20.;
  format wantN best32.;
  wantC=upcase(have);
  _pos=anyalpha(wantC);
  do while(_pos ne 0);
    _CtoN=put(sum(rank(substrn(wantC,_pos,1) ),-rank('A'),1),z2.);
    wantC=substrn(wantC,1,_pos-1)||_CtoN||substrn(wantC,_pos+1);
    _pos=anyalpha(wantC);
  end;
  if lengthn(left(wantc))>15 then 
    do;
      put "Strings with more than 15 digits can eventually not be stored with full numerical precision";
      put "Collision risk! Different source ID's can result in common target ID";
    end;
  wantN=input(wantC,best32.);
  datalines;
C012345A
a25d647
88764B
990123C
883516S
996324
AAAAAAAAAA
AAAAAAAAAB
;
run;

 

Trusted Advisor
Posts: 1,256

Re: Replacing Alpha Numeric with Numeric in a string

Isn't there a risk that different IDs will be mapped to the same new ID?

Example: '256401' might already exist and '2564A' would be mapped to it as well.

Occasional Contributor
Posts: 12

Re: Replacing Alpha Numeric with Numeric in a string

Posted in reply to FreelanceReinhard

You have a good point there.  I forgot to think about that.  Would you have any recommendations on how I should go about doing this?

 

Thank you

Trusted Advisor
Posts: 1,256

Re: Replacing Alpha Numeric with Numeric in a string

[ Edited ]

First of all, I'm wondering what you expect from the new ID. Is there anything important that you cannot do with the existing ID?

 

As you say, these IDs are "coming from client". So, if there were any discussions about, say, data issues, you'd always have to refer to their IDs and not to yours. Also, updates of the data would contain the original type of IDs again and again and your conversion would have to work on these unknown future IDs as well.

 

If you really want to do it, you need a conversion which is guaranteed to be 1:1. I think a translation table, i.e. a dataset containing old and new IDs as its two columns, would be the safest way to handle this. You could easily check it for duplicates, amend it with future IDs, use it for the conversion (in either direction) by match-merging, possibly create a SAS (in)format for convenient in-line conversions etc.

 

 

Ask a Question
Discussion stats
  • 8 replies
  • 98 views
  • 0 likes
  • 5 in conversation