BookmarkSubscribeRSS Feed
devon59
Calcite | Level 5

Hi,

I have a field ID coming from client that has a mix of character and numeric.  I need to convert those IDs to all numeric by replacing say A with 01, B with 02 etc...  How do I go about doing that?

 

Sample data received

012345A

25647

88764B

990123C

883516S

996324

 

I want to replace A in the first ob with 01, and B in the third ob with 02 etc.

 

I don't want to totally create a new ID because I need to join this table to other table using this ID and I am afraid if I create random ID, I will mess up the IDs across table.

 

Any help/suggestions would be much appreciated.

Thank you

8 REPLIES 8
Reeza
Super User

You want the TRANSLATE function. Check the documentation but it's exactly what you're looking for here.

 


@devon59 wrote:

Hi,

I have a field ID coming from client that has a mix of character and numeric.  I need to convert those IDs to all numeric by replacing say A with 01, B with 02 etc...  How do I go about doing that?

 

Sample data received

012345A

25647

88764B

990123C

883516S

996324

 

I want to replace A in the first ob with 01, and B in the third ob with 02 etc.

 

I don't want to totally create a new ID because I need to join this table to other table using this ID and I am afraid if I create random ID, I will mess up the IDs across table.

 

Any help/suggestions would be much appreciated.

Thank you


 

Reeza
Super User
This may also be helpful, I use this approach to create a random key, which uses a format, so it's easy to recode in all tables if needed.
https://gist.github.com/statgeek/fd94b0b6e78815430c1340e8c19f8644
novinosrin
Tourmaline | Level 20
data have;
input sample $10.;
cards;
012345A
25647
88764B
990123C
883516S
996324
;
proc format;
value $l
'A' = 01
'B' = 02
'C' = 03
'S' =04
;

data want;
set have;
if anyalpha(sample)>0 then substr(sample,length(sample))=put(compress(sample,,'ka'),$l.);
run;
novinosrin
Tourmaline | Level 20

The above assumes your A, B ....Z occur as last value in the string as your sample suggests

Patrick
Opal | Level 21

@devon59

Here you go. I'd be very careful when storing the translated string into a numerical variable unless you're 100% sure that the translated string will never have more than 15 digits (see what happens with the last to rows in below sample code).

data test;
  infile datalines truncover;
  input have $10.;
  length wantC $20.;
  format wantN best32.;
  wantC=upcase(have);
  _pos=anyalpha(wantC);
  do while(_pos ne 0);
    _CtoN=put(sum(rank(substrn(wantC,_pos,1) ),-rank('A'),1),z2.);
    wantC=substrn(wantC,1,_pos-1)||_CtoN||substrn(wantC,_pos+1);
    _pos=anyalpha(wantC);
  end;
  if lengthn(left(wantc))>15 then 
    do;
      put "Strings with more than 15 digits can eventually not be stored with full numerical precision";
      put "Collision risk! Different source ID's can result in common target ID";
    end;
  wantN=input(wantC,best32.);
  datalines;
C012345A
a25d647
88764B
990123C
883516S
996324
AAAAAAAAAA
AAAAAAAAAB
;
run;

 

FreelanceReinh
Jade | Level 19

Isn't there a risk that different IDs will be mapped to the same new ID?

Example: '256401' might already exist and '2564A' would be mapped to it as well.

devon59
Calcite | Level 5

You have a good point there.  I forgot to think about that.  Would you have any recommendations on how I should go about doing this?

 

Thank you

FreelanceReinh
Jade | Level 19

First of all, I'm wondering what you expect from the new ID. Is there anything important that you cannot do with the existing ID?

 

As you say, these IDs are "coming from client". So, if there were any discussions about, say, data issues, you'd always have to refer to their IDs and not to yours. Also, updates of the data would contain the original type of IDs again and again and your conversion would have to work on these unknown future IDs as well.

 

If you really want to do it, you need a conversion which is guaranteed to be 1:1. I think a translation table, i.e. a dataset containing old and new IDs as its two columns, would be the safest way to handle this. You could easily check it for duplicates, amend it with future IDs, use it for the conversion (in either direction) by match-merging, possibly create a SAS (in)format for convenient in-line conversions etc.

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1261 views
  • 1 like
  • 5 in conversation