BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rrevans
Calcite | Level 5

I have a fairly large data set of roughly 350,000 medical record numbers that were sent from a hospital database in the format of "X-#########" where x is the initial of the hospital and the digits are the actual MRN numbers. How do i completely eliminate the initial and hyphen for each observation in that column, leaving only the MRN?

e.g.  H-364803055 --> 364803055

      

Thank You

1 ACCEPTED SOLUTION

Accepted Solutions
DBailey
Lapis Lazuli | Level 10

data have;

length full_mrn $20;

input full_mrn $;

datalines;

H-364803055

run;

/*if you want character*/

data want;

set have;

mrn=substr(full_mrn,indexc(full_mrn,"-")+1);

run;

/*if you want numeric*/

data want2;

set have;

mrn=input(substr(full_mrn,indexc(full_mrn,"-")+1),20.);

run;

View solution in original post

7 REPLIES 7
DBailey
Lapis Lazuli | Level 10

data have;

length full_mrn $20;

input full_mrn $;

datalines;

H-364803055

run;

/*if you want character*/

data want;

set have;

mrn=substr(full_mrn,indexc(full_mrn,"-")+1);

run;

/*if you want numeric*/

data want2;

set have;

mrn=input(substr(full_mrn,indexc(full_mrn,"-")+1),20.);

run;

rrevans
Calcite | Level 5

Thank you, this works:

data want2;

set have;

mrn=input(substr(mrn, indexc(mrn, "-") +1), 20.);

run;

One issue though, var does not convert to numeric. I get this kind of note:

NOTE: Numeric values have been converted to character values at the places given by:

           (LINE):(COLUMN).

           224:19

Any quick fix? I am going to be concatenating this list with another where the mrn is numeric.

DBailey
Lapis Lazuli | Level 10

You can't redefine mrn.  Create a different variable to hold the numeric value:

mrn_num=input(substr(mrn, indexc(mrn, "-") +1), 20.);

Haikuo
Onyx | Level 15

There are many ways to get what you want, following are just a few examples:

data test;

  var='H-364803055';

  var1=compress(var,,'kd');

  var2=substr(var,3);

  var3=scan(var,2);

  var4=prxchange("s/^.*-(\d+)/$1/o",-1,var);

run;

Haikuo

Patrick
Opal | Level 21

@Hai.kuo

If this is only about replacing characters then why formulate a RegEx for the whole string with a capture buffer instead of simply formulate a RegEx for the stuff you don't want and then replace it with a NULL string? ...Being a bit on a mission here because I've seen PGStats doing the same as well.

var5=prxchange("s/[^\d]//o",-1,var);

Haikuo
Onyx | Level 15

Good Point, Patrick. Can't speak for PG, but for me is nothing more than a habit. Always good to have another way to think.

Haikuo

SandyH
Calcite | Level 5

Why not scan(mm,1,'-')?

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 3004 views
  • 3 likes
  • 5 in conversation