I have a fairly large data set of roughly 350,000 medical record numbers that were sent from a hospital database in the format of "X-#########" where x is the initial of the hospital and the digits are the actual MRN numbers. How do i completely eliminate the initial and hyphen for each observation in that column, leaving only the MRN?
e.g. H-364803055 --> 364803055
Thank You
data have;
length full_mrn $20;
input full_mrn $;
datalines;
H-364803055
run;
/*if you want character*/
data want;
set have;
mrn=substr(full_mrn,indexc(full_mrn,"-")+1);
run;
/*if you want numeric*/
data want2;
set have;
mrn=input(substr(full_mrn,indexc(full_mrn,"-")+1),20.);
run;
data have;
length full_mrn $20;
input full_mrn $;
datalines;
H-364803055
run;
/*if you want character*/
data want;
set have;
mrn=substr(full_mrn,indexc(full_mrn,"-")+1);
run;
/*if you want numeric*/
data want2;
set have;
mrn=input(substr(full_mrn,indexc(full_mrn,"-")+1),20.);
run;
Thank you, this works:
data want2;
set have;
mrn=input(substr(mrn, indexc(mrn, "-") +1), 20.);
run;
One issue though, var does not convert to numeric. I get this kind of note:
NOTE: Numeric values have been converted to character values at the places given by:
(LINE):(COLUMN).
224:19
Any quick fix? I am going to be concatenating this list with another where the mrn is numeric.
You can't redefine mrn. Create a different variable to hold the numeric value:
mrn_num=input(substr(mrn, indexc(mrn, "-") +1), 20.);
There are many ways to get what you want, following are just a few examples:
data test;
var='H-364803055';
var1=compress(var,,'kd');
var2=substr(var,3);
var3=scan(var,2);
var4=prxchange("s/^.*-(\d+)/$1/o",-1,var);
run;
Haikuo
@Hai.kuo
If this is only about replacing characters then why formulate a RegEx for the whole string with a capture buffer instead of simply formulate a RegEx for the stuff you don't want and then replace it with a NULL string? ...Being a bit on a mission here because I've seen PGStats doing the same as well.
var5=prxchange("s/[^\d]//o",-1,var);
Good Point, Patrick. Can't speak for PG, but for me is nothing more than a habit. Always good to have another way to think.
Haikuo
Why not scan(mm,1,'-')?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.