- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a fairly large data set of roughly 350,000 medical record numbers that were sent from a hospital database in the format of "X-#########" where x is the initial of the hospital and the digits are the actual MRN numbers. How do i completely eliminate the initial and hyphen for each observation in that column, leaving only the MRN?
e.g. H-364803055 --> 364803055
Thank You
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
length full_mrn $20;
input full_mrn $;
datalines;
H-364803055
run;
/*if you want character*/
data want;
set have;
mrn=substr(full_mrn,indexc(full_mrn,"-")+1);
run;
/*if you want numeric*/
data want2;
set have;
mrn=input(substr(full_mrn,indexc(full_mrn,"-")+1),20.);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
length full_mrn $20;
input full_mrn $;
datalines;
H-364803055
run;
/*if you want character*/
data want;
set have;
mrn=substr(full_mrn,indexc(full_mrn,"-")+1);
run;
/*if you want numeric*/
data want2;
set have;
mrn=input(substr(full_mrn,indexc(full_mrn,"-")+1),20.);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you, this works:
data want2;
set have;
mrn=input(substr(mrn, indexc(mrn, "-") +1), 20.);
run;
One issue though, var does not convert to numeric. I get this kind of note:
NOTE: Numeric values have been converted to character values at the places given by:
(LINE):(COLUMN).
224:19
Any quick fix? I am going to be concatenating this list with another where the mrn is numeric.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can't redefine mrn. Create a different variable to hold the numeric value:
mrn_num=input(substr(mrn, indexc(mrn, "-") +1), 20.);
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There are many ways to get what you want, following are just a few examples:
data test;
var='H-364803055';
var1=compress(var,,'kd');
var2=substr(var,3);
var3=scan(var,2);
var4=prxchange("s/^.*-(\d+)/$1/o",-1,var);
run;
Haikuo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Hai.kuo
If this is only about replacing characters then why formulate a RegEx for the whole string with a capture buffer instead of simply formulate a RegEx for the stuff you don't want and then replace it with a NULL string? ...Being a bit on a mission here because I've seen PGStats doing the same as well.
var5=prxchange("s/[^\d]//o",-1,var);
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Good Point, Patrick. Can't speak for PG, but for me is nothing more than a habit. Always good to have another way to think.
Haikuo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why not scan(mm,1,'-')?