I want to extract digits from a string.
For example, a typical string is: PYRIDOSTIGMINE BROMIDE (NDA #020414)
I want to extract 020414.
I am having a hard time telling SAS (with the Perl regular expression functions) to start extracting after # and stop before ')'.
Any ideas?
Thanks
data _null_;
a='PYRIDOSTIGMINE BROMIDE (NDA #020414)';
b=compress(a, ,'kd');
put a= b=;
run;
Xia Keshan
Xia,
Thanks, but it does not give me the exact solution. For example, I have the string "CLARITIN-D 24 HOUR (NDA #020470)" and I want to extract 020470, but your solution extracts 24020470.
So I want to use # to signal the start of the number and ')' to signal the end of the number. Any thoughts on this?
Kiran
Use the scan function if you have a consistent structure to the data.
sample=scan(word, 3, "()#");
Thanks Reeza,
Although the data structure is almost standard, sometimes there are multiple opening and closing brackets.
So my best bet is to recognize the '#' symbol. Is there a way to do that?
Thanks,
Use the scan function with # only as a delimiter and then again with the brackets or some combination thereof.
data have;
input str $ 1-40;
datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
;
run;
data want;
set have;
number=scan(str, 2, "#)");
run;
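When a string can contain more than one pair of brackets, the two-step idea described above (split on # first, then on the bracket) can be sketched as a nested scan. This is only a minimal sketch: the third data line is a made-up example, not taken from the real data, and it assumes each string contains a single # that does not appear at the very start.

```sas
/* The third row is hypothetical: it has an extra pair of brackets. */
data have;
  input str $ 1-40;
  datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
EXAMPLE DRUG (XR) (NDA #012345)
;
run;

data want;
  set have;
  length number $ 20;
  /* Inner scan: keep the text after the '#'.          */
  /* Outer scan: keep what comes before the first ')'. */
  number = scan(scan(str, 2, '#'), 1, ')');
  put str= number=;
run;
```

Because the inner scan uses only # as a delimiter, earlier brackets in the string do not shift the word count. If a string has no #, the inner scan returns a blank and number stays blank.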
Reeza -
based on your previous suggestion I made it work
Number1=scan(str,5,"#(())");
Number2=scan(str,3,"#()");
and then concatenated the two columns.
It worked.
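data _null_;
retain re;
/* Compile the pattern once: '#', then captured digits, then ')' */
if _N_ = 1 then
re = prxparse("/#(\d+)\)/");
input str $ 1-40;
if prxmatch(re, str) then do;
/* Capture buffer 1 holds the digits between '#' and ')' */
num = prxposn(re, 1, str);
end;
put num=;
datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
;
run;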
Time@SAS
Your syntax works perfectly!
Thanks
If using PRXCHANGE, the code can be less verbose:
data want;
input str $ 1-40;
/* Replace the whole string with the digits captured after '#' */
num=prxchange('s/.+#(\d+).+/$1/io',-1,str);
cards;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
;
run;
Haikuo