I want to extract the digits from a string.
For example, a typical string is: PYRIDOSTIGMINE BROMIDE (NDA #020414)
I want to extract 020414.
I am having a hard time telling SAS (with the Perl regular expression functions) to start extracting after '#' and stop before ')'.
Any ideas?
Thanks
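data _null_;
retain re;
length num $ 10;
/* Compile the pattern once: '#', one or more digits (capture buffer 1), then ')' */
if _N_ = 1 then
re = prxparse("/#(\d+)\)/");
input str $ 1-40;
/* When the pattern matches, PRXPOSN returns the text held in capture buffer 1 */
if prxmatch(re, str) then do;
num = prxposn(re, 1, str);
end;
put num=;
datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
;
run;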
data _null_;
a='PYRIDOSTIGMINE BROMIDE (NDA #020414)';
/* 'k' = keep the listed characters, 'd' = digits: every digit in a is kept */
b=compress(a, , 'kd');
put a= b=;
run;
Xia Keshan
Xia,
Thanks, but it does not give me the exact solution. For example, for the string "CLARITIN-D 24 HOUR (NDA #020470)" I want to extract 020470, but your solution returns 24020470.
So I want to use '#' to mark the start of the number and ')' to mark its end. Any thoughts on this?
Kiran
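Xia's COMPRESS call keeps every digit in the string, which is why the "24" from "24 HOUR" ends up in the result. A minimal sketch that keeps the COMPRESS idea but applies it only to the text starting at '#', assuming each string has exactly one '#' and no further digits after the NDA number (the data set and variable names `want_compress`, `pos` and `num` are hypothetical):

```sas
data want_compress;
   input str $ 1-40;
   /* INDEX returns the position of the first '#', or 0 if there is none */
   pos = index(str, '#');
   /* Keep only the digits ('k' = keep, 'd' = digits) from '#' onward */
   if pos > 0 then num = compress(substr(str, pos), , 'kd');
   datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
;
run;
```

If a string contains no '#', INDEX returns 0 and `num` is left blank.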
Use the scan function if you have a consistent structure to the data.
sample=scan(word, 3, "()#");
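With '(', ')' and '#' as delimiters, SCAN splits "PYRIDOSTIGMINE BROMIDE (NDA #020414)" into "PYRIDOSTIGMINE BROMIDE ", "NDA " and "020414", so the third word is the number. A minimal sketch applying the one-liner to both sample strings, assuming the variable is called `str` (as in the other replies) rather than `word`; the data set name `want_scan` is hypothetical:

```sas
data want_scan;
   input str $ 1-40;
   /* Words are delimited by '(', ')' and '#'; the NDA number is the 3rd word */
   sample = scan(str, 3, "()#");
   datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
;
run;

proc print data=want_scan;
run;
```

This relies on every string having exactly one bracket group before the number; an extra "(...)" group shifts the word count.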
Thanks, Reeza.
Although the data structure is almost standard, sometimes there are multiple opening and closing brackets.
So my best bet is to recognize the '#' symbol. Is there a way to do that?
Thanks,
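The regular expression /#(\d+)\)/ from the first reply is anchored on '#', so brackets elsewhere in the string do not affect it. A minimal sketch that stores the result in a data set and adds a third, hypothetical string with an extra bracket group (it is not from the original data; `want_prx` is also a hypothetical name):

```sas
data want_prx;
   /* Compile the pattern on the first iteration and keep its id across rows */
   retain re;
   if _n_ = 1 then re = prxparse("/#(\d+)\)/");
   length num $ 10;
   input str $ 1-40;
   /* Capture buffer 1 holds the digits between '#' and ')' */
   if prxmatch(re, str) then num = prxposn(re, 1, str);
   drop re;
   datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
EXAMPLE DRUG (ORAL) (NDA #012345)
;
run;
```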
Use the SCAN function with only '#' as the delimiter, then apply it again with the brackets as delimiters, or use some combination of the two.
data have;
input str $ 1-40;
datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
;
run;
data want;
set have;
number=scan(str, 2, "#)"); /* 2nd word when '#' and ')' are the delimiters */
run;
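The two-pass idea from the reply above (split on '#' first, then on ')') can be sketched as a nested SCAN. Unlike scan(str, 2, "#)"), which returns " (NDA " when a ')' appears before the '#', this version copes with an extra bracket group. It assumes a single '#' per string; the third data line and the names `want_nested` and `after_hash` are hypothetical:

```sas
data want_nested;
   input str $ 1-40;
   /* Pass 1: 2nd word with '#' as the only delimiter, i.e. the text after '#' */
   after_hash = scan(str, 2, "#");
   /* Pass 2: 1st word with ')' as the delimiter, i.e. the text before ')' */
   number = scan(after_hash, 1, ")");
   datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
EXAMPLE DRUG (ORAL) (NDA #012345)
;
run;
```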
Reeza,
Based on your previous suggestion, I made it work:
Number1=scan(str,5,"#(())");
Number2=scan(str,3,"#()");
and then concatenated the two columns. It worked.
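The reply does not say how the two columns were concatenated. Assuming CATS (which strips leading and trailing blanks before joining), a minimal sketch of the approach is shown below. For a string with one bracket group only Number2 holds the digits, while for a string with an extra group (the second, hypothetical, data line) the blank between ") (" counts as a word, so only Number1 holds them. Note that "#(())" and "#()" define the same delimiter set; repeating a character has no effect. The names `want_two_cols` and `Number` are hypothetical:

```sas
data want_two_cols;
   input str $ 1-40;
   /* The two SCAN calls from the reply */
   Number1 = scan(str, 5, "#(())");
   Number2 = scan(str, 3, "#()");
   /* Assumption: CATS removes the blanks, so the empty column drops out */
   Number = cats(Number1, Number2);
   datalines;
CLARITIN-D 24 HOUR (NDA #020470)
EXAMPLE DRUG (ORAL) (NDA #012345)
;
run;
```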
Time@SAS
Your syntax works perfectly!
Thanks
If using PRXCHANGE, the code can be less verbose:
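data want;
input str $ 1-40;
/* Replace the whole string with the digits captured after '#' ('i' = ignore case, 'o' = compile once) */
num=prxchange('s/.+#(\d+).+/$1/io',-1,str);
cards;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
;
run;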
Haikuo
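One caveat with the PRXCHANGE version: when a string has no '#' followed by digits, nothing is replaced and `num` receives the whole original string. A minimal sketch that guards against this with PRXMATCH, using a hypothetical third line without an NDA number (`want_guarded` is also a hypothetical name). It uses .* in place of .+ so that every string accepted by the guard is also matched by the substitution, even when '#' is the first character:

```sas
data want_guarded;
   input str $ 1-40;
   length num $ 10;
   /* Only rewrite strings that actually contain '#' followed by digits */
   if prxmatch('/#\d+/o', str) then
      num = prxchange('s/.*#(\d+).*/$1/o', 1, str);
   datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
EXAMPLE DRUG WITHOUT NUMBER
;
run;
```

Rows that fail the guard keep a blank `num` instead of a copy of `str`.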