Extract digits

Contributor
Posts: 63

I want to extract digits from a string.

For example, a typical string is: PYRIDOSTIGMINE BROMIDE (NDA #020414)

I want to extract 020414.

I am having a hard time telling SAS (with the Perl regular expression functions) to start extracting after '#' and stop before ')'.

Any ideas?

Thanks



All Replies
Super User
Posts: 10,046

Re: Extract digits

Posted in reply to buckeyefisher
data _null_ ;
a='PYRIDOSTIGMINE BROMIDE (NDA #020414)';
b=compress(a, ,'kd');
put a= b=;
run;

Xia Keshan

Contributor
Posts: 63

Re: Extract digits

Xia,

Thanks, but it does not give me the exact result. For example, I have the string "CLARITIN-D 24 HOUR (NDA #020470)" and I want to extract 020470, but your solution extracts 24020470.

So I want to use '#' to signal the start of the number and ')' to signal its end. Any thoughts on this?

Kiran
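
One way to express "start after '#', stop before ')'" without regular expressions is to locate both characters with FIND and cut between them with SUBSTR. This is only a minimal sketch, not code from the thread: the variable names `start`, `stop`, and `num` are illustrative, and it assumes the first '#' in the string is the one that introduces the number.

```sas
data _null_;
    length str $40 num $20;
    str = 'CLARITIN-D 24 HOUR (NDA #020470)';

    /* position of the first '#'; FIND returns 0 when it is absent */
    start = find(str, '#');

    if start > 0 then do;
        /* first ')' at or after the character following '#' */
        stop = find(str, ')', start + 1);

        /* take the characters strictly between '#' and ')' */
        if stop > start + 1 then
            num = substr(str, start + 1, stop - start - 1);
    end;

    put str= num=;
run;
```

For the sample string this leaves the 24 in "24 HOUR" alone, because only the text between '#' and the following ')' is copied into `num`.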

Super User
Posts: 19,873

Re: Extract digits

Posted in reply to buckeyefisher

Use the scan function if you have a consistent structure to the data.

sample=scan(word, 3, "()#");

SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition

Contributor
Posts: 63

Re: Extract digits

Thanks, Reeza.

Although the data structure is almost standard, sometimes there are multiple opening and closing brackets.

So my best bet is to recognize the '#' symbol. Is there a way to do that?

Thanks,

Super User
Posts: 19,873

Re: Extract digits

Posted in reply to buckeyefisher

Use the scan function with # only as a delimiter and then again with the brackets or some combination thereof.
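
The two-step idea described here might look like the sketch below. It is an illustration only: the intermediate variable `after` is a made-up name, and the code assumes there is a single '#' before the number of interest.

```sas
data _null_;
    length str $40 after $40 number $20;
    str = 'CLARITIN-D 24 HOUR (NDA #020470)';

    /* step 1: split on '#' only and keep the text that follows it */
    after = scan(str, 2, '#');

    /* step 2: split that remainder on ')' and keep the first piece */
    number = scan(after, 1, ')');

    put after= number=;
run;
```

Because the first SCAN uses '#' as its only delimiter, extra brackets earlier in the string do not shift the word count. If the string contains no '#', SCAN returns a blank value and `number` stays blank.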

Super User
Posts: 19,873

Re: Extract digits

Posted in reply to buckeyefisher

data have;
    input str $ 1-40;
    datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
;
run;

data want;
    set have;
    number=scan(str, 2, "#)");
run;

Contributor
Posts: 63

Re: Extract digits

Reeza,

Based on your previous suggestion, I made it work:

Number1=scan(str,5,"#(())");
Number2=scan(str,3,"#()");

and then concatenated the two columns.

It worked.

Solution
06-11-2014 12:57 PM
Super Contributor
Posts: 394

Re: Extract digits

Posted in reply to buckeyefisher

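data _null_;
  retain re;
  /* compile the pattern once: '#', one or more captured digits, then ')' */
  if _N_ = 1 then
    re = prxparse("/#(\d+)\)/");
  input str $ 1-40;
  if prxmatch(re, str) then do;
    /* capture buffer 1 holds the digits between '#' and ')' */
    num = prxposn(re, 1, str);
  end;
  put num=;
  datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
;
run;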

Contributor
Posts: 63

Re: Extract digits

Posted in reply to buckeyefisher

Time@SAS,

your syntax works perfectly!

Thanks

Respected Advisor
Posts: 3,156

Re: Extract digits

Posted in reply to buckeyefisher

If using PRXCHANGE, the code can be less verbose:

data want;
     input str $ 1-40;
     num=prxchange('s/.+#(\d+).+/$1/io',-1,str);
     cards;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
run;

Haikuo
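
One thing to keep in mind with the substitution approach: PRXCHANGE returns its input unchanged when the pattern does not match, so a row without '#' followed by digits would end up with the whole string in `num`. A possible guard, sketched here under assumptions, is to test with PRXMATCH first. The second data line is made-up test input, and the pattern is a slightly simplified variant of the one above.

```sas
data _null_;
    length str $40 num $20;
    input str $ 1-40;

    /* substitute only when '#' followed by digits is present; */
    /* otherwise leave num blank instead of copying str into it */
    if prxmatch('/#\d+/', str) then
        num = prxchange('s/.*#(\d+).*/$1/', 1, str);

    put str= num=;
    datalines;
CLARITIN-D 24 HOUR (NDA #020470)
NO NUMBER HERE
;
```

Because the leading `.*` is greedy, a string with more than one '#'-prefixed number would yield the last one. Whether that is the desired behavior depends on the data.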
