# Extract digits

I want to extract digits from a string.

For example, a typical string is `PYRIDOSTIGMINE BROMIDE (NDA #020414)`.

I want to extract 020414.

I am having a hard time telling SAS (with the Perl regular expression functions) to extract the text after # and stop before ')'.

Any ideas?

Thanks

06-11-2014 12:57 PM
## Re: Extract digits

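```sas
data _null_;
   retain re;
   /* Compile the pattern once: '#', then capture one or more digits, then ')' */
   if _N_ = 1 then re = prxparse("/#(\d+)\)/");
   input str $ 1-40;
   /* On a match, capture group 1 holds the digits */
   if prxmatch(re, str) then do;
      num = prxposn(re, 1, str);
   end;
   put num=;
   datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
;
run;
```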

## Re: Extract digits

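```sas
data _null_;
   a = 'PYRIDOSTIGMINE BROMIDE (NDA #020414)';
   /* 'k' keeps (instead of removes) the listed characters; 'd' adds all digits to the list */
   b = compress(a, , 'kd');
   put a= b=;
run;
```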

Xia Keshan

## Re: Extract digits

Xia,

Thanks, but that does not give me the exact solution. For example, for the string "CLARITIN-D 24 HOUR (NDA #020470)" I want to extract 020470, but your solution returns 24020470.

So I want to use # to mark the start of the number and ')' to mark the end. Any thoughts on this?

Kiran
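A minimal sketch of the rule described here (start right after '#', stop at the first ')' that follows it), using INDEX and SUBSTR instead of regular expressions. The dataset name `want` and the helper variables `start`, `rest` and `stop` are illustrative choices, not taken from the thread.

```sas
data want;
   length number $ 20;
   input str $ 1-40;
   start = index(str, '#');               /* position of '#', 0 if absent */
   if 0 < start < length(str) then do;
      rest = substr(str, start + 1);      /* everything after the '#' */
      stop = index(rest, ')');            /* first ')' after the '#' */
      if stop > 1 then number = substr(rest, 1, stop - 1);
   end;
   drop start rest stop;
   datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
;
run;
```

Because the search for ')' starts after the '#', bracketed text earlier in the string does not affect the result.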

## Re: Extract digits

Use the SCAN function if your data have a consistent structure:

```sas
/* With '(', ')' and '#' as delimiters, the 3rd word is the number */
sample = scan(word, 3, "()#");
```

SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition

## Re: Extract digits

Thanks Reeza,

Although the data structure is mostly standard, sometimes there are multiple open and close brackets.

So my best bet is to recognize the '#' symbol. Is there a way to do that?

Thanks,

## Re: Extract digits

Use the SCAN function with # as the only delimiter, then call it again on the result with the brackets as delimiters, or some combination thereof.
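A sketch of that two-step idea, assuming the number always follows the only '#' in the string. The second data line (`ABC (XYZ) (NDA #012345)`) is a hypothetical example with an extra pair of brackets, not data from the thread, and `after_hash` is an illustrative variable name.

```sas
data want;
   input str $ 1-40;
   /* Step 1: split on '#' only and keep what follows it, e.g. "020470)" */
   after_hash = scan(str, 2, '#');
   /* Step 2: split that piece on the brackets and keep the first word */
   number = scan(after_hash, 1, '()');
   drop after_hash;
   datalines;
CLARITIN-D 24 HOUR (NDA #020470)
ABC (XYZ) (NDA #012345)
;
run;
```

With a single call such as `scan(str, 2, "#)")`, the ')' after XYZ would also act as a delimiter, so the second word of the hypothetical line would be the `(NDA` fragment rather than the number.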

## Re: Extract digits

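```sas
data have;
   input str $ 1-40;
   datalines;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
;
run;

data want;
   set have;
   /* '#' and ')' are both delimiters, so the 2nd word is the text between them
      (as long as no ')' appears before the '#') */
   number = scan(str, 2, "#)");
run;
```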

## Re: Extract digits

Reeza -

```sas
Number1 = scan(str, 5, "#(())");
Number2 = scan(str, 3, "#()");
```

and then concatenated the two columns. It worked.


## Re: Extract digits

If using PRXCHANGE, the code can be less verbose:

```sas
data want;
   input str $ 1-40;
   /* Replace the whole string with capture group 1: the digits after '#' */
   num = prxchange('s/.+#(\d+).+/$1/io', -1, str);
   cards;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE (NDA #020414)
;
run;
```
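One caveat with this approach: when the pattern does not match, PRXCHANGE returns the source string unchanged, so a row without '#' would end up with its whole text in `num`. A sketch that guards against this with PRXMATCH, assuming such rows should simply get a blank value; the second data line is a hypothetical row without an NDA number. It also uses `.*` instead of `.+` so that a '#' at the very start of the string is still handled, and omits the `i` modifier, which has no effect on a pattern without letters.

```sas
data want;
   length num $ 20;
   input str $ 1-40;
   /* Only substitute when '#' is followed by digits; otherwise num stays blank */
   if prxmatch('/#\d+/o', str) then
      num = prxchange('s/.*#(\d+).*/$1/o', -1, str);
   cards;
CLARITIN-D 24 HOUR (NDA #020470)
PYRIDOSTIGMINE BROMIDE
;
run;
```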

Haikuo

