New User
Posts: 1

# Find first number in string with more than 4 digits

Hello SAS experts..

I have searched Google for an answer to this question, but with no luck.

I want to find the first number in a string that has more than 4 digits.

I have this data

```sas
data have;
  string="fhwfwoeuh, 12314, fjweipfjwp 214155";output;
  string="Hello, 112345 Joifhew, 12-08-2017";output;
  string="12343 dfnweon 12.08.17";output;
run;
```

and I want this

```sas
data want;
  firstnumber="12314";output;
  firstnumber="112345";output;
  firstnumber="12343";output;
run;
```

Valued Guide
Posts: 596

## Re: Find first number in string with more than 4 digits

You can do this with a regular expression:

```sas
data have;
  string="fhwfwoeuh, 12314, fjweipfjwp 214155";output;
  string="Hello, 112345 Joifhew, 12-08-2017";output;
  string="12343 dfnweon 12.08.17";output;
  string="978y 12343 dfnweon 12.08.17";output;
run;

data want(keep=string firstnumber);
  set have;
  /* compile the pattern once: a run of five or more digits */
  if _n_=1 then prx=prxparse("/\d{5,}/");
  /* start is 0 when the string contains no match */
  call prxsubstr(prx,string,start,length);
  if start > 0 then do;
    firstnumber=substr(string,start,length);
    output;
  end;
  retain prx;
run;
```

I've added an extra record whose first number has only 3 digits, to test that the second number in that string is picked up instead.

The regular expression \d{5,} matches a run of at least five digits. call prxsubstr returns the start position and length of the first such match (start is 0 when there is none), and substr then extracts it. Records without a match are not output.
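
If you would rather keep every row and leave firstnumber blank when nothing matches, a minimal sketch along the same lines might look like this. It assumes the have data set from above; the data set name want_all, the variable name len and the $20 length are my own choices for illustration.

```sas
data want_all(keep=string firstnumber);
  set have;
  length firstnumber $20;   /* assumed long enough for any match */
  retain prx;
  /* compile the pattern once: a run of five or more digits */
  if _n_ = 1 then prx = prxparse("/\d{5,}/");
  call prxsubstr(prx, string, start, len);
  /* start is 0 when nothing matches, so firstnumber stays blank */
  if start > 0 then firstnumber = substr(string, start, len);
run;
```

Because there is no explicit output statement, every input row is written once, with or without a match.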

Super User
Posts: 9,789

## Re: Find first number in string with more than 4 digits

You can also do the same in pure Base SAS quite simply, by using a small trick:

```sas
data want;
  set have;
  length firstnumber $20;
  firstnumber=scan(compress(string," ,","kd"),1," ,");
run;
```

What I do here is compress the string, removing all characters except digits, spaces and commas (the k modifier means keep rather than drop, and d adds the digits to the list). That leaves a string with the numbers separated by commas or spaces. Then I scan for the first one.
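
To see what the compress step leaves behind, here is a small sketch that writes the intermediate value to the log for one of the sample strings. The variable names kept and first are made up for illustration.

```sas
data _null_;
  length string kept $50 first $20;
  string = "Hello, 112345 Joifhew, 12-08-2017";
  /* keep (k) only digits (d), spaces and commas */
  kept = compress(string, " ,", "kd");
  /* first word when splitting on spaces and commas */
  first = scan(kept, 1, " ,");
  put kept= first=;
run;
```

For this string, kept holds ", 112345 , 12082017": the hyphens are dropped, so the date collapses into a single run of digits. That does not matter while only the first word is taken, but it is worth keeping in mind if later words are examined.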

Valued Guide
Posts: 596

## Re: Find first number in string with more than 4 digits

Nice idea @RW9 but wouldn't it give an incorrect answer for the final record in this data set?

```sas
data have;
  string="fhwfwoeuh, 12314, fjweipfjwp 214155";output;
  string="Hello, 112345 Joifhew, 12-08-2017";output;
  string="12343 dfnweon 12.08.17";output;
  string="978y 12 dfnweon 12.08.17";output;
run;
```

Super User
Posts: 9,789

## Re: Find first number in string with more than 4 digits

Yes, the original poster would need to add whatever extra rules are required on top. For the data you give:

```sas
data want;
  set have;
  length firstnumber $20;
  firstnumber=scan(compress(string," ,","kd"),1," ,");
  /* blank out a first number that does not have more than 4 digits */
  if lengthn(firstnumber) < 5 then firstnumber="";
run;
```

That would fix it. However, if you need to take the date as well, or have further conditions, then looping over each delimited word of the compressed string may be the way to go:

```sas
data want (drop=temp i);
  set have;
  length firstnumber temp $200;
  /* keep only digits, spaces and commas */
  temp=compress(string," ,","kd");
  do i=1 to countw(temp," ,");
    /* take the first word that has more than 4 digits */
    if lengthn(scan(temp,i," ,")) >= 5 and firstnumber="" then firstnumber=scan(temp,i," ,");
  end;
run;
```