
Find first number in string with more than 4 digits

New User
Posts: 1


Hello SAS experts..

 

I have searched Google for an answer to this question, but had no luck.

 

I want to find the first number in a string that has more than 4 digits.

 

I have this data

 

data have;
string="fhwfwoeuh, 12314, fjweipfjwp 214155";output;
string="Hello, 112345 Joifhew, 12-08-2017";output;
string="12343 dfnweon 12.08.17";output;
run;

and I want this

 

 

data want;
firstnumber="12314";output;
firstnumber="112345";output;
firstnumber="12343";output;
run;

Thank you in advance

 

Valued Guide
Posts: 555

Re: Find first number in string with more than 4 digits


You can do this with a regular expression

 

data have;
string="fhwfwoeuh, 12314, fjweipfjwp 214155";output;
string="Hello, 112345 Joifhew, 12-08-2017";output;
string="12343 dfnweon 12.08.17";output;
string="978y 12343 dfnweon 12.08.17";output;
run;

data want(keep=string firstnumber);
	set have;
	if _n_=1 then prx=prxparse("/\d{5,}/");
	call prxsubstr(prx,string,start,length);
	if start > 0 then do;
		firstnumber=substr(string,start,length);
		output;
	end;
	retain prx;
run;

For testing, I've added an extra record whose first number has only 3 digits, so the code has to pick up the second number in that string.

 

The regular expression \d{5,} matches a run of at least five digits. call prxsubstr returns the start position and length of the first match, and substr then extracts it. Records without a match are not written to the output.
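
If rows without a match should stay in the output, a small variation of the same step might look like this. It is only a sketch: the $20 length for firstnumber and the variable names pos and len are assumptions of mine, not part of the code above.

```sas
data want(keep=string firstnumber);
	set have;
	length firstnumber $20;   /* assumed maximum length of a number */
	retain prx;
	/* compile the pattern once and reuse it for every row */
	if _n_=1 then prx=prxparse("/\d{5,}/");
	/* pos and len receive the position and length of the first match (0 if none) */
	call prxsubstr(prx,string,pos,len);
	/* firstnumber stays blank when there is no run of five or more digits */
	if pos > 0 then firstnumber=substr(string,pos,len);
run;
```

Because there is no explicit output statement, every input row is written once, with firstnumber blank for the rows that do not match.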

Super User
Posts: 9,193

Re: Find first number in string with more than 4 digits

You can also do the same quite simply in pure Base SAS, using a small trick:

data want;
  set have;
  length firstnumber $20;
  firstnumber=scan(compress(string," ,","kd"),1," ,");
run;

What I do here is compress the string, removing all characters except digits, spaces and commas (the k modifier means keep rather than drop, and d adds the digits to the list). That leaves a string with the numbers separated by commas or spaces. Then I scan for the first one.
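
To see what the two function calls do on their own, a throwaway step like the following can help. This is just an illustrative sketch; the $60 length and the variable name temp are assumptions, and the sample value is one of the strings from the question.

```sas
data _null_;
  length string temp firstnumber $60;
  string="Hello, 112345 Joifhew, 12-08-2017";
  /* keep only digits, blanks and commas */
  temp=compress(string," ,","kd");
  /* first word when splitting on blanks and commas */
  firstnumber=scan(temp,1," ,");
  /* write both values to the log for inspection */
  put temp= firstnumber=;
run;
```

Note that the dashes in the date are dropped as well, so 12-08-2017 collapses into the single token 12082017 in temp.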

Valued Guide
Posts: 555

Re: Find first number in string with more than 4 digits

Nice idea @RW9 but wouldn't it give an incorrect answer for the final record in this data set?

 

 

data have;
string="fhwfwoeuh, 12314, fjweipfjwp 214155";output;
string="Hello, 112345 Joifhew, 12-08-2017";output;
string="12343 dfnweon 12.08.17";output;
string="978y 12 dfnweon 12.08.17";output;
run;

 

Super User
Posts: 9,193

Re: Find first number in string with more than 4 digits

Posted in reply to ChrisBrooks

Yes, the OP would need to add whatever extra rules are required on top. For the data you give:

data want;
  set have;
  length firstnumber $20;
  firstnumber=scan(compress(string," ,","kd"),1," ,");
  if lengthn(firstnumber) < 5 then firstnumber="";
run;

That would fix it. However, if you need to pick up the date as well, or have further conditions, then looping over each delimited word of the compressed string may be the way to go:

data want (drop=temp i);
  set have;
  length firstnumber temp $200;
  temp=compress(string," ,","kd");
  do i=1 to countw(temp," ,");
    if lengthn(scan(temp,i," ,")) >= 5 and firstnumber="" then firstnumber=scan(temp,i," ,");
  end;
run;
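
One caveat with the compress-based approach: because compress removes the periods, 12.08.17 becomes the six-digit token 120817, which a length check would accept. If date parts should not be glued together like that (an assumption on my part, the question does not say), one possible alternative is to let scan and countw split on every non-digit character by means of their k modifier. A minimal sketch, with word as a hypothetical helper variable:

```sas
data want (drop=i word);
  set have;
  length firstnumber word $200;
  /* with the k modifier, every character NOT in the list is a delimiter,
     so each word is one unbroken run of digits */
  do i=1 to countw(string,"0123456789","k") while (firstnumber="");
    word=scan(string,i,"0123456789","k");
    /* "more than 4 digits" means a length of at least 5 */
    if lengthn(word) >= 5 then firstnumber=word;
  end;
run;
```

The while clause stops the loop at the first qualifying number, and firstnumber stays blank when no run of digits is long enough.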