dCone
Calcite | Level 5

Hello SAS experts,

 

I have searched Google for an answer to this question, but had no luck.

 

I want to find the first number in a string that has more than 4 digits.

 

I have this data

 

data have;
string="fhwfwoeuh, 12314, fjweipfjwp 214155";output;
string="Hello, 112345 Joifhew, 12-08-2017";output;
string="12343 dfnweon 12.08.17";output;
run;

and I want this

 

 

data want;
firstnumber="12314";output;
firstnumber="112345";output;
firstnumber="12343";output;
run;

Thank you in advance

 

4 REPLIES
ChrisBrooks
Ammonite | Level 13

You can do this with a regular expression:

 

data have;
string="fhwfwoeuh, 12314, fjweipfjwp 214155";output;
string="Hello, 112345 Joifhew, 12-08-2017";output;
string="12343 dfnweon 12.08.17";output;
string="978y 12343 dfnweon 12.08.17";output;
run;

data want(keep=string firstnumber);
	set have;
	if _n_=1 then prx=prxparse("/\d{5,}/");
	call prxsubstr(prx,string,start,length);
	if start > 0 then do;
		firstnumber=substr(string,start,length);
		output;
	end;
	retain prx;
run;

For testing, I've added an extra record whose first number has only 3 digits, so the code has to pick up the second number in that string.

 

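The regular expression \d{5,} matches a run of at least five consecutive digits. CALL PRXSUBSTR returns the start position and length of the first match (start is 0 when nothing matches), and SUBSTR then extracts the number from the records where a match is found.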
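One thing to be aware of: \d{5,} also matches digit runs embedded in a longer token such as "id9912345x". If only standalone numbers should count (an assumption, the question does not say), a variant with word boundaries might look like the sketch below. The test strings are made up for illustration, and PRXMATCH with PRXPOSN is used here instead of CALL PRXSUBSTR purely for brevity.

```sas
data _null_;
  length string $60 firstnumber $20;
  /* \b requires a word boundary on both sides of the digit run */
  prx = prxparse("/\b\d{5,}\b/");
  do string = "id9912345x 54321", "Hello, 112345 Joifhew";
    firstnumber = "";
    /* capture buffer 0 holds the entire match */
    if prxmatch(prx, string) then firstnumber = prxposn(prx, 0, string);
    put string= firstnumber=;
  end;
run;
```

For the first string this should skip the embedded 9912345 and return 54321; for the second it returns 112345, the same as the original pattern.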

RW9
Diamond | Level 26

You can also do the same in pure SAS Base quite simply, by using a small trick:

data want;
  set have;
  length firstnumber $20;
  firstnumber=scan(compress(string," ,","kd"),1," ,");
run;

What I do here is compress the string, removing all characters except digits, spaces and commas (the k modifier means keep rather than drop, and d adds the digits to the list). I am then left with a string in which the numbers are separated by commas or spaces, and I scan for the first one.
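To see what the COMPRESS step actually leaves behind, it can help to print the intermediate value. The following is a small sketch using the second record from the question; the variable name kept is only illustrative.

```sas
data _null_;
  length string kept $60 firstnumber $20;
  string = "Hello, 112345 Joifhew, 12-08-2017";
  /* keep (k) digits (d) plus the listed space and comma */
  kept = compress(string, " ,", "kd");
  /* first word when splitting on spaces and commas */
  firstnumber = scan(kept, 1, " ,");
  put kept= firstnumber=;
run;
```

Here kept becomes ", 112345 , 12082017" and firstnumber is 112345. Note that the hyphens of the date are dropped too, so 12-08-2017 collapses into what looks like an 8-digit number; that matters as soon as anything other than the first word is examined.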

ChrisBrooks
Ammonite | Level 13

Nice idea @RW9 but wouldn't it give an incorrect answer for the final record in this data set?

 

 

data have;
string="fhwfwoeuh, 12314, fjweipfjwp 214155";output;
string="Hello, 112345 Joifhew, 12-08-2017";output;
string="12343 dfnweon 12.08.17";output;
string="978y 12 dfnweon 12.08.17";output;
run;

 

RW9
Diamond | Level 26

Yes, the OP would need to add whatever extra rules are required on top. For the data you give:

data want;
  set have;
  length firstnumber $20;
  firstnumber=scan(compress(string," ,","kd"),1," ,");
  /* "more than 4 digits" means at least 5, so blank out anything shorter */
  if lengthn(firstnumber) < 5 then firstnumber="";
run;

That would fix it. However, if you need to consider the date as well, or have further conditions, then looping over each delimited word of the compressed string may be the way to go:

data want (drop=temp i);
  set have;
  length firstnumber temp $200;
  temp=compress(string," ,","kd");
  do i=1 to countw(temp," ,");
    /* check the i-th word; keep only the first one with at least 5 digits */
    if lengthn(scan(temp,i," ,")) >= 5 and firstnumber="" then firstnumber=scan(temp,i," ,");
  end;
run;
