Hello SAS experts,
I have searched Google for an answer to this question, but with no luck.
I want to find the first number in a string that has more than 4 digits.
I have this data:
data have;
string="fhwfwoeuh, 12314, fjweipfjwp 214155";output;
string="Hello, 112345 Joifhew, 12-08-2017";output;
string="12343 dfnweon 12.08.17";output;
run;
and I want this:
data want;
firstnumber="12314";output;
firstnumber="112345";output;
firstnumber="12343";output;
run;
Thank you in advance
You can do this with a regular expression:
data have;
string="fhwfwoeuh, 12314, fjweipfjwp 214155";output;
string="Hello, 112345 Joifhew, 12-08-2017";output;
string="12343 dfnweon 12.08.17";output;
string="978y 12343 dfnweon 12.08.17";output;
run;
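data want(keep=string firstnumber);
    set have;
    /* Compile the pattern once and keep its id across rows */
    retain prx;
    if _n_=1 then prx=prxparse("/\d{5,}/");
    /* start is 0 when the string has no run of five or more digits */
    call prxsubstr(prx,string,start,length);
    if start > 0 then do;
        firstnumber=substr(string,start,length);
        output;
    end;
run;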
I've added an extra record whose first number has only 3 digits, to test that the second number in that string is picked up instead.
The regular expression \d{5,} matches a run of at least five digits, and call prxsubstr returns the start position and length of the first match, which substr then uses to extract it. Rows with no match are not output.
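If you would rather keep every input row and leave firstnumber blank when nothing matches, a minimal sketch along the same lines might look like this. The data set name want_all and the variable name len are only illustrative, and the two sample rows are there just to make the step self-contained.

```sas
/* Sample input; the second row has no run of five or more digits */
data have;
    length string $60;
    string = "978y 12343 dfnweon 12.08.17"; output;
    string = "978y 12 dfnweon 12.08.17"; output;
run;

data want_all(keep=string firstnumber);
    set have;
    length firstnumber $20;
    retain prx;
    /* Compile the pattern once, on the first row */
    if _n_ = 1 then prx = prxparse("/\d{5,}/");
    call prxsubstr(prx, string, start, len);
    /* firstnumber stays blank when start = 0 (no match) */
    if start > 0 then firstnumber = substr(string, start, len);
run;
```

The only real difference from the step above is that there is no explicit output statement, so the implicit output at the end of the data step writes every row.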
You can also do the same in pure Base SAS quite simply, by using a small trick:
data want;
    set have;
    length firstnumber $20;
    firstnumber=scan(compress(string," ,","kd"),1," ,");
run;
What I do here is compress the string, removing all characters except digits, spaces and commas (the k modifier means keep rather than drop, and d adds digits to the list). That leaves a string with the numbers separated by commas or spaces. Then I scan for the first one.
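To see what the compress step leaves behind, here is a small sketch on one of the sample strings. The variable names cleaned and first_word are just illustrative.

```sas
data _null_;
    length cleaned first_word $60;
    /* "kd" keeps digits plus the listed characters (space and comma) */
    cleaned = compress("Hello, 112345 Joifhew, 12-08-2017", " ,", "kd");
    /* The hyphens are dropped, so the date's digits fuse into one word */
    first_word = scan(cleaned, 1, " ,");
    /* Write both values to the log for inspection */
    put cleaned= first_word=;
run;
```

Note that any punctuation other than the space and comma disappears, so digit groups separated by hyphens or periods end up joined together. That matters if such a group comes before the number you actually want.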
Nice idea @RW9, but wouldn't it give an incorrect answer for the final record in this data set?
data have;
string="fhwfwoeuh, 12314, fjweipfjwp 214155";output;
string="Hello, 112345 Joifhew, 12-08-2017";output;
string="12343 dfnweon 12.08.17";output;
string="978y 12 dfnweon 12.08.17";output;
run;
Yes, any further rules that are needed would have to be added on top. In the case of the data you give:
data want;
    set have;
    length firstnumber $20;
    firstnumber=scan(compress(string," ,","kd"),1," ,");
    /* The question asks for more than 4 digits, so blank out anything shorter than 5 */
    if lengthn(firstnumber) < 5 then firstnumber="";
run;
That would fix it. However, if you need to handle the date as well, or further conditions, then looping over each delimited word from the compress result is probably the way to go:
data want (drop=temp i);
    set have;
    length firstnumber temp $200;
    temp=compress(string," ,","kd");
    do i=1 to countw(temp," ,");
        /* Take the i-th word; keep only the first one with more than 4 digits */
        if lengthn(scan(temp,i," ,")) >= 5 and firstnumber="" then firstnumber=scan(temp,i," ,");
    end;
run;