Hello
I would like to extract the number from a string, from its first digit to its last, including the special characters in between.
String1: listing16-88-004 . subjects of devations . xlsx;
String2 : listing 16.88.004. subjects of devations . xlsx;
I need the output from the above two strings as below.
String1_output = '16-88-004';
string2_output = '16.88.004';
Thank you.
Raja.
I would use a modified version of what @Ksharp proposed.
data have;
input have $80.;
cards4;
Text charcters 1.2.34.16 text charcters many
15-6-7.8 text charcters many
text charctes words many 14-8-16-46 many alphabets
Regular number 12345
;;;;;
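data want;
  set have;
  /* Compile the pattern: digit(s), optional digits/periods/hyphens, digit(s) */
  pid=prxparse('/\d+[\d\.-]*\d+/');
  /* p = start position of the first match (0 if none), l = its length */
  call prxsubstr(pid,have,p,l);
  if p>0 then want=substr(have,p,l);
  drop pid p l;
run;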
proc print;
run;
So this pattern:
/\d+[\d\.-]*\d+/
Says to match strings that start with a digit, end with a digit, and have zero or more digits, periods, or hyphens between the two terminal digits. So it will not match one-digit strings.
Obs  have                                                want
1    Text charcters 1.2.34.16 text charcters many        1.2.34.16
2    15-6-7.8 text charcters many                        15-6-7.8
3    text charctes words many 14-8-16-46 many alphabets  14-8-16-46
4    Regular number 12345                                12345
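To see the one-digit behavior in isolation, here is a minimal sketch. The three test values and the names check, text, and matched are made up for illustration; only the pattern comes from the post above.

```sas
/* Hypothetical test values; only the pattern is taken from the post. */
data check;
  length text $40;
  pid = prxparse('/\d+[\d\.-]*\d+/');
  do text = 'Version 7 only', 'Version 42 only', 'Step 3-4 done';
    /* prxmatch returns the start position of the match, or 0 if none */
    matched = (prxmatch(pid, text) > 0);
    output;
  end;
  drop pid;
run;

proc print data=check;
run;
```

Because the pattern needs at least two digits, matched should be 0 for 'Version 7 only' and 1 for the other two values ('42' and '3-4').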
But it will match strings that are only digits. If you need to eliminate those results you could test the string returned and make sure it has either a period or hyphen.
if not indexc(want,'.-') then want=' ';
Obs  have                                                want
1    Text charcters 1.2.34.16 text charcters many        1.2.34.16
2    15-6-7.8 text charcters many                        15-6-7.8
3    text charctes words many 14-8-16-46 many alphabets  14-8-16-46
4    Regular number 12345
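If plain digit strings should never be returned, another option is to build the separator requirement into the pattern itself instead of testing afterwards. This is a sketch of a different pattern, not the one posted above, and the data set name want2 is made up. It assumes every separator is a single period or hyphen sitting between digits, so unlike the original pattern it will not accept doubled separators such as 1..2.

```sas
/* Two sample rows copied from the data above */
data have;
  length have $80;
  do have = 'Text charcters 1.2.34.16 text charcters many',
            'Regular number 12345';
    output;
  end;
run;

data want2;
  set have;
  length want $80;
  /* Assumed variant: digits, then one or more groups of (period or hyphen + digits) */
  pid = prxparse('/\d+([.-]\d+)+/');
  call prxsubstr(pid, have, p, l);
  if p > 0 then want = substr(have, p, l);
  drop pid p l;
run;

proc print data=want2;
run;
```

With this variant the first row should give 1.2.34.16 and the second row should leave want blank, because 12345 contains no separator.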
It's really hard to create rules and then program them if all we have is just two examples.
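So, a couple of questions:
- Does the string always begin with the text 'listing ' ?
- Or does the string sometimes begin with other text ?
- Is there always a space before the numbers, or not?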
How about YOU tell us the rules for extracting the numbers of interest, and then we can show you code that might work.
@raja777pharma wrote:
Hello,
Please refer below.
- Does the string always begin with the text 'listing ' ? Ans : No
- Or does the string sometimes begin with other text ? Ans : Yes
- Is there always a space before the numbers, or not? Ans : Not always
So please provide some rules (in words) that we can use to extract the numbers you want.
Hi Miller,
My requirement is to extract the numbers from a text string when the number(s) appear anywhere in the string, combined with special characters like '.' or '-'.
Rule 1 : Extract the numeric string from its first digit to its last digit, including the special characters in between.
string1 = 'Text charcters 1.2.34.16 text charcters many ' ;
string2 = '15-6-7.8 text charcters many ';
string3 = 'text charctes words many 14-8-16-46 many alphabets';
Expected outputs:
want_string1 = '1.2.34.16';
want_string2 = '15-6-7.8';
want_string3 = '14-8-16-46';
Thank you,
Raja.
data have;
input have $80.;
cards4;
String1: listing16-88-004 . subjects of devations . xlsx;
String2 : listing 16.88.004. subjects of devations . xlsx;
;;;;
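data want;
  set have;
  /* Three groups of digits joined by single non-digit separators */
  pid=prxparse('/\d+\D\d+\D\d+/');
  /* p = start position of the first match (0 if none), l = its length */
  call prxsubstr(pid,have,p,l);
  if p>0 then want=substr(have,p,l);
  drop pid p l;
run;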
Here's a tired old loop through a subset of characters in string.
data have;
do string='listing16-88-004 . subjects of devations . xlsx'
,'listing 16.88.004. subjects of devations . xlsx';
output;
end;
run;
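data want (drop=i);
  set have;
  length newvar $15;
  /* Start at the first digit and walk toward the end of the string */
  if anydigit(string) then do i=anydigit(string) to length(string);
    /* Keep the character if it, or the character after it, is a digit */
    if anydigit(substr(string||' ',i,2)) then newvar=cats(newvar,char(string,i));
    else leave;
  end;
run;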
Starting at the first numeric character, it advances through string, appending the current character to newvar as long as the current character, or the next character (to accommodate current character as a separator), is numeric. It doesn't care what the separator character is.
Once a non-qualifying character is reached, exit the loop.
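For comparison, the same stopping rule can be written as a regular expression. This is my own restatement of the loop, not the poster's method, and the names want_rx, pid, p, and l are made up for the sketch. The pattern reads as: one digit, followed by any number of digits that may each be preceded by a single non-digit character.

```sas
data have;
  length string $60;
  do string = 'listing16-88-004 . subjects of devations . xlsx',
              'listing 16.88.004. subjects of devations . xlsx';
    output;
  end;
run;

data want_rx;
  set have;
  length newvar $15;
  /* A digit, then digits that may each follow one non-digit separator */
  pid = prxparse('/\d(\D?\d)*/');
  call prxsubstr(pid, string, p, l);
  /* As in the loop version, newvar truncates at 15 characters */
  if p > 0 then newvar = substr(string, p, l);
  drop pid p l;
run;

proc print data=want_rx;
run;
```

For the two sample strings this should return 16-88-004 and 16.88.004. Like the loop, it accepts any single character as a separator, so a value such as 16a88 would also be kept whole.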