Hello
I would like to extract the number from a string, from its first digit to its last digit, including the special characters in between.
String1: listing16-88-004 . subjects of devations . xlsx;
String2 : listing 16.88.004. subjects of devations . xlsx;
I need the following output from the two strings above.
String1_output = '16-88-004';
string2_output = '16.88.004';
Thank you.
Raja.
Accepted Solutions
I would use a modified version of what @Ksharp proposed.
data have;
input have $80.;
cards4;
Text charcters 1.2.34.16 text charcters many
15-6-7.8 text charcters many
text charctes words many 14-8-16-46 many alphabets
Regular number 12345
;;;;;
data want;
set have;
pid=prxparse('/\d+[\d\.-]*\d+/');
call prxsubstr(pid,have,p,l);
if p>0 then want=substr(have,p,l);
drop pid p l;
run;
proc print;
run;
So this pattern:
/\d+[\d\.-]*\d+/
Says to match strings that start with a digit, end with a digit, and have zero or more digits, periods or hyphens in between the two terminal digits. So it will not match single-digit strings.
Obs  have                                                 want
  1  Text charcters 1.2.34.16 text charcters many         1.2.34.16
  2  15-6-7.8 text charcters many                         15-6-7.8
  3  text charctes words many 14-8-16-46 many alphabets   14-8-16-46
  4  Regular number 12345                                 12345
But it will match strings that consist only of digits (two or more). If you need to eliminate those results, you could test the returned string inside the same data step and make sure it contains either a period or a hyphen.
if not indexc(want,'.-') then want=' ';
Obs  have                                                 want
  1  Text charcters 1.2.34.16 text charcters many         1.2.34.16
  2  15-6-7.8 text charcters many                         15-6-7.8
  3  text charctes words many 14-8-16-46 many alphabets   14-8-16-46
  4  Regular number 12345
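If digit-only strings should never match, one alternative is to build that requirement into the pattern instead of filtering afterwards. The sketch below is not from the thread. It assumes every value of interest is made of digit groups joined by single periods or hyphens, and it uses the pattern /\d+(?:[.-]\d+)+/, which demands at least one separator followed by digits. The output dataset name want2 is made up for illustration.

```sas
data have;
  input have $80.;
datalines;
Text charcters 1.2.34.16 text charcters many
15-6-7.8 text charcters many
text charctes words many 14-8-16-46 many alphabets
Regular number 12345
;

data want2;                     /* hypothetical output dataset name */
  set have;
  length want $80;
  /* digits, then one or more groups of (one '.' or '-' followed by digits) */
  pid = prxparse('/\d+(?:[.-]\d+)+/');
  call prxsubstr(pid, have, p, l);
  /* p is 0 when nothing matches, so a digits-only value leaves want blank */
  if p > 0 then want = substr(have, p, l);
  drop pid p l;
run;

proc print data=want2;
run;
```

The trade-off is that this variant is stricter than the original pattern: doubled separators such as 1..2 or 1-.2 are no longer matched as one value, so it only fits if the data really follows the one-separator-between-groups assumption.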
It's really hard to create rules and then program them if all we have is just two examples.
So, a couple of questions:
- Does the string always begin with the text 'listing ' ?
- Or does the string sometimes begin with other text ?
- Is there always a space before the numbers, or not?
How about YOU tell us the rules for extracting the numbers of interest, and then we can show you code that might work.
Paige Miller
Hello,
Please see my answers below.
- Does the string always begin with the text 'listing ' ? Ans: No
- Or does the string sometimes begin with other text ? Ans: Yes
- Is there always a space before the numbers, or not? Ans: Not always
@raja777pharma wrote:
Hello,
Please see my answers below.
- Does the string always begin with the text 'listing ' ? Ans: No
- Or does the string sometimes begin with other text ? Ans: Yes
- Is there always a space before the numbers, or not? Ans: Not always
So please provide some rules (in words) that we can use to extract the numbers you want.
Paige Miller
Hi Miller,
My requirement is to extract the numbers from a text string when they appear anywhere in the string combined with special characters such as '.' or '-'.
Rule 1: Extract the numeric string from its first digit to its last digit, including the special characters in between.
string1 = 'Text charcters 1.2.34.16 text charcters many ' ;
string2 = '15-6-7.8 text charcters many ';
string3 = 'text charctes words many 14-8-16-46 many alphabets';
Expected outputs:
want_string1 = '1.2.34.16';
want_string2 = '15-6-7.8';
want_string3 = '14-8-16-46';
Thank you,
Raja.
data have;
input have $80.;
cards4;
String1: listing16-88-004 . subjects of devations . xlsx;
String2 : listing 16.88.004. subjects of devations . xlsx;
;;;;
data want;
set have;
pid=prxparse('/\d+\D\d+\D\d+/');
call prxsubstr(pid,have,p,l);
if p>0 then want=substr(have,p,l);
drop pid p l;
run;
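This pattern expects exactly three digit groups, each pair separated by a single non-digit character, which fits the two strings in the original question. As a hedged illustration (the sample row is one of the asker's follow-up examples, and the dataset names four_groups and check are made up), running the same pattern on a four-group value returns only the first three groups:

```sas
data four_groups;               /* hypothetical dataset name */
  input have $80.;
datalines;
Text charcters 1.2.34.16 text charcters many
;

data check;                     /* hypothetical dataset name */
  set four_groups;
  length want $80;
  /* three digit groups, each pair separated by exactly one non-digit */
  pid = prxparse('/\d+\D\d+\D\d+/');
  call prxsubstr(pid, have, p, l);
  if p > 0 then want = substr(have, p, l);
  put want=;                    /* writes want=1.2.34 to the log */
  drop pid p l;
run;
```

Note also that \D matches any non-digit, including a blank or a letter, so this pattern does not restrict the separator to periods and hyphens.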
Here's a tired old loop through a subset of characters in string.
data have;
do string='listing16-88-004 . subjects of devations . xlsx'
,'listing 16.88.004. subjects of devations . xlsx';
output;
end;
run;
data want (drop=i);
set have;
length newvar $15;
if anydigit(string) then do i=anydigit(string) to length(string);
if anydigit(substr(string||' ',i,2)) then newvar=cats(newvar,char(string,i));
else leave;
end;
run;
Starting at the first numeric character, it advances through string, appending the current character to newvar as long as the current character, or the next character (to accommodate current character as a separator), is numeric. It doesn't care what the separator character is.
Once a non-qualifying character is reached, exit the loop.
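As a rough check of that description, the same loop can be run against the sample strings that appear elsewhere in this thread. This is only a sketch: the input step is an assumption added for illustration, and the loop itself is copied unchanged from the code above.

```sas
data have;
  input string $80.;
datalines;
Text charcters 1.2.34.16 text charcters many
15-6-7.8 text charcters many
text charctes words many 14-8-16-46 many alphabets
Regular number 12345
;

data want (drop=i);
  set have;
  length newvar $15;
  /* start at the first digit; stop at the first character that is neither
     a digit nor immediately followed by a digit */
  if anydigit(string) then do i = anydigit(string) to length(string);
    if anydigit(substr(string || ' ', i, 2)) then newvar = cats(newvar, char(string, i));
    else leave;
  end;
run;

proc print data=want;
run;
```

With these rows, newvar should come out as 1.2.34.16, 15-6-7.8, 14-8-16-46 and 12345. One caveat of the approach: because CATS strips blanks from its arguments, a single blank sitting between two digits is treated as a separator but then dropped, so a value like 12 34 would be returned as 1234.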