SAS Programming

raja777pharma · Posted 06-26-2021 06:14 AM

Hello

I would like to extract the number from start to end, including the special characters.

String1: listing16-88-004 . subjects of devations . xlsx;

String2 : listing 16.88.004. subjects of devations . xlsx;

I need the following output from the two strings above.

String1_output = '16-88-004';

string2_output = '16.88.004';

Thank you.

Raja.

Tom · Posted 06-26-2021 11:03 AM

I would use a modified version of what @Ksharp proposed.

data have;
  input have $80.;
cards4;
Text charcters 1.2.34.16 text charcters many
15-6-7.8 text charcters many
text charctes words many 14-8-16-46 many alphabets
Regular number 12345 
;;;;;

data want;
 set have;
 pid=prxparse('/\d+[\d\.-]*\d+/');
 call prxsubstr(pid,have,p,l);
 if p>0 then want=substr(have,p,l);
 drop pid p l;
run;

proc print;
run;

So this pattern:

/\d+[\d\.-]*\d+/

Says to match a string that starts with a digit, ends with a digit, and has zero or more digits, periods, or hyphens between the two terminal digits. Because it needs at least two digits, it will not match a one-digit string.

Obs    have                                                  want

 1     Text charcters 1.2.34.16 text charcters many          1.2.34.16
 2     15-6-7.8 text charcters many                          15-6-7.8
 3     text charctes words many 14-8-16-46 many alphabets    14-8-16-46
 4     Regular number 12345                                  12345

But it will match strings that are only digits. If you need to eliminate those results you could test the string returned and make sure it has either a period or hyphen.

if not indexc(want,'.-') then want=' ';

Obs    have                                                  want

 1     Text charcters 1.2.34.16 text charcters many          1.2.34.16
 2     15-6-7.8 text charcters many                          15-6-7.8
 3     text charctes words many 14-8-16-46 many alphabets    14-8-16-46
 4     Regular number 12345
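
For reference, the check can sit in the same data step as the match. The sketch below assumes the HAVE data set created above. The variable want_alt and the second pattern are illustrative additions, not part of the original answer. That alternative pattern requires at least one period or hyphen between digit groups, so a digits-only string never matches. It is not a drop-in replacement: it rejects doubled separators such as 1..2, and it can skip past an earlier digits-only number to find a later match, which the original pattern followed by the INDEXC test would not do.

```sas
data want;
  set have;
  length want want_alt $80;

  /* compile each pattern once and keep the ids across rows */
  retain pid pid_alt;
  if _n_=1 then do;
    pid     = prxparse('/\d+[\d\.-]*\d+/');
    pid_alt = prxparse('/\d+(?:[.-]\d+)+/');  /* assumption: alternative pattern */
  end;

  /* original approach: match, then blank out digits-only results */
  call prxsubstr(pid,have,p,l);
  if p>0 then want=substr(have,p,l);
  if not indexc(want,'.-') then want=' ';

  /* alternative: the separator is required by the pattern itself */
  call prxsubstr(pid_alt,have,p,l);
  if p>0 then want_alt=substr(have,p,l);

  drop pid pid_alt p l;
run;
```

Which of the two behaviors is right depends on whether a plain number appearing earlier in the text should block the search or be ignored.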

PaigeMiller · Posted 06-26-2021 06:27 AM

It's really hard to create rules and then program them if all we have is just two examples.

So, a couple of questions:

Does the string always begin with the text 'listing ' ?
Or does the string sometimes begin with other text ?
Is there always a space before the numbers, or not?

How about YOU tell us the rules for extracting the numbers of interest, and then we can show you code that might work.

--
Paige Miller

raja777pharma · Posted 06-26-2021 08:46 AM

Hello,

Please see my answers below.

Does the string always begin with the text 'listing ' ? Ans: No

Or does the string sometimes begin with other text ? Ans: Yes

Is there always a space before the numbers, or not? Ans: Not always

PaigeMiller · Posted 06-26-2021 10:12 AM

@raja777pharma wrote:

Hello,

Please see my answers below.

Does the string always begin with the text 'listing ' ? Ans: No

Or does the string sometimes begin with other text ? Ans: Yes

Is there always a space before the numbers, or not? Ans: Not always

So please provide some rules (in words) that we can use to extract the numbers you want.

--
Paige Miller

raja777pharma · Posted 06-26-2021 10:36 AM

Hi Miller,

My requirement is to extract a number from a text string when the number appears within the string and contains special characters such as '.' or '-'.

Rule 1: Extract the numeric string from start to end, including the special characters.

string1 = 'Text charcters 1.2.34.16 text charcters many ' ;

string2 = '15-6-7.8 text charcters many ';

string3 = 'text charctes words many 14-8-16-46 many alphabets';

Expected outputs:

want_string1 = '1.2.34.16';

want_string2 = '15-6-7.8';

want_string3 = '14-8-16-46';

Thank you,

Raja.

Ksharp · Posted 06-26-2021 07:05 AM

data have;
input have $80.;
cards4;
String1: listing16-88-004 . subjects of devations . xlsx;
String2 : listing 16.88.004. subjects of devations . xlsx;
;;;;

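data want;
 set have;
 /* digits, one non-digit separator, digits, one separator, digits */
 pid=prxparse('/\d+\D\d+\D\d+/');
 /* p = start of the first match (0 if none), l = its length */
 call prxsubstr(pid,have,p,l);
 if p>0 then want=substr(have,p,l);
 drop pid p l;
run;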

mkeintz · Posted 06-26-2021 12:34 PM

Here's a tired old loop through a subset of characters in string.

data have;
  do string='listing16-88-004 . subjects of devations . xlsx'
           ,'listing 16.88.004. subjects of devations . xlsx';
    output;
  end;
run;

data want (drop=i);
  set have;
  length newvar $15;
  if anydigit(string) then do i=anydigit(string) to length(string);
    if anydigit(substr(string||' ',i,2)) then newvar=cats(newvar,char(string,i));
    else leave;
  end;
run;

Starting at the first numeric character, it advances through string, appending the current character to newvar as long as the current character, or the next character (to accommodate current character as a separator), is numeric. It doesn't care what the separator character is.

Once a non-qualifying character is reached, exit the loop.
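
For comparison, roughly the same rule can be written as a regular expression: a run of digits, followed by any number of groups made of one non-digit and then more digits. This is only a sketch, not the loop above rewritten by its author. It assumes the HAVE data set with the variable string from the preceding data step, and the names want_prx and newvar_prx are hypothetical. One difference to keep in mind: the loop builds its result with CATS, which drops a blank separator, whereas SUBSTR returns the matched text unchanged.

```sas
data want_prx;
  set have;
  length newvar_prx $15;

  /* compile the pattern once and keep the id across rows */
  retain pid;
  if _n_=1 then pid=prxparse('/\d+(?:\D\d+)*/');

  /* p = start of the first match (0 if none), l = its length */
  call prxsubstr(pid,string,p,l);
  if p>0 then newvar_prx=substr(string,p,l);

  drop pid p l;
run;
```

As with the loop, the pattern does not care which character acts as the separator, only that a single non-digit sits between two digits.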
