BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
raja777pharma
Fluorite | Level 6

Hello 

 

I would like to extract the number from starting to end include special charcters.

String1: listing16-88-004 . subjects of devations . xlsx;

String2 : listing 16.88.004. subjects of devations . xlsx;

 

I need output from above tow string as below.

String1_output  = '16-88-004';

string2_output   = '16.88.004';

 

Thank you.

Raja.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

I would use a modified version of what @Ksharp proposed.

data have;
  input have $80.;
cards4;
Text charcters 1.2.34.16 text charcters many
15-6-7.8 text charcters many
text charctes words many 14-8-16-46 many alphabets
Regular number 12345 
;;;;;

data want;
 set have;
 pid=prxparse('/\d+[\d\.-]*\d+/');
 call prxsubstr(pid,have,p,l);
 if p>0 then want=substr(have,p,l);
 drop pid p l;
run;

proc print;
run;

So this pattern:

/\d+[\d\.-]*\d+/

Says to match strings that start with a digit and end with a digit and have zero more more digits, periods or hyphens in between the two terminal digits.  So it will not match one digit strings.

Obs    have                                                  want

 1     Text charcters 1.2.34.16 text charcters many          1.2.34.16
 2     15-6-7.8 text charcters many                          15-6-7.8
 3     text charctes words many 14-8-16-46 many alphabets    14-8-16-46
 4     Regular number 12345                                  12345

 But it will match strings that are only digits.  If you need to eliminate those results you could test the string returned and make sure it has either a period or hyphen.

if not indexc(want,'.-') then want=' ';
Obs    have                                                     want

 1     Text charcters 1.2.34.16 text charcters many          1.2.34.16
 2     15-6-7.8 text charcters many                          15-6-7.8
 3     text charctes words many 14-8-16-46 many alphabets    14-8-16-46
 4     Regular number 12345

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

It's really hard to create rules and then program them if all we have is just two examples.

 

So, a couple of questions:

 

  1. Does the string always begin with the text 'listing ' ?
  2. Or does the string sometimes begin with other text ?
  3. Is there always a space before the numbers, or not?

How about YOU tell us the rules for extracting the numbers of interest, and then we can show you code that might work.

--
Paige Miller
raja777pharma
Fluorite | Level 6

 

Hello ,

 

Please refer below ..

 

  1. Does the string always begin with the text 'listing ' ?      Ans : No
  2. Or does the string sometimes begin with other text ? 

                  Ans : Yes

  1. Is there always a space before the numbers, or not?

                  Ans : Not alwas 

PaigeMiller
Diamond | Level 26

@raja777pharma wrote:

 

Hello ,

 

Please refer below ..

 

  1. Does the string always begin with the text 'listing ' ?      Ans : No
  2. Or does the string sometimes begin with other text ? 

                  Ans : Yes

  1. Is there always a space before the numbers, or not?

                  Ans : Not alwas 


So please provide some rules (in words) that we can use to extract the numbers you want.

--
Paige Miller
raja777pharma
Fluorite | Level 6

Hi Miller,

 

My requirement is to extract numbers in text string as when number(s) present in middle of string with combined special charcters like '.' or '-' .

 

Rule 1 : Extract numeric string from start to end with special charcters.

            string1 = 'Text charcters 1.2.34.16 text charcters many ' ;

            string2 = '15-6-7.8 text charcters many ';

            string3  = 'text charctes words many 14-8-16-46 many alphabets';

 

Expected Out puts:   want string   = '1.2.34.16';

                                  want_strint2 = '15-6-7.8' ;

                                  want_string3='14-8-16-46';

           

 

Thank you,

Raja.

           

 

 

 

Tom
Super User Tom
Super User

I would use a modified version of what @Ksharp proposed.

data have;
  input have $80.;
cards4;
Text charcters 1.2.34.16 text charcters many
15-6-7.8 text charcters many
text charctes words many 14-8-16-46 many alphabets
Regular number 12345 
;;;;;

data want;
 set have;
 pid=prxparse('/\d+[\d\.-]*\d+/');
 call prxsubstr(pid,have,p,l);
 if p>0 then want=substr(have,p,l);
 drop pid p l;
run;

proc print;
run;

So this pattern:

/\d+[\d\.-]*\d+/

Says to match strings that start with a digit and end with a digit and have zero more more digits, periods or hyphens in between the two terminal digits.  So it will not match one digit strings.

Obs    have                                                  want

 1     Text charcters 1.2.34.16 text charcters many          1.2.34.16
 2     15-6-7.8 text charcters many                          15-6-7.8
 3     text charctes words many 14-8-16-46 many alphabets    14-8-16-46
 4     Regular number 12345                                  12345

 But it will match strings that are only digits.  If you need to eliminate those results you could test the string returned and make sure it has either a period or hyphen.

if not indexc(want,'.-') then want=' ';
Obs    have                                                     want

 1     Text charcters 1.2.34.16 text charcters many          1.2.34.16
 2     15-6-7.8 text charcters many                          15-6-7.8
 3     text charctes words many 14-8-16-46 many alphabets    14-8-16-46
 4     Regular number 12345

Ksharp
Super User
data have;
input have $80.;
cards4;
String1: listing16-88-004 . subjects of devations . xlsx;
String2 : listing 16.88.004. subjects of devations . xlsx;
;;;;

data want;
 set have;
 pid=prxparse('/\d+\D\d+\D\d+/');
 call prxsubstr(pid,have,p,l);
 if p>0 then want=substr(have,p,l);
 drop pid p l;
run;
mkeintz
PROC Star

Here's a tired old loop through a subset of characters in string.

data have;
  do string='listing16-88-004 . subjects of devations . xlsx'
           ,'listing 16.88.004. subjects of devations . xlsx';
    output;
  end;
run;

data want (drop=i);
  set have;
  length newvar $15;
  if anydigit(string) then do i=anydigit(string) to length(string);
    if anydigit(substr(string||' ',i,2)) then newvar=cats(newvar,char(string,i));
    else leave;
  end;
run;

 

 

Starting at the first numeric character, it advances through string, appending the current character to newvar as long as the current character, or the next character (to accommodate current character as a separator), is numeric.  It doesn't care what the separator character is.

 

Once a non-qualifying character is reached, exit the loop.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1075 views
  • 2 likes
  • 5 in conversation