DATA Step, Macro, Functions and more

How to extract a specific string from large string where the location varies

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 112
Accepted Solution

How to extract a specific string from large string where the location varies

Good Morning everyone! I have a variable whose values can range from few words to much longer strings. Somewhere within those strings or in the end are "O. XX" or "M. XX" or "M.XXX" where XX are numbers. The numbers are generally two digits, but could be three digits also I only want to extract "O. XX" and "M. XX" from it. Few examples with want column-

 

No    Value                                                        Want

 

1.      Z-Other Services mod 26 (2011)           M. 90
         M. 90
          Modifier 26 Required


2.      Z-Other Services mod 26 (2011)            M. 90
          M. 90
           Modifier 26 Required

 

3.       CASE MANAGEMENT
                    O. 13                                            O. 13

 

4.          Z-RADIOLOGY SERVICES (2013)      M.101
            M.101


Accepted Solutions
Solution
‎04-20-2017 12:43 PM
PROC Star
Posts: 307

Re: How to extract a specific string from large string where the location varies

Adding the ? makes the \s (space) optional:

 

prx = prxparse('/(\s)([A-Z]\.\s?\d{2,3})(\b)/');

View solution in original post


All Replies
PROC Star
Posts: 307

Re: How to extract a specific string from large string where the location varies

Good time for a regular expression? Depending on your data, you might have to tweak it a little. For example, it wasn't clear that the space after the M/O would always be there, but this should get you started.

 

data have;
    txt = 'Z-Other Services mod 26 (2011) M. 90 Modifier 26 Required'; output;
    txt = 'Z-Other Services mod 26 (2011) M. 90'; output;
    txt = 'CASE MANAGEMENT O. 13'; output;
    txt = 'Z-RADIOLOGY SERVICES (2013) M. 101'; output;
run;

data want;
    set have;
    prx = prxparse('/(\s)([A-Z]\.\s\d{2,3})(\b)/');
    if prxmatch(prx, txt) then do;
        code = prxposn(prx, 2, txt);
    end;
run;
Frequent Contributor
Posts: 112

Re: How to extract a specific string from large string where the location varies

Posted in reply to collinelliot

Thanks so much! I will check it now..to answer your question, the space after O or M is generally there but in some cases, there is no space. Will that change the code a bit?

Solution
‎04-20-2017 12:43 PM
PROC Star
Posts: 307

Re: How to extract a specific string from large string where the location varies

Adding the ? makes the \s (space) optional:

 

prx = prxparse('/(\s)([A-Z]\.\s?\d{2,3})(\b)/');

Frequent Contributor
Posts: 112

Re: How to extract a specific string from large string where the location varies

Posted in reply to collinelliot

Thanks again! Yes, it works now for the most part, sorry i just forgot to mention that integers after O. or M. could be just 1 integer too, not only 2 or 3. For instance, there could be a string like below which draws a blank with existing code rather than desired O. 5.

 

AMBULANCE ATTENDANT SERVICESO. 5

 

I tried tweaking your code by adding 1 inside the bracket after d, like this, but it didnt work-

 

data want1;
set kirk2;
prx = prxparse('/(\s)([A-Z]\.\s?\d{1,2,3})(\b)/');
if prxmatch(prx, Contract_Term_Name__Long_Descrip) then do;
code = prxposn(prx, 2, Contract_Term_Name__Long_Descrip);
end;
run;

PROC Star
Posts: 307

Re: How to extract a specific string from large string where the location varies

prx = prxparse('/(\s)([A-Z]\.\s?\d{1,3})(\b)/');

 

The \d{1,3} means from 1 to 3 digits.

Frequent Contributor
Posts: 112

Re: How to extract a specific string from large string where the location varies

Posted in reply to collinelliot

Thanks so much!

Super User
Super User
Posts: 7,993

Re: How to extract a specific string from large string where the location varies

Thats not how your data looks is it, you have copied that from Excel.  Post test data in the form of a datastep!!

 

You will likely be quickest off using prxmatch,but if your string only contains the one "." then you could also do:

want=substr(value,index(value,".")-1,index(value,".")+4);

 

Trusted Advisor
Posts: 1,137

Re: How to extract a specific string from large string where the location varies

please try perl regular expression

 

data have;
input Value&$100.;
cards;                                                   
Z-Other Services mod 26 (2011) M. 90 Modifier 26 Required
Z-Other Services mod 26 (2011) M. 90Modifier 26 Required
;

data want ;
set have;
if prxmatch('m/\w\.\s\d+/',value)>0 then flag=substr(value,prxmatch('m/\w\.\s\d+/',value),5);
run;
Thanks,
Jag
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 185 views
  • 1 like
  • 4 in conversation