Hi, I'm trying to extract a text string (name) from a longer string, which may occur either once within the string or multiple times (several names). I'm breaking the longer string into groups and using prxparse and prxmatch with grouping to extract only the group I need, but something is not working. Below is an example of my text string, my code and what I'm looking for at the end.
data have;
Prov_Info ="ProviderName: Spine and Pain Center of Whatchamikola; IDN: 2345678901; IsGroup: No;, ProviderName: Happy Toes; IDN: 3456789012; IsGroup: No;, ProviderName: IDN: 3456789012; IsGroup: Yes;, ProviderName: Bright Smiles of AZ; IDN: 1234567890 IsGroup: Yes;, ";
patternID = prxparse('/^(ProviderName:)( |.*; )(NPI: )/');
if prxmatch(patternID, strip(Prov_Info)) then do;
newname=prxposn(patternID, 2, Prov_Info);
end;
run;
Result needed:
newname=Spine and Pain Center of Whatchamikola; Happy Toes; Bright Smiles of AZ;
Any suggestions would be appreciated.
Thank you!
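You may need to use CALL PRXNEXT. The pattern prxparse('/(ProviderName:\s+[a-zA-Z ]+?;)/') matches "ProviderName:" followed by whitespace, then letters and spaces up to the next semicolon. CALL PRXNEXT returns the position and length of each successive match of this pattern. Because "ProviderName:" is 13 characters long, substr(prov_info, position+13, length-13) removes that label and keeps the name with its trailing semicolon.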
data have;
Prov_Info ="ProviderName: Spine and Pain Center of Whatchamikola;
IDN: 2345678901; IsGroup: No;,
ProviderName: Happy Toes; IDN: 3456789012; IsGroup: No;,
ProviderName: IDN: 3456789012; IsGroup: Yes;,
ProviderName: Bright Smiles of AZ; IDN: 1234567890 IsGroup: Yes;, ";
run;
data want;
/* newname collects every provider name found in prov_info */
length val newname $200;
set have;
start = 1;
stop = length(prov_info);
/* the pattern is a constant, so it is compiled only once */
re = prxparse('/(ProviderName:\s+[a-zA-Z ]+?;)/');
call prxnext(re, start, stop, trim(prov_info), position, length);
do while (position > 0);
  /* skip the 13 characters of "ProviderName:"; keep the name and its semicolon */
  val = substr(prov_info, position+13, length-13);
  newname = catx(" ", newname, val);
  call prxnext(re, start, stop, trim(prov_info), position, length);
end;
drop re start stop position length val;
run;
proc print data=want;
run;
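A variation on the loop above, offered as a sketch rather than as the method from the post: if only the name sits inside the capture group, PRXPOSN can return it after each CALL PRXNEXT, so the hard-coded offset of 13 is not needed. The data set names demo and want_posn, the variables pos and len, and the shortened input string are made up for illustration.

```sas
/* Hypothetical sample data: same layout as the post, shortened */
data demo;
  length prov_info $300;
  prov_info = 'ProviderName: Happy Toes; IDN: 3456789012; IsGroup: No;, ProviderName: IDN: 3456789012; IsGroup: Yes;, ProviderName: Bright Smiles of AZ; IDN: 1234567890 IsGroup: Yes;, ';
run;

data want_posn;
  set demo;
  length newname $200;
  /* Only the name and its semicolon are inside the capture group */
  re = prxparse('/ProviderName:\s+([a-zA-Z ]+?;)/');
  start = 1;
  stop = length(prov_info);
  call prxnext(re, start, stop, prov_info, pos, len);
  do while (pos > 0);
    /* PRXPOSN returns capture buffer 1 of the most recent match */
    newname = catx(' ', newname, prxposn(re, 1, prov_info));
    call prxnext(re, start, stop, prov_info, pos, len);
  end;
  drop re start stop pos len;
run;

proc print data=want_posn;
run;
```

The entry with an empty name is skipped because the colon in "IDN:" is not in the character class, so the pattern cannot reach a semicolon from that starting point.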
That works! Thank you so very much, kiranv_!!!!
I guess that ProviderName: CASE MANAGEMENT & IDN: 1234567890 IsGroup: Yes;, ";
should really look like ProviderName: CASE MANAGEMENT & something; IDN: 1234567890 IsGroup: Yes;, ";
One last question: do you have numbers in provider names, for example
ProviderName: CVS Pharmacy 50
If not, you can use \D (any non-digit character) for the name:
re = prxparse('/(ProviderName:\s+\D+?;)/');
This should work. I have used (?!IDN:), which is a negative lookahead. The pattern takes everything after ProviderName: up to the next ; unless IDN: comes immediately after ProviderName:, which I guess happens when there is no provider name.
re = prxparse('/(ProviderName:\s+(?!IDN:).+?;)/');
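To see how the lookahead pattern behaves on the cases discussed here, a small test step along these lines may help. It is only a sketch: the data set names demo_names and want_names, the variables pos and len, and the sample string (a name with digits, an empty name, and a name with an ampersand) are invented for illustration.

```sas
/* Hypothetical test string: a name with digits, an empty name, a name with an ampersand */
data demo_names;
  length prov_info $300;
  prov_info = 'ProviderName: CVS Pharmacy 50; IDN: 1234567890; IsGroup: No;, ProviderName: IDN: 3456789012; IsGroup: Yes;, ProviderName: CASE MANAGEMENT & something; IDN: 1234567890 IsGroup: Yes;, ';
run;

data want_names;
  set demo_names;
  length val newname $200;
  /* (?!IDN:) rejects a match when IDN: directly follows the single space */
  re = prxparse('/(ProviderName:\s+(?!IDN:).+?;)/');
  start = 1;
  stop = length(prov_info);
  call prxnext(re, start, stop, prov_info, pos, len);
  do while (pos > 0);
    /* drop the 13-character "ProviderName:" label, keep the name and semicolon */
    val = substr(prov_info, pos + 13, len - 13);
    newname = catx(' ', newname, val);
    call prxnext(re, start, stop, prov_info, pos, len);
  end;
  drop re start stop pos len val;
run;

proc print data=want_names;
run;
```

The sample string is in single quotes so that the ampersand is never treated as a macro variable reference.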
You are welcome and I am glad it worked