Hello,
I'm trying to search for multiple alphanumeric strings in a variable. There are two types of searches I want to do:
1. Search for only the specific string (e.g. E45 only, not any string that contains E45).
2. Search for any string that starts with a specified string of characters (e.g. any string that begins with E45).
If any of these alphanumeric strings is present, I want the flag variable to be 1; if none of them is present, I want it to be 0.
I can't copy and paste the data here for privacy reasons, but it looks like this:
data test;
input alphanum $ ;
datalines;
G85.0;Z58.6;;
;;
Z00.125;F80.14;F82.0;
;;;E66.09;H54.0;E66.9;E66.01;
Z68.55
F20.96
F32O
F331;Z68.55;Z67.56;Z74.89;
run;
I wrote code like this for each of the flags:
if prxmatch('/^H5G|\bH5X.66\b|\bH5F.1G\b|^H5D\b/', icd10)>0 then test_flag=1;
else test_flag=0;
If there is a ^, it should pull in any strings that begin with those letters and numbers. If the string is bordered by \b, it should flag only that exact string.
Some of the flags worked, but others are flagging completely unrelated strings.
I also got this warning on some of the flags, but I'm not sure how to fix it:
NOTE: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation marks.
Could this be the cause? Is PRXMATCH just not meant to work with this kind of data, and if so, is there another way to search for multiple alphanumeric strings?
You discuss searching for E45 and use it as your example, but you do not include any values containing E45. So what exactly are you searching for? Please do not make us guess.
You should include at least one example of each type of search and show the result.
Any time you have a question about an error or warning message, best practice is to copy from the log the entire procedure or DATA step code that generates the message, along with all the notes, warnings, messages, or errors. Then, on the forum, open a text box using the </> icon above the message window and paste all of that text. The text box preserves the formatting of many of the diagnostic messages SAS provides.
About 90% of the time, this particular warning is caused by either a missing quote or mismatched quotes (one single quote and one double) somewhere. That is why you should provide the entire code: the message may not appear until several lines after the problem starts.
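When the quotes do balance but the pattern is simply long because it lists many codes, the same message can still appear: SAS writes it whenever a single quoted string grows past 262 characters, and OPTIONS NOQUOTELENMAX; turns it off. Another way around it is to keep each quoted literal short and assemble the pattern at run time. A minimal sketch, with CATS building the regex and PRXPARSE compiling it once; the dataset CODES, the variable ICD10, and both code lists are placeholders drawn from the sample data, not the real search list.

```sas
/* Placeholder sample; DATALINES4 keeps the embedded semicolons. */
data codes;
  infile datalines truncover;
  input icd10 $200.;
datalines4;
;;;E66.09;H54.0;E66.9;E66.01;
G85.0;Z58.6;;
F20.96
;;;;
run;

data flagged;
  set codes;
  length pattern $2000;
  retain re;
  if _n_ = 1 then do;
    /* Each literal stays short; CATS joins them into one regex. */
    /* Placeholder lists: prefixes vs. whole codes.              */
    pattern = cats('/(^|;)(',
                   'E66|H54',                  /* prefixes: code begins with */
                   '|(Z68\.55|F20\.96)(;|$)',  /* exact codes: whole code    */
                   ')/');
    re = prxparse(pattern);                    /* compile once, reuse below  */
  end;
  /* TRIM drops trailing blanks so that $ can match after the last code. */
  any_flag = (prxmatch(re, trim(icd10)) > 0);
  drop pattern re;
run;
```

Because the long pattern only exists as a run-time value, no quoted string in the program itself comes near 262 characters, and the code lists can grow without touching the regex syntax around them.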
It appears that you are searching in the middle of the value, so "any string that starts with" is misleading unless you define what starts a "string".
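One way to give "starts with" and "exact" a meaning inside a semicolon-separated list is to anchor each code on the delimiter rather than on ^ and \b. In PRXMATCH, ^ matches only at the very start of the whole value, \b also matches between a letter or digit and a period (so \bZ68\b flags Z68.55), and an unescaped . matches any character. A sketch, assuming a dataset with one character variable ICD10; E66 and Z68.55 are example codes taken from the sample data.

```sas
/* Sample values; DATALINES4 keeps the embedded semicolons. */
data codes;
  infile datalines truncover;
  input icd10 $200.;
datalines4;
;;;E66.09;H54.0;E66.9;E66.01;
Z68.55
F331;Z68.55;Z67.56;Z74.89;
F20.96
;;;;
run;

data flagged;
  set codes;
  /* Prefix search: E66 at the start of the value or right after a ';'. */
  prefix_flag = (prxmatch('/(^|;)E66/', icd10) > 0);
  /* Exact search: Z68.55 preceded by start or ';', followed by ';' or */
  /* end. \. is a literal period; a bare . matches any character.      */
  /* TRIM drops trailing blanks so that $ can match after the last code. */
  exact_flag = (prxmatch('/(^|;)Z68\.55(;|$)/', trim(icd10)) > 0);
  /* For comparison: \b also matches between '8' and '.', so this  */
  /* pattern flags Z68.55 even though no code is exactly Z68.      */
  boundary_flag = (prxmatch('/\bZ68\b/', icd10) > 0);
run;
```

On these four rows, PREFIX_FLAG is 1 only for the E66 row, while EXACT_FLAG and BOUNDARY_FLAG are both 1 for the two rows containing Z68.55, which shows why \b alone does not give an exact-code match.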
Replying to @GreenTriangle:
...and... you're not even searching for E45 in your regex (!) ...and... your dataset does not work
F44.0? 😶
- Cheers -
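The sample DATA step does not work because, with DATALINES, a line containing a semicolon ends the data, so none of the sample lines are read as data and SAS parses them as program statements instead. LIST input with a bare $ also keeps only the first 8 characters. A sketch that reads the sample as the question intends, using DATALINES4 (which ends only at a line of four semicolons) and a longer informat; renaming the variable to ICD10 to match the PRXMATCH call is an assumption about the real data.

```sas
/* DATALINES4 ends only at a line with ;;;; in columns 1-4, so the */
/* semicolons inside the codes are read as data.                  */
data test;
  infile datalines truncover;  /* use what is on the line; do not flow to the next */
  input icd10 $200.;           /* read up to 200 characters, not the default 8     */
datalines4;
G85.0;Z58.6;;
;;
Z00.125;F80.14;F82.0;
;;;E66.09;H54.0;E66.9;E66.01;
Z68.55
F20.96
F32O
F331;Z68.55;Z67.56;Z74.89;
;;;;
run;
```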
Of course, you don't need to use PRXMATCH:
Search 1
if string = 'E45' then ... ;
Search 2
if string =: 'E45' then ... ;
Both of the searches you describe can be collapsed into the code I provide for search 2; search 1 is not needed.
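Because each value in the question holds several codes separated by semicolons, the = and =: comparisons above work on one code at a time. A minimal sketch that splits the list with COUNTW and SCAN and applies both comparisons to each code; the variable name ICD10 and the codes Z68.55 and E66 are assumptions taken from the sample data.

```sas
/* Sample values; DATALINES4 keeps the embedded semicolons. */
data codes;
  infile datalines truncover;
  input icd10 $200.;
datalines4;
;;;E66.09;H54.0;E66.9;E66.01;
F331;Z68.55;Z67.56;Z74.89;
Z68.55
F20.96
;;;;
run;

data flagged;
  set codes;
  length code $20;
  exact_flag  = 0;
  prefix_flag = 0;
  /* COUNTW/SCAN with ';' treat runs like ';;;' as one delimiter. */
  do i = 1 to countw(icd10, ';');
    code = scan(icd10, i, ';');
    if code =  'Z68.55' then exact_flag  = 1;  /* search 1: whole code      */
    if code =: 'E66'    then prefix_flag = 1;  /* search 2: begins with E66 */
  end;
  drop i code;
run;
```

The =: operator compares only as many characters as the shorter operand, so 'E66' matches E66.09, E66.9, and E66.01, while = requires the whole code to be Z68.55.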