10-17-2016 03:44 PM
I am trying to flag all values of the variable DESCRIPTION that contain the string "comprehensive assessment".
This is part of my code:
ca_only = 0;
/* include list */
if prxmatch("m/Comprehensive Assessment|comprehensive ax|HW - Comprehensive Ax|FA - Comprehensive Ax|FA - Comprehensive Ax TM|HW - Comprehensive Ax TM|FA - Reassessment|HW - Reassessment|Reassessment|Dominance|Comp Ax Telemedicine|Clinic Comprehensive Assessment|Comprehensive Ax - Telemedicine|Comprehensive Ax - TM/oi", description) > 0 then
   /* exclusion list */
   if prxmatch("m/Comprehensive Assessment - OT\/PT|Dominance Transfer Training|conc|psych|LIBYA - AX - COMPREHENSIVE ASSESSMENT|LIBYA - AX - COMPREHENSIVE ASSESSMENT OT\/PT/oi", description) = 0 then ca_only = 1;
However, I've now reached a point where my code seems to be too long, and I am getting an error message saying I have exceeded 262 characters.
How else can I find all values that CONTAIN "comprehensive assessment" as well as the other variations outlined in my code above (e.g., "comprehensive ax", "comprehensive ax-tm", "comp ax", etc.)?
Thanks in advance!
10-17-2016 03:57 PM
Your current code is
If <CONDITION1> THEN
if <CONDITION2> then ca_only=1;
This isn't correct SAS syntax.
You may want:
If <CONDITION1> AND <CONDITION2> then ca_only=1;
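A minimal sketch of the single IF/AND form applied to this problem. It assumes a hypothetical data set HAVE (built here from made-up sample values) with a character variable DESCRIPTION, and it uses shortened inclusion and exclusion patterns rather than the full rule set from the question:

```sas
/* Hypothetical sample data standing in for the real input */
data have;
   infile datalines truncover;
   length description $60;
   input description $char60.;
   datalines;
Comprehensive Assessment
FA - Comprehensive Ax TM
Comprehensive Assessment - OT/PT
Psych - Comprehensive Ax
Follow-up Visit
;
run;

data flagged;
   set have;
   ca_only = 0;
   /* Both conditions in one IF: the inclusion pattern must match */
   /* and the (shortened, illustrative) exclusion pattern must not */
   if prxmatch('/comprehensive (assessment|ax)/i', description) > 0
      and prxmatch('/ot\/pt|psych/i', description) = 0 then ca_only = 1;
run;
```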
10-17-2016 04:15 PM
On a side note, SAS does permit IF/THEN/IF. However, nobody uses it because it doesn't save you anything. (All it really does is make the ELSE statement more difficult to interpret.) For example, here's a test program:
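data _null_;   /* test 1: i > 0 is checked first */
   do i=1 to 100000000; if i > 0 and 5=4 then x=2; end;
run;
data _null_;   /* test 2: 5=4 is checked first */
   do i=1 to 100000000; if 5=4 and i > 0 then x=2; end;
run;
data _null_;   /* test 3: nested IF/THEN/IF */
   do i=1 to 100000000; if i > 0 then if 5=4 then x=2; end;
run;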
The middle step runs faster because 5=4 is always false. The software is smart enough to figure out that it doesn't need to check the second condition when the first condition is false.
When it comes to your application, why do you need to check all these combinations? If you find "Comprehensive Assessment", isn't that enough, so that you don't need to check for the longer variations that contain it? Are there any strings that contain "Comprehensive Assessment" along with other characters where it would be incorrect to set CA_ONLY to 1? I realize there are other strings that need to be checked (upper vs. lower case, "ax" vs. "Assessment"), but why do they all need to be checked?
Finally, strings that become lengthy should not generate an error. They should at most generate a warning because the software suspects you have done something wrong. If you know you haven't done anything wrong, you can usually ignore this particular warning. (Very bad advice in general, but applicable in this case.)
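If the long literal is otherwise correct, two common ways to deal with this message are sketched below: turning the check off with the NOQUOTELENMAX system option, or building the pattern from several shorter literals joined with the || operator so that no single quoted string exceeds the limit. Either one alone should be enough. The data set HAVE and its sample values are hypothetical, and the pattern is a shortened version of the one in the question:

```sas
/* Optional: stop SAS from warning about quoted strings over 262 characters */
options noquotelenmax;

/* Hypothetical sample data */
data have;
   infile datalines truncover;
   length description $60;
   input description $char60.;
   datalines;
HW - Reassessment
Comp Ax Telemedicine
Follow-up Visit
;
run;

data flagged;
   set have;
   retain re_include;
   /* Compile the pattern once; each piece is a short literal joined with || */
   if _n_ = 1 then re_include = prxparse(
        '/Comprehensive Assessment|Comprehensive Ax|Comp Ax Telemedicine'
     || '|Reassessment|Dominance/i');
   ca_only = (prxmatch(re_include, description) > 0);
   drop re_include;
run;
```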
10-18-2016 12:51 AM
Actually, I seem to remember that in some very outdated SAS publication (for SAS V6, I believe) the IF...THEN...IF...THEN construct was documented as a way to improve performance. This was at a time when SAS always evaluated the full expression in an AND construct, even if the first part already resolved to false.
10-18-2016 09:59 AM
That goes back a long way, but I think you're correct. I think it was closer to the end of the version 6 releases (once initial bugs were worked out, efficiency became more important). But I would never swear to it.
10-18-2016 08:47 AM
Thanks for your response!
I need to flag all values of the variable DESCRIPTION that contain "comprehensive assessment" BUT ALSO other variations of it, like "comprehensive ax" and "comp ax".
Just including "Comprehensive Assessment" in my code doesn't capture the other variations. I need to do this because this is how we set up our flagging rule: they all mean the same thing but have different values, which is why we are trying to flag them all the same way.
Is there a better way for me to accomplish this?
The actual error that I am getting is the following:
The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation marks.
So I don't think it's actually processing the code that I posted above, which means that I can't ignore it.
10-18-2016 09:18 AM - edited 10-18-2016 09:21 AM
Well, I don't think I can help with PRXMATCH. But you may not need it. Consider:
test_string = upcase(description);
if index(test_string, 'COMPREHENSIVE ASSESSMENT')
   or index(test_string, 'COMPREHENSIVE AX')
   or index(test_string, 'COMP AX') then ca_only = 1;
I'm not sure if I covered every possible case here, but this might be enough. My real point is that you may not need to look for every possible variation of text that might occur. Just locating certain key words may be enough.
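A self-contained version of the INDEX approach, run against a few made-up DESCRIPTION values (the data set names HAVE and FLAGGED are hypothetical). Upper-casing once means each INDEX search only has to be written in one case:

```sas
/* Hypothetical sample data */
data have;
   infile datalines truncover;
   length description $60;
   input description $char60.;
   datalines;
HW - Comprehensive Ax
Clinic Comprehensive Assessment
comp ax telemedicine
Follow-up Visit
;
run;

data flagged;
   set have;
   length test_string $60;
   ca_only = 0;
   /* Upper-case once so each INDEX search is effectively case-insensitive */
   test_string = upcase(description);
   /* INDEX returns the position of the match, or 0 if not found */
   if index(test_string, 'COMPREHENSIVE ASSESSMENT')
      or index(test_string, 'COMPREHENSIVE AX')
      or index(test_string, 'COMP AX') then ca_only = 1;
   drop test_string;
run;
```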
10-18-2016 09:58 AM
Oh I see what you mean...
Oh I see what you mean...
So searching for just "COMPREHENSIVE" should capture all variations that include that string? For example, it should also capture "COMPREHENSIVE AX", "COMPREHENSIVE ASSESSMENT", and "COMPREHENSIVE ASSESS"?
I might have tried that before, but I don't think it captured all variations (at least not with prxmatch), which is why I had to spell out all the variations one by one.
Thanks for trying to help, though!
10-18-2016 10:00 AM
Yes, just COMPREHENSIVE would capture all of those strings. If it didn't last time, it might be the result of different capitalization. But just COMPREHENSIVE might also capture other strings that you don't want as well.
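A sketch of the broader search, assuming a hypothetical data set HAVE: one INDEX call on COMPREHENSIVE covers the "comprehensive ..." variations, "Comp Ax" still needs its own check, and values that should not be flagged (the OT/PT and LIBYA entries from the exclusion list in the question) are switched off afterwards:

```sas
/* Hypothetical sample data */
data have;
   infile datalines truncover;
   length description $60;
   input description $char60.;
   datalines;
Comprehensive Assess
Comp Ax Telemedicine
Comprehensive Assessment - OT/PT
LIBYA - AX - COMPREHENSIVE ASSESSMENT
Follow-up Visit
;
run;

data flagged;
   set have;
   length test_string $60;
   test_string = upcase(description);
   /* Broad match: COMPREHENSIVE covers ASSESSMENT, ASSESS, AX, ... */
   ca_only = (index(test_string, 'COMPREHENSIVE') > 0
           or index(test_string, 'COMP AX') > 0);
   /* The broad match also catches values that must not be flagged, */
   /* so switch those off explicitly                                */
   if index(test_string, 'OT/PT') > 0 or index(test_string, 'LIBYA') > 0
      then ca_only = 0;
   drop test_string;
run;
```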
10-18-2016 05:05 PM
The FIND() function with the 'i' modifier for a case-insensitive search might suffice.
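A minimal sketch of the FIND() variant, assuming a hypothetical data set HAVE with made-up values. The 'i' modifier makes each search ignore case, so no upper-cased copy of DESCRIPTION is needed:

```sas
/* Hypothetical sample data */
data have;
   infile datalines truncover;
   length description $60;
   input description $char60.;
   datalines;
HW - COMPREHENSIVE AX
comprehensive assessment
Follow-up Visit
;
run;

data flagged;
   set have;
   /* FIND returns the match position (0 if none); 'i' ignores case */
   ca_only = (find(description, 'comprehensive assessment', 'i') > 0
           or find(description, 'comprehensive ax', 'i') > 0
           or find(description, 'comp ax', 'i') > 0);
run;
```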
You can certainly use a RegEx that matches all the strings; you just need to know the pattern you're searching for. Can you provide some sample data with the different cases?
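As one possible pattern, assuming the variations follow the form "comp" or "comprehensive", then whitespace, then "assessment", "assess", or "ax": the sketch below adds Reassessment and Dominance from the question's list as plain alternatives and keeps a separate, shortened exclusion pattern. The data set names and sample values are hypothetical, and the pattern should be checked against the real data before use:

```sas
/* Hypothetical sample data */
data have;
   infile datalines truncover;
   length description $60;
   input description $char60.;
   datalines;
Comprehensive Assessment
HW - Comprehensive Ax TM
Comp Ax Telemedicine
FA - Reassessment
Dominance Transfer Training
Comprehensive Assessment - OT/PT
LIBYA - AX - COMPREHENSIVE ASSESSMENT
Follow-up Visit
;
run;

data flagged;
   set have;
   ca_only = 0;
   /* Include: comp/comprehensive + whitespace + assessment/assess/ax, */
   /* plus reassessment and dominance; exclude: a shortened list       */
   if prxmatch('/\bcomp(rehensive)?\s+(assessment|assess|ax)\b|reassessment|dominance/i', description)
      and not prxmatch('/ot\/pt|dominance transfer|conc|psych|libya/i', description)
      then ca_only = 1;
run;
```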