06-23-2016 04:21 PM
I have a data set and I am trying to flag all observations whose description contains the word "treatment" or "tx" with a 1, else 0. However, some observations have descriptions such as "Supplies-treatment" or "supplies-tx". I do NOT want these flagged with a 1.
Here is my code:
if prxmatch("m/tx|treatment|treat/oi",description) > 0 then tx=1;
This works perfectly! When I run a proc freq, I see that 33314 obs are flagged with a 1.
But then I want to unflag all the descriptions that contain the word "supplies". To do this, I thought I could run a second prxmatch on the dataset I just created (the one with the "treatment" descriptions flagged with 1) and set the flag to 0 whenever a description contains "supplies". I ran this code after the code above:
if prxmatch("m/supplies/oi",description) > 0 then tx=0;
However, when I run a proc freq, I see that the observations have somehow been flagged incorrectly. Instead of FEWER than 33314 obs flagged with a 1 (which should be the case, because I am eliminating "supplies-treatment"), I actually have MORE obs flagged with a 1: 64174, to be exact!
I did a proc print on 100 obs, but everything looks good. What is going on? What is my code doing exactly?
06-23-2016 11:35 PM
Your second step overwrites tx for every obs, so the end result reflects only the last test, i.e. anything that does not contain "supplies" is flagged.
In a single step:
data flagtx;
    set flagtxdc;
    tx = 0;
    if prxmatch("m/tx|treatment|treat/oi", description) > 0 then
        if prxmatch("m/supplies/oi", description) = 0 then tx = 1;
run;
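To check the single-step logic above before running it on the full data, it may help to try it on a few made-up rows. The sketch below is only an illustration: the dataset names `sample` and `flagged` and the five descriptions are hypothetical, not taken from the original data. It also writes the two tests as one Boolean expression, which SAS evaluates to 1 or 0. That is an alternative way of writing the same rule, not a change to it.

```sas
/* Hypothetical sample data: the descriptions are invented for illustration */
data sample;
    infile datalines truncover;
    input description $char40.;
    datalines;
Treatment room
Supplies-treatment
supplies-tx
TX visit
Office visit
;
run;

data flagged;
    set sample;
    /* tx is 1 only when a treatment keyword is present AND "supplies" is absent; */
    /* a Boolean expression in SAS evaluates to 1 (true) or 0 (false)             */
    tx = (prxmatch("m/tx|treatment|treat/oi", description) > 0
          and prxmatch("m/supplies/oi", description) = 0);
run;

/* Count the flags, then list the rows to see which descriptions were flagged */
proc freq data=flagged;
    tables tx;
run;

proc print data=flagged;
run;
```

With these five rows, only "Treatment room" and "TX visit" should end up with tx = 1; the two "supplies" rows and "Office visit" should be 0. Because tx is assigned on every row, no value is left missing or carried over from an earlier step.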