06-20-2016 03:55 PM
New to sas and have no idea what I'm doing!
I need to flag a number of obsevations in my data set.
For example, I want to flag all observations that have a description that includes "tx" or "treatment"- I want to find any iteration of these strings (i.e., whether it's CAPS or some letters are caps or others aren't).
I also want to EXCLUDE the following strings= "supplies-treatment", "Supplies-treamtent" and "Acute/Subacute Treatment - Discharge".
This is the code I have used, but I'm not sure if I'm doing this right at all:
if prxmatch("m/tx|treatment|treat/oi",description) > 0 then tx=1;
Is this right and how do I EXCLUDE certain strings.
(If you answer, I would really appreciate it if you could dumb it down! I'm very new to SAS and don't really understand a lot of the terminiology, lol)
06-20-2016 03:59 PM
What do you mean by exclude - remove those strings or remove observation?
Can you post sample data and expected output?
if your a SAS beginner and not familiar with perl I would recommend the find/finds/index/indexes functions along with upcase/lowcase.
Upcase allows you capitalize all your text so you can simplify comparisons.
Find/index search a string for a specified substring or word.
06-20-2016 04:11 PM
Thanks for replying.
Here is some example data
1 chronic treatment
3 acute/subacute tx
5 Supplies - Treament
My expected output it
ID Description flagtx
1 chronic treatment 1
2 Treatment 1
3 acute/subacute tx 1
4 supplies-treatment 0
5 Supplies - Treament 0
6 apples 0
As you can see from my expected output, ID 1-3 were flagged in the new variable flagtx because they contain the word treatment or Treatment or tx. ID 4 &5 I want to exclude meaning that they are NOT flagged or are coded as a 0. ID 6 is given a value of 0 as well because it doesn't contain any "treatment" or "tx".
Does that make sense?
06-20-2016 04:27 PM
This works for your sample but may not for your full dataset. You may need to expand your rules...cleaning data is never fun.
If you end up with a long list it may be worth setting up an array instead of searching for each term one at a time. But this should get you started.
data have; input ID Description $30.; cards; 1 chronic treatment 2 Treatment 3 acute/subacute tx 4 supplies-treatment 5 Supplies - Treatment 6 apples ; run; data want; set have; description = upcase(description); if find(description, 'TX') > 0 or find(description, 'TREATMENT')>0 then flagtx=1; else flagtx=0; if find(description, 'SUPPLIES')>0 or find(description, 'DISCHARGE')>0 then flagtx=0; run;
06-21-2016 08:09 AM
Why did you use the UPCASE function. Does using this mean that the program will only find the strings in uppercase format?
What if I wanted to find the strings in any variation? For example "TREATMENT" or "Treatment" or "treatment" or TrEATment"?
Thank you for helping me out
06-21-2016 08:40 AM
Upcase converts everything to capitals. This means you can to a single comparison rather than worry about testing for different cases.