DATA Step, Macro, Functions and more

How to find strings in a dataset?

Reply
Frequent Contributor
Posts: 88

How to find strings in a dataset?

Hi Everyone, 

 

New to sas and have no idea what I'm doing!

 

I need to flag a number of obsevations in my data set.

 

For example, I want to flag all observations that have a description that includes "tx" or "treatment"- I want to find any iteration of these strings (i.e., whether it's CAPS or some letters are caps or others aren't). 

 

I also want to EXCLUDE the following strings= "supplies-treatment", "Supplies-treamtent" and "Acute/Subacute Treatment - Discharge".

 

This is the code I have used, but I'm not sure if I'm doing this right at all:

 

data flagtx;
set flagtxdc;
if prxmatch("m/tx|treatment|treat/oi",description) > 0 then tx=1;
else tx=0;
run;

 

Is this right and how do I EXCLUDE certain strings.

 

(If you answer, I would really appreciate it if you could dumb it down! I'm very new to SAS and don't really understand a lot of the terminiology, lol)

 

thanks!

Super User
Posts: 17,750

Re: How to find strings in a dataset?

What do you mean by exclude - remove those strings or remove observation?

 

Can you post sample data and expected output?

 

if your a SAS beginner and not familiar with perl I would recommend the find/finds/index/indexes functions along with upcase/lowcase. 

Upcase allows you capitalize all your text so you can simplify comparisons. 

Find/index search a string for a specified substring or word. 

Frequent Contributor
Posts: 88

Re: How to find strings in a dataset?

Hi Reeza, 

 

Thanks for replying.

 

Here is some example data

 

ID   Description

1     chronic treatment

2     Treatment

3     acute/subacute tx

4     supplies-treatment

5     Supplies - Treament

6     apples

 

My expected output it

 

ID   Description                  flagtx

1     chronic treatment          1

2     Treatment                      1

3     acute/subacute tx          1

4     supplies-treatment         0

5     Supplies - Treament      0

6     apples                            0

 

 

As you can see from my expected output, ID 1-3 were flagged in the new variable flagtx because they contain the word treatment or Treatment or tx. ID 4 &5 I want to exclude meaning that they are NOT flagged or are coded as a 0. ID 6 is given a value of 0 as well because it doesn't contain any "treatment" or "tx".

 

Does that make sense?

 

 

 

Super User
Posts: 17,750

Re: How to find strings in a dataset?

This works for your sample but may not for your full dataset. You may need to expand your rules...cleaning data is never fun. 

If you end up with a long list it may be worth setting up an array instead of searching for each term one at a time. But this should get you started.

 

data have;
input ID   Description $30.;
cards;
1     chronic treatment
2     Treatment
3     acute/subacute tx
4     supplies-treatment
5     Supplies - Treatment
6     apples
;
run;


data want;
set have;
description = upcase(description);

if find(description, 'TX') > 0 
or find(description, 'TREATMENT')>0 
	then flagtx=1;
else flagtx=0;

if find(description, 'SUPPLIES')>0 
or find(description, 'DISCHARGE')>0 
	then flagtx=0;


run;
Frequent Contributor
Posts: 88

Re: How to find strings in a dataset?

Thanks Reeza, 

 

Why did you use the UPCASE function. Does using this mean that the program will only find the strings in uppercase format?

 

What if I wanted to find the strings in any variation? For example "TREATMENT" or "Treatment" or "treatment" or TrEATment"?

 

Thank you for helping me outSmiley Happy

Super User
Posts: 17,750

Re: How to find strings in a dataset?

Upcase converts everything to capitals. This means you can to a single comparison rather than worry about testing for different cases. 

 

 

Frequent Contributor
Posts: 88

Re: How to find strings in a dataset?

OHH SMART!! THANK YOU VERY MUCH!

Ask a Question
Discussion stats
  • 6 replies
  • 279 views
  • 5 likes
  • 2 in conversation