DATA Step, Macro, Functions and more

Use IF to find a string THEN newvar = string

Accepted Solution Solved
Reply
Contributor
Posts: 24
Accepted Solution

Use IF to find a string THEN newvar = string

Guys, could i ask for some guidane again please?

 

I have 122k lines of data where 1 of the variables is a user defined text entry.  Somewhere within the variable is the string Captain or General or Colonel, etc (i have to find 144 differerent strings in total).

 

I need to find each instance of Captain in the text and then put Captain into a Variable called Type.

 

I have been trying:

 

if Text contains 'Captain' then Type = 'Captain' - this gives an arithmetic error.  I dont think i can use contains with IF!

 

How do i find a string in any position in a text string and then put that string into  my newvar?

 

thanks guys,


Accepted Solutions
Solution
‎11-17-2016 09:28 AM
PROC Star
Posts: 547

Re: Use IF to find a string THEN newvar = string

[ Edited ]

Use the INDEX function like this

 

data want;
   set have;

   if index(textvar, 'Captain') > 0 then type = 'Captain';
   else if index(textvar, 'General') > 0 then type = 'General';
   else if index(textvar, 'Colonel') > 0 then type = 'Colonel';
run;

View solution in original post


All Replies
Solution
‎11-17-2016 09:28 AM
PROC Star
Posts: 547

Re: Use IF to find a string THEN newvar = string

[ Edited ]

Use the INDEX function like this

 

data want;
   set have;

   if index(textvar, 'Captain') > 0 then type = 'Captain';
   else if index(textvar, 'General') > 0 then type = 'General';
   else if index(textvar, 'Colonel') > 0 then type = 'Colonel';
run;
PROC Star
Posts: 547

Re: Use IF to find a string THEN newvar = string

BTW I have assumed that no observation contains both the string 'Captain' and 'General' eg Smiley Happy

Occasional Contributor
Posts: 8

Re: Use IF to find a string THEN newvar = string

Just a thought ..

 

 

Regardless of case insensitive characters , use find function with i option .

 

 if find(textvar, 'CAPTAIN',"i") > 0 then type = 'Captain';

 

data have;
input textvar $20.;
datalines;
Captain of Ship
captain of Ship
CAPTAIN india
The Captain
;
run;


data want;
set have;
if find(textvar, 'CAPTAIN',"i") > 0 then type = 'Captain';
run;

 

Super User
Posts: 5,071

Re: Use IF to find a string THEN newvar = string

Regardless of which function you choose, you would be better off switching to the version that finds words rather than strings:  FINDW or INDEXW.

 

You don't want your logic to locate strings like these:

 

generally

captaincy

 

In either case, though, you may encounter false positives such as:

 

In general, these problems were handled.

Contributor
Posts: 24

Re: Use IF to find a string THEN newvar = string

my thanks to draycut for the code and to astounding for the inedxw suggestion.

 

i'm cooking with gas now!

Super User
Posts: 10,466

Re: Use IF to find a string THEN newvar = string

You may have a potential issue for compound words with Lieutenant-General or Major-General. If you are searching for Lieutenant or Major you will not want "-" to be included as a delimiter for a word search and do want it for the search for General.

 

Since this is free form text you may have fun for your intended purpose if you only want General as a title or rank and not in context of "General orders" or similar.

 

Also Sergeant-Major is a potential if searching for Major, and Lieutenant-Colonel. If there is Navy data then Lieutenant-Commander is another issue.

 

And don't forget abbreviations such as CPT, LT, LTC, MAJ, GEN, COL, etc. since military runs on TLAs (three-letter acronyms).

Contributor
Posts: 24

Re: Use IF to find a string THEN newvar = string

Guys,

i have come across an issue with my logic (not the code above).

When i run the indexw IF statements above, it places "General", "Captain", etc in my TYPE variable, i can then count each instance of "General", "Captain", etc. However, i cant work out how to get a 0 count for any strings it does not find.
I tried this:

if indexw(userdata, 'Captain') > 0 then do; type = 'Captain'; NotFound = 0;end;
else if indexw(userdata, 'Captain') = 0 then do; type = '.'; NotFound = 'Captain';end;

but it fills the var NotFound with captain.

Can anyone point me in the general direction? (see what i did there!)

thanks.
Super User
Posts: 5,071

Re: Use IF to find a string THEN newvar = string

I'm not sure that you're picturing the right end result.  Here's one possible change:

 

type='Captain';

if indexw(userdata, 'Captain') > 0 then Found=1;

else Found=0;

 

If you're going to check the same data for "General", you will need to take extra steps.  Either output multiple observations (one for each TYPE value), or create multiple flags such as CaptainFound instead of Found.

 

Contributor
Posts: 24

Re: Use IF to find a string THEN newvar = string

[ Edited ]

"I'm not sure that you're picturing the right end result." - i think you might be right!

i dont think my original idea will work for my new criteria.

 

I get the feeling that i'm going to have to go back to the start on this one...

Contributor
Posts: 24

Re: Use IF to find a string THEN newvar = string

trying to think round my issue...

 

i have 144 string i need to search for.  At present my results table gives me a count for the 75 strings that are present in my userdata variable.  

 

Will it be possible to code something that compares the 75 strings in my results table and compare it to my list of 144 searchable strings and then adds the 69 strings that were not found to my results table with a count of 0?

Contributor
Posts: 24

Re: Use IF to find a string THEN newvar = string

[ Edited ]

Just for completeness... I managed to work this out.

I made a new data set of my 144 search strings, compared it to the results from the indexw code, which gave me the search strings which the indexw code didnt find. I then just set the count for these strings to zero. I now have a complete list of all search strings and how many times the were found.

thanks to all for their help with this one.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 311 views
  • 6 likes
  • 5 in conversation