Output fuzzy matching results

Contributor
Posts: 34

Folks,

I have a dataset with a number of string variables. The values of these variables refer to variables in a number of other datasets.

What I would like to do is search these variables for references to particular words.

proc sql;
   create table alldata as
   select *
   from dictionary.columns
   where libname in ('X2011','X2012','X2013','X2014','X2015')
   order by libname, memname
   ;  
quit;


data alldat1 (keep=libname memname name type label) ;
set alldata;
value=spedis(label, ' income ');
run;

So, in this example, I would like SAS to search the variable LABEL for any reference to the string 'income'.

I'm not sure whether SPEDIS is the correct function to use for this.

Given that these labels are basically sentences, will SAS be able to find a word such as income inside them?

Any advice is welcome.

Grand Advisor
Posts: 10,210

Re: Output fuzzy matching results

You don't actually say what you are trying to accomplish.

Since you are not keeping your VALUE variable, I am going to guess that what you're attempting is to select records where the word 'income' appears in the label. If that is the case, I would start with:

data alldat1 (keep=libname memname name type label) ;
   set alldata;
   if findw(label, 'income',' ,.','i');
run;

The 'i' modifier tells FINDW to ignore case, so Income, INCOME and income all match. FINDW returns the position of the word (a value greater than 0) when it is found and 0 when it is not. SAS treats any nonzero value as true, so this subsetting IF keeps only the records whose label contains the word.
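
To illustrate, here is a small self-contained sketch that runs FINDW on a few made-up labels (the data set DEMO and its labels are hypothetical stand-ins for the LABEL column of DICTIONARY.COLUMNS). For contrast it also computes SPEDIS, which scores the spelling distance between two whole strings. Because the entire label is compared with 'income', a sentence-like label will generally not score near 0 even when it contains the word, which is why SPEDIS is a poor fit for this search.

```sas
/* Hypothetical sample labels standing in for DICTIONARY.COLUMNS.LABEL */
data demo;
   infile datalines truncover;
   length label $ 60;
   input label $char60.;
   /* Position of the whole word 'income' (case ignored), or 0 if absent */
   pos = findw(label, 'income', ' ,.', 'i');
   /* Spelling distance between the WHOLE label and 'income', for contrast */
   dist = spedis(label, 'income');
   /* FINDW's result can be used directly as a true/false flag */
   has_income = (pos > 0);
   datalines;
Total household income
INCOME, gross
Incomes from rental property
Net pay
;
run;

proc print data=demo;
run;
```

With these labels, the first two rows get a nonzero POS, while 'Incomes from rental property' gets 0 (Incomes is a different word from income), as does 'Net pay'.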

The ' ,.' argument lists the characters to treat as word delimiters. If other punctuation, such as :, can appear in the labels, add it to that list.
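
A quick sketch of why the delimiter list matters, using one hypothetical label, 'Total Income: 2015'. With only ' ,.' as delimiters, the colon counts as part of the word, so 'Income:' is not the word 'income'; adding ':' to the list lets FINDW find it.

```sas
data _null_;
   length label $ 40;
   label = 'Total Income: 2015';   /* hypothetical label */
   /* ':' is not a delimiter here, so 'Income:' does not match */
   without_colon = findw(label, 'income', ' ,.', 'i');
   /* ':' added to the delimiter list, so the word is found */
   with_colon = findw(label, 'income', ' ,.:', 'i');
   put without_colon= with_colon=;
run;
```

The log should show without_colon=0 and with_colon=7. As an alternative to listing every character yourself, the 'p' modifier adds punctuation characters to the delimiter list.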

If that is not what you want, then provide an example of what the output data set should actually look like.
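
A possible alternative, sketched under the same assumptions as the question's code: apply the same FINDW test in the WHERE clause of the original DICTIONARY.COLUMNS query, so the filtering happens while the table is built and no separate data step is needed. The output table name INCOME_VARS is hypothetical; the library names are the ones from the question.

```sas
proc sql;
   create table income_vars as
   select libname, memname, name, type, label
   from dictionary.columns
   where libname in ('X2011','X2012','X2013','X2014','X2015')
     /* keep only columns whose label contains the whole word 'income' */
     and findw(label, 'income', ' ,.', 'i') > 0
   order by libname, memname;
quit;
```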

Contributor
Posts: 34

Re: Output fuzzy matching results

Excellent! This is exactly the kind of thing I was looking for.

Is it possible to extend the search criteria in the same data step to look for several different words? Something such as:

data alldat1 (keep=libname memname name type label) ;
   set alldata;
   if type ='char' then delete;
   if findw(label, 'income',' ,.','i'); 
   if findw(label, 'dividends',' ,.','d');
   if findw(label, 'salary',' ,.','s');
   if findw(label, 'pay',' ,.','p');
run;

Grand Advisor
Posts: 10,210

Re: Output fuzzy matching results


In reply to Sean_OConnor:

Use one IF statement with the FINDW conditions joined by OR:

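data alldat1 (keep=libname memname name type label) ;
   set alldata;
   if type ='char' then delete;
   /* The 4th argument holds FINDW modifiers, not the word's initial: */
   /* 'i' ignores case, while 'd', 's' and 'p' would instead add      */
   /* digits, space characters and punctuation to the delimiters.     */
   if findw(label, 'income',' ,.','i')
      or findw(label, 'dividends',' ,.','i')
      or findw(label, 'salary',' ,.','i')
      or findw(label, 'pay',' ,.','i');
run;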

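This keeps any label that matches at least one of the words. In your version, each of the four subsetting IF statements drops the record when its word is missing, so only labels containing all four words would survive.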
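
If the list of words grows, one way to keep the step short is to hold the search terms in a temporary array and loop over them. This is a sketch, not code from the thread: the array name WORDS, its element length of 10 and the flag FOUND are illustrative choices, and it assumes the ALLDATA table built by the PROC SQL step in the question.

```sas
data alldat1 (keep=libname memname name type label);
   set alldata;
   /* Search terms (illustrative); adjust the list and dimension as needed */
   array words[4] $ 10 _temporary_ ('income' 'dividends' 'salary' 'pay');
   if type = 'char' then delete;
   found = 0;
   /* UNTIL is checked at the bottom of each pass: stop at the first match */
   do i = 1 to dim(words) until (found);
      /* STRIP removes the blank padding on shorter terms such as 'pay' */
      if findw(label, strip(words[i]), ' ,.', 'i') then found = 1;
   end;
   /* Subsetting IF: keep the row only if some term matched */
   if found;
run;
```

Adding a new search term then only means extending the array's initial values and its dimension, rather than adding another OR clause.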

 
