sas string exact and not exact matching

New Contributor
Posts: 2

Hi, I have a base dataset called bigdata, which contains data lines like:

sumit chohan

sumeet chauhan

sumit chouhan

pratik dahibhat

prateek dahibat

partik dahibhat

 

and a sample dataset, which contains:

 

summit chauhan

prateek dahibhat

 

I want to find the count of possible matches, and also to extract those possible matches from the base dataset, which in our case is bigdata.

Please suggest an approach, if there is one.

Thanks.

 

Prashant.

 

Super User
Posts: 7,407

Re: sas string exact and not exact matching

Well, there are several ways you could do this; the way I would probably do it is:

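data _null_;
  set small_data end=last;
  /* end= is a SET statement option, so it goes on the generated SET */
  if _n_=1 then call execute('data want; set bigdata end=last; retain count;');
  /* call execute takes one argument, so build the statement with cats() */
  call execute(cats('if index(variable,"', snippet, '") > 0 then count=sum(count,1);'));
  if last then call execute('if last then output; run;');
run;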

This assumes that you have a dataset small_data containing a variable called snippet, and a dataset bigdata with a variable called variable.  It generates a data step with one if statement for each row of small_data, and outputs a single row holding the total count.  As you also want the matching rows themselves, maybe use this instead:

data _null_;
  set small_data end=last;
  /* two output datasets: want gets the total, matches gets the matching rows */
  if _n_=1 then call execute('data want matches; set bigdata end=last; retain count;');
  call execute(cats('if index(variable,"', snippet, '") > 0 then do; count=sum(count,1); output matches; end;'));
  if last then call execute('if last then output want; run;');
run;

You could also do the same by joining the two datasets.  It also depends on how your data looks: does the casing match, how close does a match need to be, etc.
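
As a rough sketch of the join route, something like the following might do, assuming the same names as above (bigdata with a character variable called variable, small_data with snippet).  The output names matches and snippet_counts, and the choice to ignore case with upcase(), are only illustrations and not something from your data:

```sas
proc sql;
  /* Every bigdata row that contains a snippet, ignoring case. */
  /* strip() drops the padding blanks before the substring search. */
  create table matches as
  select b.variable, s.snippet
  from   bigdata b,
         small_data s
  where  index(upcase(b.variable), upcase(strip(s.snippet))) > 0;

  /* Number of matching bigdata rows for each snippet. */
  create table snippet_counts as
  select snippet, count(*) as n_matches
  from   matches
  group by snippet;
quit;
```

Note that a snippet with no match at all will not appear in snippet_counts, and that this is still a substring match, so it will not catch spelling variants.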

New Contributor
Posts: 2

Re: sas string exact and not exact matching

Thank you so much for your reply...

Contributor
Posts: 44

Re: sas string exact and not exact matching

Hi,

You can try the sounds-like operator (=*). However, it is designed for English, so there may be some difficulties with Indian names.

data have;
input name $ 1-32;
cards;
sumit chohan               
sumeet chauhan             
sumit chouhan              
pratik dahibhat            
prateek dahibat            
partik dahibhat            
;

data sample;
input name $ 1-32;
cards;
summit chauhan             
prateek dahibhat           
;
run;

proc sql;
create table want as
select h.name as have_name, s.name as sample_name
from   have h,
       sample s
where  s.name =* h.name;
quit;
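
If you also need the counts, a sketch along these lines may help.  It reuses the have and sample datasets from above; the table names match_counts and want_lev are made up for illustration.  The second query swaps in a different technique, edit distance via the COMPLEV function, in case sounds-like does not suit your names; the cutoff of 3 is purely an assumption that you would need to tune on your own data:

```sas
proc sql;
  /* Count of sounds-like matches for each sample name. */
  /* Sample names with no match at all are not listed. */
  create table match_counts as
  select s.name as sample_name, count(*) as n_matches
  from   have h,
         sample s
  where  s.name =* h.name
  group by s.name;

  /* Alternative: Levenshtein edit distance instead of sounds-like. */
  /* The threshold 3 is an assumed value, not a recommendation. */
  create table want_lev as
  select h.name as have_name, s.name as sample_name,
         complev(s.name, h.name) as distance
  from   have h,
         sample s
  where  calculated distance <= 3;
quit;
```

A smaller cutoff gives fewer but closer matches, so it is worth looking at the distance column before settling on a value.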
