prashantchitta4
Calcite | Level 5

Hi, I have a base dataset called bigdata,

which contains some data lines like:

sumit chohan

sumeet chauhan

sumit chouhan

pratik dahibhat

prateek dahibat

partik dahibhat

 

and a sample dataset which contains:

 

summit chauhan

prateek dahibhat

 

I just want to find the count of the possible matches, and also extract the possible matches from the base dataset, which in our case is bigdata.

Please suggest an approach, if there is one.

Thanks.

 

Prashant.

 

RW9
Diamond | Level 26

Well, there are several ways you could do this; the way I would probably do it is:

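data _null_;
  set small_data end=last;
  /* open the generated step once; END= belongs on the SET statement */
  if _n_=1 then call execute('data want; set bigdata end=last; retain count 0;');
  /* one IF per snippet; CALL EXECUTE takes a single string, so concatenate first */
  call execute('if index(variable,' || quote(strip(snippet)) || ') > 0 then count=sum(count,1);');
  /* close the generated step, writing only the final row with the total */
  if last then call execute('if last then output; run;');
run;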

This assumes that you have a dataset small_data which contains a variable snippet, and a dataset bigdata with a variable called variable.  It generates a data step with one if statement for each row of your small data, and outputs one row with the total.  As you also want the matching rows, maybe add a second output dataset:

data _null_;
  set small_data end=last;
  /* two outputs: WANT gets the total, MATCHES gets each matching row */
  if _n_=1 then call execute('data want matches; set bigdata end=last; retain count 0;');
  call execute('if index(variable,' || quote(strip(snippet)) || ') > 0 then do; count=sum(count,1); output matches; end;');
  if last then call execute('if last then output want; run;');
run;

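You could also do the same by joining the two datasets.  It also depends on how your data looks: does the casing match, how close does a match need to be, etc.  Note that index() only finds exact, case-sensitive substrings, so on its own it will not pick up spelling variants.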
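For what it's worth, the join route could look roughly like the sketch below. It reuses the names assumed above (bigdata with a variable called variable, small_data with snippet); the table name matched_pairs is made up for illustration. FIND with the 'i' modifier ignores case, but it is still a substring test.

```sas
/* Sketch only: dataset and variable names are assumptions, not the poster's real ones */
proc sql;
  /* every bigdata row paired with each snippet it contains, ignoring case */
  create table matched_pairs as
  select b.variable, s.snippet
  from   bigdata b,
         small_data s
  where  find(b.variable, strip(s.snippet), 'i') > 0;

  /* total number of matching pairs */
  select count(*) as match_count
  from   matched_pairs;
quit;
```

This is a Cartesian join, so it compares every row of bigdata with every snippet; that is fine for a small snippet list but can get slow if both tables are large.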

prashantchitta4
Calcite | Level 5

Thank you so much for your reply...

AskoLötjönen
Quartz | Level 8

Hi ,

 

You can try the sounds-like operator (=*). It's designed for English, though, so there may be some difficulties with Indian names.

 

data have;
input name $ 1-32;
cards;
sumit chohan               
sumeet chauhan             
sumit chouhan              
pratik dahibhat            
prateek dahibat            
partik dahibhat            
;

data sample;
input name $ 1-32;
cards;
summit chauhan             
prateek dahibhat           
;
run;

proc sql;
create table want as
select h.name as have_name, s.name as sample_name
from   have h,
       sample s
where  s.name =* h.name;
quit;
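
Since the question also asked for the count of possible matches, a per-name count could be sketched like this. It assumes the have and sample datasets created above; the column name n_matches is just illustrative. The left join is there so that a sample name with no sounds-like match still shows up with a count of 0.

```sas
/* Sketch only: counts sounds-like matches in HAVE for each name in SAMPLE */
proc sql;
  select s.name        as sample_name,
         count(h.name) as n_matches   /* counts non-missing matches only */
  from   sample s
         left join have h
         on s.name =* h.name
  group by s.name;
quit;
```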
