Krishnam
Calcite | Level 5
I computed a Jaccard score in R to compare two strings for similarity/dissimilarity.
I tried to replicate the same in SAS but couldn't achieve it.
Can you please let me know if there is a function or other way to get a Jaccard score in SAS for
comparing the two strings "Krishna" and "Krishna Reddy"?

I tried to replicate it in SAS with PROC DISTANCE, but had no luck.

In R:
library(stringdist)
stringdist('krishna', 'krishna reddy', method='jaccard')

The result is 0.3636.

 

1 ACCEPTED SOLUTION

FriedEgg
SAS Employee
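%macro kshingling
(string
,k=5
,out=&sysmacroname.
)
;

/* Write one row per k-character shingle (n-gram) of the string. */
data &out.;
   /* Replace every whitespace character with a blank, then trim. */
   string = strip(prxchange('s#\s# #',-1,symget('string')));
   do _n_ = 1 to lengthn(string)-&k.+1;
      ngram = substr(string,_n_,&k.);
      output;
   end;
run;

%mend;

%macro jaccard
(string1
,string2
,k=2
)
;

/* Keep the result available after the macro ends. */
%global jaccard;

%kshingling(&string1.,k=&k.,out=s1)
%kshingling(&string2.,k=&k.,out=s2)

/* Stack the shingles of both strings. */
proc append base=s1 data=s2; run;

/* Count each shingle within each string. */
proc freq data=s1 noprint;
   tables string*ngram / out=s2;
run;

/* One row per string, one column per shingle. */
proc transpose data=s2 out=s1(drop=_name_ _label_);
   by string notsorted;
   var count;
   id ngram;
run;

/* A shingle that does not occur in a string gets a count of 0. */
proc stdize data=s1 out=s2 missing=0 reponly;
   var _numeric_;
run;

/* Jaccard similarity over the shingle columns; 0 means absent. */
proc distance data=s2 method=jaccard absent=0 out=s1;
   var anominal(_numeric_);
   id string;
run;

/* Read the off-diagonal cell; this relies on &string1. being a valid SAS column name. */
proc sql;
select &string1. as jaccard
  into :jaccard
  from s1
 where string="&string2.";
quit;
%mend;

%jaccard(krishna,krishna reddy)
%put Jaccard similarity: &jaccard.;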

This was put together quickly. It does not match the result from the R package for your example, but it does match most other Jaccard similarity metrics I have used. You can adjust the value of k to get different values. I believe setting k=5 will give you approximately the result in R (0.333...).
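For anyone cross-checking the numbers outside SAS, here is a minimal Python sketch of Jaccard over k-character shingles. It is not the stringdist implementation and not the macro above; the function names `shingles` and `jaccard_similarity` are made up for illustration. It assumes, as stringdist appears to do by default, a shingle size of 1 and a result reported as a distance (1 minus the similarity), which would account for the 0.3636 in the question.

```python
def shingles(text: str, k: int) -> set[str]:
    """Return the set of distinct k-character shingles in text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int) -> float:
    """Size of the intersection over size of the union of the shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    if not union:  # both strings are shorter than k
        return 1.0
    return len(sa & sb) / len(union)


for k in (1, 2, 5):
    sim = jaccard_similarity("krishna", "krishna reddy", k)
    print(f"k={k}: similarity={sim:.4f} distance={1 - sim:.4f}")
# k=1: similarity=0.6364 distance=0.3636
# k=2: similarity=0.5000 distance=0.5000
# k=5: similarity=0.3333 distance=0.6667
```

Under these assumptions the distance at k=1 reproduces the R figure exactly, while the 0.333 at k=5 is a similarity that only happens to land near it.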


4 REPLIES
ballardw
Super User

I don't know of a quick way to get a Jaccard score, but SAS has two functions related to edit distance, COMPGED and COMPLEV, that may work for your purpose.

data _null_;
   length x y $ 50;
   x = 'krishna';
   y = 'krishna reddy';
   compg = compged(x,y);   /* generalized edit distance */
   compl = complev(x,y);   /* Levenshtein edit distance */
   put compg= compl=;      /* write both values to the log */
run;

In addition, the CALL COMPCOST routine can be used to assign different weights (costs) to the operations used by COMPGED.
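To make the edit-distance idea concrete, here is a minimal Python sketch of the classic Levenshtein distance, which is the measure COMPLEV is documented to return. The function name `levenshtein` is made up for illustration, and the sketch does not model COMPGED's operation costs or any of the SAS function modifiers.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning i characters of a into an empty string takes i deletes
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(levenshtein("krishna", "krishna reddy"))  # 6
```

The value 6 corresponds to inserting the six characters of " reddy". Unlike a Jaccard score, it is not normalized by the lengths of the strings.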

 

Krishnam
Calcite | Level 5
Thanks! I am aware of these Levenshtein distance functions.

I am specifically looking for a Jaccard score, to reproduce the example above in SAS.

Krishnam
Calcite | Level 5

Thank you!
