Krishnam
Calcite | Level 5
I computed a Jaccard score in R to measure the similarity/dissimilarity of two strings.
I tried to replicate the calculation in SAS but couldn't get it to work.
Is there a function or some other way to get a Jaccard score in SAS for
comparing the two strings "Krishna" and "Krishna Reddy"?

I tried PROC DISTANCE, but had no luck.

In R:
library(stringdist)
stringdist('krishna', 'krishna reddy', method='jaccard')

The result is 0.3636.

 

1 ACCEPTED SOLUTION

FriedEgg
SAS Employee
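%macro kshingling
(string
,k=5
,out=&sysmacroname.
)
;

/* Split the string into overlapping k-character shingles, one row per shingle */
data &out.;
   /* Turn every whitespace character into a plain blank, then trim the ends */
   string = strip(prxchange('s#\s# #',-1,symget('string')));
   do _n_ = 1 to lengthn(string)-&k.+1;
      ngram = substr(string,_n_,&k.);
      output;
   end;
run;

%mend;

%macro jaccard
(string1
,string2
,k=2
)
;

%kshingling(&string1.,k=&k.,out=s1)
%kshingling(&string2.,k=&k.,out=s2)

/* Stack the shingles of both strings */
proc append base=s1 data=s2; run;

/* One row per distinct string/shingle pair */
proc freq data=s1 noprint;
   tables string*ngram / out=s2;
run;

/* Keep presence only, so a shingle that repeats within a string counts once */
data s2;
   set s2;
   count = 1;
run;

/* One row per string, one column per shingle */
proc transpose data=s2 out=s1(drop=_name_ _label_);
   by string notsorted;
   var count;
   id ngram;
run;

/* Shingles that a string does not contain become 0 */
proc stdize data=s1 out=s2 missing=0 reponly;
   var _numeric_;
run;

proc distance data=s2 method=jaccard absent=0 out=s1;
   var anominal(_numeric_);
   id string;
run;

/* Read the coefficient for the pair; string1 must be a valid SAS name here */
proc sql;
select &string1. as jaccard
  into :jaccard
  from s1
 where string="&string2.";
quit;
%mend;

%jaccard(krishna,krishna reddy)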

I put this together quickly. It does not match the result from the R package for your example, but it does match most other Jaccard similarity metrics I have used. You can adjust the value of k to get different values. I believe setting k=5 will give approximately the result in R (0.333...).
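One likely reason for the mismatch: R's stringdist uses q = 1 by default and returns a distance (1 minus the similarity) over the sets of distinct q-grams. For the example, 'krishna' has 7 distinct characters, all of which occur in 'krishna reddy', which has 11 distinct characters including the blank, so the distance is 1 - 7/11, about 0.3636. The DATA step below is a minimal sketch of that set-based calculation using hash objects. The variable names (a, b, q, gram) and the hash names are illustrative and not part of the macros above.

```sas
/* Sketch: Jaccard distance between the sets of q-grams of two strings. */
/* Assumes q <= 16 (length of gram) and at least one q-gram overall.    */
data _null_;
   length a b $ 200 gram $ 16;
   a = 'krishna';
   b = 'krishna reddy';
   q = 1;                                   /* q-gram size */

   declare hash in_a();                     /* distinct q-grams of a */
   in_a.defineKey('gram');
   in_a.defineDone();
   declare hash in_both();                  /* q-grams found in a and b */
   in_both.defineKey('gram');
   in_both.defineDone();
   declare hash in_any();                   /* q-grams found in a or b */
   in_any.defineKey('gram');
   in_any.defineDone();

   do i = 1 to lengthn(a) - q + 1;
      gram = substr(a, i, q);
      rc = in_a.replace();                  /* replace() ignores duplicates */
      rc = in_any.replace();
   end;
   do i = 1 to lengthn(b) - q + 1;
      gram = substr(b, i, q);
      rc = in_any.replace();
      if in_a.check() = 0 then rc = in_both.replace();
   end;

   /* distance = 1 - |intersection| / |union| */
   jaccard_distance = 1 - in_both.num_items / in_any.num_items;
   put jaccard_distance= 6.4;
run;
```

Changing q to 2 or 5 gives the set-based distance for longer shingles; subtract the result from 1 to get a similarity instead of a distance.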

ballardw
Super User

I haven't found a quick way to get a Jaccard score, but SAS has two functions related to edit distance, COMPGED and COMPLEV, that may work for your purpose.

data _null_;
   length x y $ 50;
   x = 'krishna';
   y = 'krishna reddy';
   compg = compged(x,y); 
   compl = complev(x,y);
   put compg= compl=;
run;

In addition, the CALL COMPCOST routine can be used to assign different weights to the operations used in COMPGED.
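As a rough illustration of that last point, the sketch below sets custom costs before calling COMPGED. The weights are arbitrary values chosen only for the example; APPEND= and BLANK= are two of the operation names that CALL COMPCOST accepts.

```sas
data _null_;
   length x y $ 50;
   x = 'krishna';
   y = 'krishna reddy';
   /* Illustrative weights only: make appended characters and blanks cheap */
   call compcost('append=', 10, 'blank=', 5);
   compg_custom = compged(x, y);   /* generalized edit distance under the custom costs */
   put compg_custom=;
run;
```

Comparing this value with the compg value from the step above shows how much the custom weights change the score.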

 

Krishnam
Calcite | Level 5
Thanks! I am aware of these Levenshtein distance functions.

I am specifically looking for a way to compute the Jaccard score from my example in SAS.

Krishnam
Calcite | Level 5

Thank you!
