Krishnam
Calcite | Level 5
I computed a Jaccard score in R to measure the similarity/dissimilarity of two strings.
I tried to replicate the same calculation in SAS but couldn't achieve it.
Can you please let me know if there is a function or other way to get the Jaccard score in SAS for
comparing the two strings "Krishna" and "Krishna Reddy"?

I tried to replicate it in SAS with PROC DISTANCE, but had no luck.

In R:
library(stringdist)
stringdist('krishna', 'krishna reddy', method='jaccard')

The result is 0.3636.

I am specifically looking for the Jaccard distance only.

Appreciate any help!
PGStats
Opal | Level 21

What is the meaning of the Jaccard distance between strings in R? Is it based on the presence/absence of letters, words, or sounds in the strings?

SAS has many specialized functions for computing the distance between strings: COMPGED, COMPLEV, SOUNDEX, SPEDIS, as well as CALL COMPCOST.

PG
Krishnam
Calcite | Level 5
Hi,

Here is an illustration with an example.
Say you have two strings, 'abcde' and 'abdcde'. I split each one into two-character combinations, in order and including spaces, and flag whether each combination occurs in string v1 (abcde) and string v2 (abdcde).

      ab  bc  cd  de  dc  bd
V1     1   1   1   1   0   0
V2     1   0   1   1   1   1

v1 intersection v2 = 3
v1 union v2 = 6
so my score is 1 - 3/6 = 0.5
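
The calculation above can be sketched in a few lines. This is only a minimal illustration, written in Python rather than SAS, and the helper names `qgrams` and `jaccard_distance` are hypothetical, not part of any library. It assumes the score is one minus the ratio of shared to total distinct q-character substrings. With q = 2 it reproduces the 0.5 from the example, and with q = 1 (which, as far as I know, is the default in R's stringdist) it reproduces the 0.3636 reported for 'krishna' and 'krishna reddy'.

```python
def qgrams(s, q):
    """Return the set of distinct q-character substrings of s, spaces included."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard_distance(a, b, q=2):
    """Jaccard distance between the q-gram sets of a and b (assumed definition)."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    union = grams_a | grams_b
    if not union:
        # Both strings are shorter than q: treat them as identical.
        return 0.0
    return 1 - len(grams_a & grams_b) / len(union)


# Two-character combinations, as in the table above: 3 shared out of 6 distinct.
print(jaccard_distance("abcde", "abdcde", q=2))

# Single characters: 7 shared out of 11 distinct.
print(round(jaccard_distance("krishna", "krishna reddy", q=1), 4))
```

The same logic should carry over to a SAS DATA step by looping over SUBSTR positions and collecting the distinct substrings of each string, but that translation is not shown here.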
AnnaBrown
Community Manager

Hi Krishnam,

I've moved your post to the Text Analytics Community so that more experts may be able to help out.

Anna
