
String comparison: Jaccard distance

Occasional Contributor
Posts: 5


I have computed a Jaccard score in R to measure the similarity/dissimilarity of two strings.
I tried to replicate it in SAS but couldn't.
Is there a function or approach in SAS that gives the Jaccard score for
comparing the two strings "Krishna" and "Krishna Reddy"?

I tried to replicate it in SAS with PROC DISTANCE, but with no luck.

In R:
library(stringdist)
# method = 'jaccard' with the default q = 1 compares sets of single characters
stringdist('krishna', 'krishna reddy', method = 'jaccard')

The result is 0.3636.
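A minimal sketch of what the R call appears to compute, written in Python purely for illustration (it is not SAS or R code, and `qgrams` and `jaccard_distance` are hypothetical helper names). It assumes stringdist's default q = 1, i.e. each string is reduced to its set of distinct single characters. Under that assumption 'krishna' has 7 distinct characters, 'krishna reddy' has 11 (the space counts), all 7 are shared, and 1 - 7/11 = 0.3636. Note that the comparison is case-sensitive, so "Krishna" and "krishna" are not the same input.

```python
def qgrams(s: str, q: int = 1) -> set:
    """Set of distinct substrings of length q (spaces included)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard_distance(a: str, b: str, q: int = 1) -> float:
    """1 - |A intersect B| / |A union B| for the q-gram sets of a and b."""
    A, B = qgrams(a, q), qgrams(b, q)
    if not A and not B:
        return 0.0  # assumed convention: two empty strings are identical
    return 1 - len(A & B) / len(A | B)


# q = 1: compare the sets of single characters
d = jaccard_distance("krishna", "krishna reddy")
print(round(d, 4))
```

The same logic could be ported to a SAS DATA step by collecting distinct characters (or q-grams) of each string and counting the shared ones; that port is not shown here.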

I am specifically looking for the Jaccard distance.

Appreciate any help!
Respected Advisor
Posts: 4,756

Re: String comparison: Jaccard distance

What exactly does the Jaccard distance between strings mean in R? Is it based on the presence/absence of letters, words, or sounds in the strings?

SAS has many specialized functions for computing the distance between strings: COMPGED, COMPLEV, SOUNDEX, SPEDIS, as well as CALL COMPCOST.

PG
Occasional Contributor
Posts: 5

Re: String comparison: Jaccard distance

Hi,

Here is an illustration with an example.
Say you have two strings, 'abcde' and 'abdcde'. I split each one, in order, into combinations of two consecutive characters (spaces included) and flag the occurrence of each combination in string v1 (abcde) and string v2 (abdcde).

      ab  bc  cd  de  dc  bd
v1     1   1   1   1   0   0
v2     1   0   1   1   1   1

v1 intersection v2 = 3 (ab, cd, de)
v1 union v2 = 6
so my score is 1 - 3/6 = 0.5.
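The bigram procedure above can be sketched in Python (again only for illustration; `bigrams` and `jaccard` are hypothetical helper names). It rebuilds the flag table, reproduces the 0.5, and then applies the same bigram rule to 'krishna' and 'krishna reddy'. That gives 0.5 rather than 0.3636, which suggests the R value comes from single-character sets (stringdist's default q = 1); if I read the stringdist documentation correctly, passing q = 2 should match this bigram scheme instead.

```python
def bigrams(s: str) -> set:
    """Distinct pairs of consecutive characters, spaces included."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard distance between two non-empty sets."""
    return 1 - len(a & b) / len(a | b)


v1, v2 = bigrams("abcde"), bigrams("abdcde")
columns = sorted(v1 | v2)

# Presence/absence flags, one column per bigram in the union
print("    " + " ".join(f"{g:>3}" for g in columns))
for name, grams in (("v1", v1), ("v2", v2)):
    print(f"{name}  " + " ".join(f"{int(g in grams):>3}" for g in columns))

print(len(v1 & v2), len(v1 | v2), jaccard(v1, v2))  # 3 6 0.5

# The same bigram rule applied to the original pair of strings
k = jaccard(bigrams("krishna"), bigrams("krishna reddy"))
print(k)  # 0.5
```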
Community Manager
Posts: 509

Re: String comparison: Jaccard distance

Hi Krishnam,

I've moved your post to the Text Analytics Community so that more experts may be able to help out.

Anna
