Kalai2008
Pyrite | Level 9


Hi,

 

I have been tasked with converting SAS programs to Hive queries in EAP. Since I am new to Hive, I am not sure what the equivalent Hive code is for the following functions used in our SAS programs.

utl_match.jaro_winkler_similarity (REPLACE (UPPER(name1), ' ', ''), REPLACE (UPPER(name2), ' ', '')) AS j_score,

utl_match.edit_distance_similarity (REPLACE (UPPER(name1), ' ', ''), REPLACE (UPPER(name2), ' ', '')) AS e_d_score
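For anyone unsure what the first call computes, here is a minimal Python sketch of the same idea. It is not the `utl_match` or Hive implementation, and all function names are hypothetical. `normalize` mirrors `REPLACE(UPPER(name), ' ', '')`, and `jaro_winkler` is the textbook Jaro-Winkler similarity with the usual prefix scale of 0.1 over at most four leading characters. The 0-100 integer scaling in `jaro_winkler_score` is an assumption meant to resemble the score `utl_match.jaro_winkler_similarity` returns; its exact rounding may differ.

```python
def normalize(s: str) -> str:
    # Mirrors REPLACE(UPPER(s), ' ', ''): upper-case, then drop plain spaces only
    return s.upper().replace(" ", "")


def jaro(s1: str, s2: str) -> float:
    """Classic Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters "match" only if equal and no farther apart than this window
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions
    k = 0
    half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for a shared prefix of up to max_prefix characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def jaro_winkler_score(name1: str, name2: str) -> int:
    # Hypothetical 0-100 integer scaling; utl_match's exact rounding may differ
    return round(100 * jaro_winkler(normalize(name1), normalize(name2)))
```

For example, `jaro_winkler_score("Martha", "mar hta")` compares `MARTHA` with `MARHTA` and returns 96.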

 

I would appreciate it if someone could help me with the equivalent code in Hive.

Thanks for checking.

1 REPLY
smantha
Lapis Lazuli | Level 10

Unfortunately, unless you have SAS Text Miner or SAS Viya, there are no exact equivalents in SAS. The closest functions for edit distance are COMPGED and SPEDIS. Keep in mind that the main difference between edit-distance functions is the weight (cost) given to each operation, such as an insertion, deletion, or substitution. If you have SAS Viya, you can leverage Python and the py_stringmatching package (AnHai Doan, Alon Halevy, Zachary Ives, “Principles of Data Integration”, Morgan Kaufmann, 2012, Chapter 4 “String Matching”, available on the package’s homepage). Otherwise, you can develop these functions yourself in SAS if you have the time and patience to write your own implementation of the algorithms.
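To illustrate the point about operation weights, here is a minimal Python sketch of a Levenshtein-style edit distance with a separate cost for insertion, deletion, and substitution. The names are hypothetical, and this is not how COMPGED or SPEDIS are implemented. The `edit_distance_similarity` helper assumes a 0-100 normalization by the longer string's length, which may not match `utl_match.edit_distance_similarity` exactly.

```python
def weighted_edit_distance(a: str, b: str, ins_cost: float = 1.0,
                           del_cost: float = 1.0, sub_cost: float = 1.0) -> float:
    """Minimum total cost to turn a into b, using one DP row at a time."""
    m, n = len(a), len(b)
    # Row 0: build b[:j] from an empty string with j insertions
    prev = [j * ins_cost for j in range(n + 1)]
    for i in range(1, m + 1):
        # Column 0: reduce a[:i] to an empty string with i deletions
        cur = [i * del_cost] + [0.0] * n
        for j in range(1, n + 1):
            replace = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            cur[j] = min(prev[j] + del_cost,      # delete a[i-1]
                         cur[j - 1] + ins_cost,   # insert b[j-1]
                         prev[j - 1] + replace)   # keep or substitute
        prev = cur
    return prev[n]


def edit_distance_similarity(a: str, b: str) -> int:
    # Assumed normalization: 100 = identical, using unit costs
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - weighted_edit_distance(a, b) / longest))
```

With unit costs, `KITTEN` to `SITTING` costs 3 (similarity 57). Charging 2 for a substitution (`sub_cost=2.0`) raises the distance to 5, which is why two edit-distance functions can score the same pair of names quite differently.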

