I'm calculating the similarity between two text values: a free-text response and each value in a standard dataset of responses. For example, the text responses could be ('palms', 'east rand mall', 'sandton city mall'), and the standard dataset is ('Palm Centre', 'East Rand Mall', 'Sandton City', etc.). Given a certain similarity criterion, it should output the matching value from the standard dataset. Just to clarify, the text responses are open text, so they can be anything that someone types.

I'm calculating the similarity factor as follows:
---------------------------------------------------------------------------------------------------------------------
data new_branch_list (keep = branch);
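    set branch_list;
    /* SPEDIS is asymmetric, so compute the distance in both directions; cap each at 250 */
    cost1 = min(250, spedis(strip(lowcase(branch)), 'east rand mall'));
    cost2 = min(250, spedis('east rand mall', strip(lowcase(branch))));
    cost3 = max(cost1, cost2);
    /* Scale the smaller distance to a 0-1 similarity; two zero distances mean an exact match */
    if cost3 > 0 then similarity = (1 - (min(cost1, cost2) / 250));
    else similarity = 1;
    /* Keep only the standard values that are close enough to the input */
    if similarity > 0.9 then output;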
run;
---------------------------------------------------------------------------------------------------------------------
The branch_list data set in the code is the standard set of values. In this example I've hard-coded the text input 'east rand mall'.

How do I turn this into a function that takes a data set column (containing the text values) as input and returns the correct standard dataset value, based on the similarity criterion? Ideally I would call the function in a PROC SQL step.

(I'm using SAS version 7.13 HF4 (64-bit).)

Thank you.
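
To make the goal concrete, here is a rough sketch of one possible shape for this, assuming PROC FCMP is available in the installation. The function name branch_similarity, the package work.funcs.text, the 200-character working length, the responses data set with its response column, and the matched output table are all hypothetical names chosen for illustration; only branch_list, branch, and the similarity formula come from the code above.

```sas
/* Sketch only: wraps the similarity calculation above in a user-defined function. */
/* work.funcs.text and branch_similarity are hypothetical names.                   */
proc fcmp outlib=work.funcs.text;
    function branch_similarity(response $, standard $);
        length a b $ 200;                 /* assumed maximum text length */
        a = strip(lowcase(response));
        b = strip(lowcase(standard));
        cost1 = min(250, spedis(a, b));   /* distance in one direction   */
        cost2 = min(250, spedis(b, a));   /* and in the other            */
        return (1 - min(cost1, cost2) / 250);
    endsub;
run;

/* Tell SAS where to find the compiled function */
options cmplib=work.funcs;

/* Compare every response with every standard value and keep close matches. */
/* responses (column: response) is a hypothetical input data set.           */
proc sql;
    create table matched as
    select r.response,
           b.branch,
           branch_similarity(r.response, b.branch) as similarity
    from responses as r, branch_list as b
    where calculated similarity > 0.9;
quit;
```

This pairs each response with every standard value, so a response can return more than one branch if several score above 0.9; picking only the highest-scoring branch per response would need an extra step.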