Solved: Re: How to create a function using an existing dataset (variable is al...

BayleyVos · Posted 10-19-2018 05:44 AM

I'm calculating the similarity between two text values (a text response and a standard dataset of responses).

For example, the text responses could be ('palms','east rand mall','sandton city mall'). The standard dataset is ('Palm Centre', 'East Rand Mall','Sandton City',etc...).

So I'm calculating the similarity between the two. Given a certain similarity criteria, it should output the value from the standard dataset). Just to clarity the text responses can be anything that someone can text - open text.

I'm calculating the similarity factor as follows:

---------------------------------------------------------------------------------------------------------------------

data new_branch_list (keep = branch);
set branch_list;

cost1 = min(250,spedis(strip(lowcase(branch)),'east rand mall'));
cost2 = min(250,spedis('east rand mall',strip(lowcase(branch))));
cost3 = max(cost1,cost2);

if cost3 > 0 then similarity = (1 - (min(cost1,cost2)/250));
else similarity = 1;

if similarity > 0.9 then output;

run;

---------------------------------------------------------------------------------------------------------------------

The branch_list mentioned in the code is the standard set of inputs.

In this example I've used the text input of 'east rand mall'.

How do I turn this into a function, for which I can input a data set column (with the text values) and then it calculates the correct standard dataset value, based on the similarity factor criteria?

I would ideally use this in a proc sql step, by calling the function.

(I'm using SAS version 7.13 HF4 (64-bit))

Thank you.

RW9 · Posted 10-19-2018 09:21 AM

I see where the confusion arises. The macro code I presented was in answer to;

"I would ideally use this in a proc sql step"

Hence why I presented it using an example proc sql step:

%macro similarity;
  min(250,spedis(strip(lowcase(branch)),'east rand mall')) as cost1,
  min(250,spedis('east rand mall',strip(lowcase(branch)))) as cost2,
  max(caculated cost1,caculated cost2) as cost3,
  ifn(caculated cost3 > 0,(1 - (min(calcualted cost1,calculated cost2)/250)),1) as similarity,
  case when calculated similarity > 0.9 then "Y" else "" end as flag
%mend similarity;

proc sql;
  select %similarity
  from...;
quit;

I did not present a datastep example, for that you would want something like your original code, bu macrotised. Maybe:

%macro similarity (out=,text=);
  data &out. (keep = branch);
    set branch_list;
    cost1 = min(250,spedis(strip(lowcase(branch)),"&text."));
    cost2 = min(250,spedis("&text.",strip(lowcase(branch))));
    cost3 = max(cost1,cost2);
    if cost3 > 0 then similarity = (1 - (min(cost1,cost2)/250));
    else similarity = 1;
    if similarity > 0.9 then output;
  run;
%mend similarity;

%similarity (out=test1,text=east rand mall);
%similarity (out=test2,text=marua mall);
...

Datastep and SQL syntax's are very different, so what works for one doesn't necessarily work for the other, and there are many ways to implement any problem.

View solution in original post

RW9 · Posted 10-19-2018 05:54 AM

Personally I would avoid "functions". You can macro-tise that code pretty easily. You could also change it so that it works in SQL, however that is different syntax to the one presented below, your macro call needs to generate valid SQL syntax. So something like:

%macro similarity;
  min(250,spedis(strip(lowcase(branch)),'east rand mall')) as cost1,
  min(250,spedis('east rand mall',strip(lowcase(branch)))) as cost2,
  max(caculated cost1,caculated cost2) as cost3,
  ifn(caculated cost3 > 0,(1 - (min(calcualted cost1,calculated cost2)/250)),1) as similarity,
  case when calculated similarity > 0.9 then "Y" else "" end as flag
%mend similarity;

proc sql;
  select %similarity
  from...;
quit;

BayleyVos · Posted 10-19-2018 06:53 AM

Thank you so much...

But how do I work in the data set that is needed for the calculations? Because I do not see it mentioned in the macro.

That is the 'branch-list' table that I mentioned in my original post?

Or do I not need to specify it?

RW9 · Posted 10-19-2018 08:30 AM

You would need to supply test data in the form of a datastep:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

What you want out from it, and code you want to use the macro in. I can't tell you from the post you have made, that is why I just gave you an example of how to use macro to create code which would work in SQL.

BayleyVos · Posted 10-19-2018 08:38 AM

Thanks again.

I ran the following code to test it out, but I'm getting a few errors:

That I do not understand. Can you perhaps help? (It doesn't use any data that you would need).

I would really appreciate it 🙂

Thanks again.

%macro similarity(act_branch,inp_branch);
	min(250,spedis(strip(lowcase(act_branch)),inp_branch)) as cost1,
	min(250,spedis(inp_branch,strip(lowcase(act_branch)))) as cost2,
	max(caculated cost1,caculated cost2) as cost3,
	ifn(caculated cost3 > 0,(1 - (min(calcualted cost1,calculated cost2)/250)),1) as similarity,
	case when calculated similarity > 0.9 then true else false end as flag
%mend similarity;

data test;
similarity = %similarity('Maerua Mall','marua mall');
run;

RW9 · Posted 10-19-2018 09:21 AM

I see where the confusion arises. The macro code I presented was in answer to;

"I would ideally use this in a proc sql step"

Hence why I presented it using an example proc sql step:

%macro similarity;
  min(250,spedis(strip(lowcase(branch)),'east rand mall')) as cost1,
  min(250,spedis('east rand mall',strip(lowcase(branch)))) as cost2,
  max(caculated cost1,caculated cost2) as cost3,
  ifn(caculated cost3 > 0,(1 - (min(calcualted cost1,calculated cost2)/250)),1) as similarity,
  case when calculated similarity > 0.9 then "Y" else "" end as flag
%mend similarity;

proc sql;
  select %similarity
  from...;
quit;

I did not present a datastep example, for that you would want something like your original code, bu macrotised. Maybe:

%macro similarity (out=,text=);
  data &out. (keep = branch);
    set branch_list;
    cost1 = min(250,spedis(strip(lowcase(branch)),"&text."));
    cost2 = min(250,spedis("&text.",strip(lowcase(branch))));
    cost3 = max(cost1,cost2);
    if cost3 > 0 then similarity = (1 - (min(cost1,cost2)/250));
    else similarity = 1;
    if similarity > 0.9 then output;
  run;
%mend similarity;

%similarity (out=test1,text=east rand mall);
%similarity (out=test2,text=marua mall);
...

Datastep and SQL syntax's are very different, so what works for one doesn't necessarily work for the other, and there are many ways to implement any problem.

BayleyVos · Posted 10-22-2018 03:39 AM

Yes, sorry. I think I'm not explaining my problem sufficiently well - I'm very new to SAS (so please bear with me for a while).

Thank you for your input! It has greatly helped me!

BayleyVos · Posted 10-22-2018 04:09 AM

I used this code (you provided - thank you) to construct the macro:

%macro similarity (out=,text=);
  data &out. (keep = branch);
    set branch_list;
    cost1 = min(250,spedis(strip(lowcase(branch)),"&text."));
    cost2 = min(250,spedis("&text.",strip(lowcase(branch))));
    cost3 = max(cost1,cost2);
    if cost3 > 0 then similarity = (1 - (min(cost1,cost2)/250));
    else similarity = 1;
    if similarity > 0.9 then output;
  run;
%mend similarity;

Now, I want to use this macro in the following way:

proc sql;
	select	*,
		%similarity(test1, a.message) as branch_der
	from	na_pacc_responses as a;
quit;

So I'm passing the message column from the na_pass_responses table through the similarity macro - to obtain the output. However, it giving me the following error:

ERROR: More positional parameters found than defined.

I'm really sorry! But I desperately need help!

What am I doing wrong that it doesn't work?

Thanks again for all your help!!!

RW9 · Posted 10-22-2018 04:34 AM

Sorry, I am not going over it again, please re-read my posts above. SQL syntax is not the same as datastep code. Your macro needs to resolve to syntactically correct code - all macro is is a find replace. Your code resolves to:

proc sql;
 select *,
 data test1 (keep = branch);
 set branch_list;
 cost1 = min(250,spedis(strip(lowcase(branch)),"&text."));
 cost2 = min(250,spedis("a.message",strip(lowcase(branch))));
 cost3 = max(cost1,cost2);
 if cost3 > 0 then similarity = (1 - (min(cost1,cost2)/250));
 else similarity = 1;
 if similarity > 0.9 then output;
 run;
 as branch_der
 from na_pacc_responses as a;
quit;

Which is clearly wrong.

BayleyVos · Posted 10-22-2018 04:40 AM

I am really sorry. I'm just very new to SAS - so I my understanding is lacking, which I understand must be frustrating to you. I do apologise.

Thank you anyway for your time and effort.

RW9 · Posted 10-22-2018 05:02 AM

No worries. You have the two different examples, one datastep and one sql. Have a go and come back with (start a new thread) with any questions.

PGStats · Posted 10-20-2018 12:01 AM

Have you considered using function COMPLEV instead? It is faster and symmetric. It also provides a limit to the distance that it will evaluate and useful modifiers.

PG

How to create a function using an existing dataset (variable is also a data set)

Re: How to create a function using an existing dataset (variable is also a data set)

Re: How to create a function using an existing dataset (variable is also a data set)

Re: How to create a function using an existing dataset (variable is also a data set)

Re: How to create a function using an existing dataset (variable is also a data set)

Re: How to create a function using an existing dataset (variable is also a data set)

Re: How to create a function using an existing dataset (variable is also a data set)

Re: How to create a function using an existing dataset (variable is also a data set)

Re: How to create a function using an existing dataset (variable is also a data set)

Re: How to create a function using an existing dataset (variable is also a data set)

Re: How to create a function using an existing dataset (variable is also a data set)

Re: How to create a function using an existing dataset (variable is also a data set)

Re: How to create a function using an existing dataset (variable is also a data set)

Catch up on SAS Innovate 2026