Not applicable
Posts: 0

Finding closest matches

I have a dataset with some firms identified as "targets." I would like to select firms that are within +/- 30% of two parameters (size and ROA) of the targets and somehow flag these firms (other targets can be included in the flagged firms). Ultimately I would like to take the average size of these flagged firms relative to the respective target. Does this make sense? I do not know where to start!

Thanks for any direction you can provide.
Super User
Posts: 3,920

Re: Finding closest matches

I suggest you consider flagging each of your targets for size and ROA separately. I find using boolean (0 or 1) flags are very useful:

target_size = 123;
target_ROA = 456;
flag_size = (abs(size - target_size)/target_size LE 0.3);
flag_ROA = (abs(ROA - target_ROA)/target_ROA LE 0.3);

This logic creates flags that will contain 1 if they are within 30%, 0 if they are not.
Not applicable
Posts: 0

Re: Finding closest matches

Thank you for your response! The only issue I see with that solution is that it is only good for one target at a time - I have about 100 targets within the dataset (and targets can be in the list of matches for other targets). Is there a way to loop that logic so that I do not have to separately identify the matches? Thanks!!
Posts: 2,125

Re: Finding closest matches

EC,

There is a broad literature on this. Search, for example, for "nearest match" in the forums and you will find
It has links to lots of techniques to use.

Doc Muhlbaier
Duke
Super User
Posts: 3,920

Re: Finding closest matches

Perhaps I should have been more explicit:

data targets;
set xxx;
target_size = 123;
target_ROA = 456;
flag_size = (abs(size - target_size)/target_size LE 0.3);
flag_ROA = (abs(ROA - target_ROA)/target_ROA LE 0.3);
run;

This program will flag as many "targets" as there are rows in your input data - hundreds or more. Each of your rows will have two flags, one for size and one for ROA. To select rows meeting your size target: