Hi mkeintz.
The problem is shown as below.
I have a dataset of eight variables, firm ID, year, treat, a, b, c, d, e. For each firm in each year, there are five firm characteristics a - e. Treat is dummy variable taking 1 if a firm is a treated firm and 0 if a firm is a controlled firm. For each year there is only ONE treated firm, and multiple controlled firms.
Now for each year each treated firm, I'd like to find a control firm that is the most similar to the treated firm based on a-e. That's why I want to calculate Euclidean distance between treated firm and other control firms.
The challenge here is that not all treated/controlled firms have full values for a -e. My idea is to use all available information for each treated firms to calculate distance. For example, if treated firm 001 has all values on a, b, c, d, e, then I would like to calculate distance between 001 and all controlled firms that have all a-e information in a particular year. if treated firm 002 has only values on a, b, c, then I would like to calculate distance between treated 002 and controlled firms that have available information on a, b, c.
Do you have any thoughts on this? Thanks.
... View more