Matching observations based on similarity of categorical variables

Occasional Contributor
Posts: 7

Hey there,

I was wondering if someone has a good way to match two observations based on categorical (non-ordinal) variables.

The exercise I am working on is matching mentees with mentors based on interests and other characteristics that are categorical variables, some ordinal and some non-ordinal.

Variable: Possible Values

Sport: “Baseball”, “Football”, “Basketball” (…)

Marital Status: “Single, no kids”, “Single, young kids”, “Married, no kids”, “Married, young kids”, (…)

Job Level: 1, 2, 3, 4, 5, 6

Industry: “Retail”, “Finance”, “Wholesale”, (…)

There are also indicators for whether each variable is important to the person. I understand that I could force marital status into one or two ordinal variables like (“Single”, “Married”, “Widow”) and (“no kids”, “young kids”, “grown kids”). But I don’t know how to handle industry and sport, as there is no logical order to them. My original plan was to use a clustering technique to find a match between the mentor set and the mentee set based on the shortest distance between the given points. But that would ignore the fact that people can decide whether a variable is important to them or not (“Yes”, “No”).
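To make the distance idea concrete, here is a minimal sketch of an importance-weighted mismatch distance. It is only an illustration under assumptions: the variable names, the record layout, and the rule that a variable's weight equals the number of people in the pair who flagged it as important are all hypothetical, since the real data is not shown in this thread. Nominal variables (sport, marital status, industry) count as 0 if equal and 1 if different, so no artificial order is needed; job level uses the scaled absolute difference.

```python
# Hypothetical variable names and records; the real data is not shown in the thread.
NOMINAL = ("sport", "marital_status", "industry")
JOB_LEVEL_RANGE = 5  # job levels run from 1 to 6, so the largest gap is 5


def weighted_distance(mentee, mentor):
    """Importance-weighted mismatch between two people, between 0 and 1."""
    total_weight = 0.0
    weighted_mismatch = 0.0
    for var in NOMINAL + ("job_level",):
        # Assumed rule: weight = how many of the two people flagged this variable.
        weight = (var in mentee["important"]) + (var in mentor["important"])
        if weight == 0:
            continue  # nobody cares about this variable, so it is ignored
        if var == "job_level":
            mismatch = abs(mentee[var] - mentor[var]) / JOB_LEVEL_RANGE
        else:
            mismatch = 0.0 if mentee[var] == mentor[var] else 1.0
        total_weight += weight
        weighted_mismatch += weight * mismatch
    return weighted_mismatch / total_weight if total_weight else 0.0


mentee = {"sport": "Baseball", "marital_status": "Single, no kids",
          "industry": "Retail", "job_level": 2,
          "important": {"sport", "industry"}}
mentor = {"sport": "Baseball", "marital_status": "Married, young kids",
          "industry": "Finance", "job_level": 5,
          "important": {"sport"}}

print(round(weighted_distance(mentee, mentor), 3))  # 0.333
```

In this example only sport (weight 2, no mismatch) and industry (weight 1, mismatch) contribute, which gives 1/3. Other weighting rules, such as a small non-zero weight for unflagged variables, would fit the same structure.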

Now I am thinking of just brute-forcing it with nested IF statements that check whether there is a perfect match based on the importance flags and the actual values, ELSE check whether there is a record that matches on all categories but one, and so on. This seems very inefficient, so I was hoping that someone has come across a similar problem and could point me to a better way to handle this.
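The tiered logic described above (perfect match first, then all but one, and so on) can be sketched without nested IF statements by counting mismatches for every mentee/mentor pair and sorting the pairs. The sketch below is an assumption-laden illustration, not a definitive method: the record layout and names are hypothetical, only the mentee's importance flags are used, and the greedy pass is a simple heuristic that does not guarantee the best overall assignment.

```python
def count_mismatches(mentee, mentor, variables):
    """Number of variables the mentee flagged as important on which the pair differs."""
    return sum(
        1 for var in variables
        if var in mentee["important"] and mentee[var] != mentor[var]
    )


def greedy_match(mentees, mentors, variables):
    """Pair each mentee with at most one mentor, taking the best tiers first."""
    # Sorting by mismatch count puts perfect matches (0) ahead of "all but one" (1), etc.
    pairs = sorted(
        (count_mismatches(me, mo, variables), i, j)
        for i, me in enumerate(mentees)
        for j, mo in enumerate(mentors)
    )
    used_mentees, used_mentors, result = set(), set(), {}
    for mismatches, i, j in pairs:
        if i in used_mentees or j in used_mentors:
            continue  # one of the two is already paired
        result[mentees[i]["name"]] = (mentors[j]["name"], mismatches)
        used_mentees.add(i)
        used_mentors.add(j)
    return result


# Hypothetical example records.
VARIABLES = ("sport", "industry")
mentees = [
    {"name": "mentee_1", "sport": "Baseball", "industry": "Retail",
     "important": {"sport", "industry"}},
    {"name": "mentee_2", "sport": "Football", "industry": "Finance",
     "important": {"sport"}},
]
mentors = [
    {"name": "mentor_A", "sport": "Football", "industry": "Retail"},
    {"name": "mentor_B", "sport": "Baseball", "industry": "Retail"},
]

matches = greedy_match(mentees, mentors, VARIABLES)
print(matches)
# {'mentee_1': ('mentor_B', 0), 'mentee_2': ('mentor_A', 0)}
```

The second element of each result is the tier: 0 means a perfect match on everything the mentee cares about, 1 means one mismatch, and so on. If an optimal one-to-one assignment is needed rather than a greedy one, the same mismatch counts could be fed into an assignment solver.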

Best regards,

Super User
Posts: 9,867

Re: Matching observations based on similarity of categorical variables

What does your data look like? What are you trying to achieve? Posting an example is far better than a long description.
