User
Posts: 1

# Calculating overlap between 2 datasets

This is my question:

For the 100 users with the most friends, how much do their reviews overlap with the reviews of their friends? In other words, what percentage of the friends' reviews are about a business that the user has also reviewed? Make a nice plot to visualize this. The result should be a percentage for the top 100 users with the most friends.

I have two data sets that are given:

A review data set with: user_id, review_id, business_id (multiple observations per user), review_count

A user data set with: user_id, friends (list of friends separated by commas), number_friends

Thank you so much!!

Super User
Posts: 23,776

## Re: Calculating overlap between 2 datasets

bastrid wrote:

Thank you so much!!

What are you stuck on, and which part do you need help with?

Is it the visualization, figuring out the overlap, or getting your data into SAS? I'm not going to do your homework, so your request for help needs to be more specific and should at least show what you've attempted so far.

For starters, simplify the problem to a small example where you can show us the data, the output you want, and what you've tried so far.
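As an illustration of what such a simplified example could look like, here is a minimal sketch in Python rather than SAS, meant only to pin down the logic before porting it. The toy data, the function name `overlap_percentages`, and the `top_n` parameter are all hypothetical. It assumes one reading of the question: for each user, count the reviews written by that user's friends and report the share of them that are about a business the user has also reviewed. Users whose friends wrote no reviews get `None` instead of a percentage.

```python
from collections import defaultdict

# Hypothetical toy data mirroring the two data sets described in the question.
# Each review row is (user_id, review_id, business_id).
reviews = [
    ("u1", "r1", "b1"),
    ("u1", "r2", "b2"),
    ("u2", "r3", "b1"),
    ("u2", "r4", "b3"),
    ("u3", "r5", "b2"),
    ("u3", "r6", "b4"),
    ("u3", "r7", "b5"),
    ("u4", "r8", "b3"),
]

# Each user maps to a comma-separated string of friend ids, as in the question.
users = {
    "u1": "u2,u3",
    "u2": "u1",
    "u3": "u1,u4",
    "u4": "",
}


def overlap_percentages(reviews, users, top_n=100):
    """Percentage of friends' reviews that hit a business the user also reviewed."""
    # Collect the reviewed businesses per user, one entry per review.
    businesses_by_user = defaultdict(list)
    for user_id, _review_id, business_id in reviews:
        businesses_by_user[user_id].append(business_id)

    # Split the comma-separated friends string into a clean list of ids.
    friends_by_user = {
        user_id: [f.strip() for f in friends.split(",") if f.strip()]
        for user_id, friends in users.items()
    }

    # Keep only the top_n users with the most friends.
    top_users = sorted(
        friends_by_user, key=lambda u: len(friends_by_user[u]), reverse=True
    )[:top_n]

    result = {}
    for user_id in top_users:
        own_businesses = set(businesses_by_user.get(user_id, []))
        friend_reviews = [
            business_id
            for friend_id in friends_by_user[user_id]
            for business_id in businesses_by_user.get(friend_id, [])
        ]
        if not friend_reviews:
            # No friend reviews to compare against, so no percentage is defined.
            result[user_id] = None
            continue
        shared = sum(1 for b in friend_reviews if b in own_businesses)
        result[user_id] = 100.0 * shared / len(friend_reviews)
    return result


result = overlap_percentages(reviews, users)
```

With this toy data, u1 comes out at 40%: the friends u2 and u3 wrote five reviews in total, and two of those are about businesses u1 also reviewed. Posting input and expected output at this scale makes it much easier for others to check an approach. The plot is not covered here.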

Good Luck.
