I have a person-level dataset that has the individual's preferred store (1,2, or 3), and binary indicators if the individual likes eating apples, oranges, bananas, pineapple:
data have_person;
input id store apple orange banana pineapple;
cards;
1 2 1 0 1 1
2 3 1 1 1 1
3 1 0 1 1 1
4 3 0 1 0 1
5 2 1 1 1 1
6 1 0 0 1 1
;
I also have a store-level dataset that tells me if the store SELLS apples, oranges, bananas, or pineapples. (Note that individuals' preferred store may NOT sell the fruit that they like to eat (example: person 1 prefers store 2 and likes eating apples, bananas, pineapples. However, store 2 does not sell bananas or pineapples (only apples and oranges). Also, stores may sell fruit that people who shop at the store do not like eating.)
data have_store;
input store apple orange banana pineapple;
cards;
1 1 1 1 0
2 1 1 0 0
3 0 0 1 1
;
So what I need to calculate using data from these 2 datasets is: a specific proportion for EACH pairwise combination of fruit. There are two specific details that need to be taken into account for the calculation:
The pairwise combination proportions that need calculations need to be specifically based on the fruit that is offered by the store (i.e., from the store level dataset). So for store 1, I would have 6 pairwise calculations: apple_orange, apple_banana, orange_banana, orange_apple banana_apple banana_orange. Store 2 would have 2 pairwise calculations: apple_orange, orange_apple And so on...
So now that the pairs are defined, this is how the calculation is carried out (this is where data from both files needs to be used). The proportion is the: (# of individuals who GO TO THE SPECIFIC STORE AND like eating BOTH options available at the store) / (# of individuals who GO TO THE SPECIFIC STORE AND like eating the FIRST pairwise option available at the store). So for example, at store 1, the calculation for apple_orange is: (# of individuals who GO TO STORE 1 and like eating apples AND oranges) / (# of individuals who GO TO STORE 1 and like eating apples (they can also like other fruit))
I need a calculation for all the pairwise combinations, but the trick is that the pairwise combinations that need calculations are defined by the STORE dataset and not the person-level dataset (individuals could like eating fruit that is unavailable at the store, and I don't care about that).
This is what the resulting dataset needs to look like:
Store Combination Numerator Denominator Proportion 1 apple_orange 0 0 0 1 apple_banana 0 0 0 1 orange_banana 1 1 1 1 orange_apple 0 1 0 1 banana_apple 0 2 0 1 banana_orange 1 2 0.5 2 apple_orange 1 2 0.5 2 orange_apple 1 1 1 3 banana_pineapple 1 1 1 3 pineapple_banana 1 2 0.5
If it makes it easier to rename the fruit to fruit1, fruit2, fruit3, fruit4, that's fine. I think this needs some type of sql statement, but I'm honestly just not familiar with sql enough to figure out all the details that this type of dataset requires.
Any help is greatly appreciated!!
... View more