Hi, I am trying to calculate what the overlap is between a set of products in various baskets of goods. In total, I have 110 products across 150 million baskets. I want to end up with a table that shows the count of baskets where there is an overlap between any combination of products (including with themselves - so a 110 column x 110 row table. I think I should be using arrays to achieve this, but my knowledge in SAS is too limited to work out how exactly this is done. I have tried several different approaches, but none seem to give me the desired outputs - I seem to get stuck on reducing the baskets down to one cell. I tried using a sumproduct as the values that can exist in the basket data are 0 and 1 - thus the sumproduct would give me the number of baskets where there is overlap. What I have butchered together so far is this (which does not tie to the example data below which only looks at 4 variables): DATA test;
SET SAMPLE (DROP=BASKET_ID CUSTOMER_NO);
ARRAY BASE BU54 -- SG1043;
ARRAY COMPARE BU54 -- SG1043;
ARRAY OUTPUTS BU54 -- SG1043;
DO i=1 TO 110;
DO p=1 TO 110;
OUTPUTS{p} = SUM(BASE{i} + COMPARE{p});
IF p = 110 THEN OUTPUT;
END;
END;
RUN; Here is some example data: data have;
input basket g1 g2 g3 g4;
datalines;
1 1 0 1 0
2 0 0 1 1
3 1 1 1 0
4 1 1 0 1
5 0 1 1 0
6 0 0 0 1
;
run; Essentially, what I want to get to with the above example data would be something that looks like this: data want;
input item $ g1 g2 g3 g4;
datalines;
g1 3 2 2 1
g2 2 3 2 1
g3 2 2 4 1
g4 1 1 1 3
;
run; I am sorry if I have not included data in the right format - this is my first time using this community. Please let me know if there is anything I can add to make this more clear. Any help would be greatly appreciated! Thank you.
... View more