BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Marc_y
Fluorite | Level 6

Hi,

 

I am trying to calculate what the overlap is between a set of products in various baskets of goods. In total, I have 110 products across 150 million baskets. I want to end up with a table that shows the count of baskets where there is an overlap between any combination of products (including with themselves - so a 110 column x 110 row table. 

 

I think I should be using arrays to achieve this, but my knowledge in SAS is too limited to work out how exactly this is done. I have tried several different approaches, but none seem to give me the desired outputs - I seem to get stuck on reducing the baskets down to one cell. I tried using a sumproduct as the values that can exist in the basket data are 0 and 1 - thus the sumproduct would give me the number of baskets where there is overlap. 

 

What I have butchered together so far is this (which does not tie to the example data below which only looks at 4 variables):

 

DATA test;
  SET SAMPLE (DROP=BASKET_ID CUSTOMER_NO);

  ARRAY BASE BU54 -- SG1043;
  ARRAY COMPARE BU54 -- SG1043;

  ARRAY OUTPUTS BU54 -- SG1043;

  DO i=1 TO 110;
    DO p=1 TO 110;

      OUTPUTS{p} = SUM(BASE{i} + COMPARE{p});
	  IF p = 110 THEN OUTPUT;

	END;
  END;

RUN;

 

 

Here is some example data:

data have;
  input basket g1 g2 g3 g4;
datalines;
1 1 0 1 0
2 0 0 1 1
3 1 1 1 0
4 1 1 0 1
5 0 1 1 0
6 0 0 0 1
;
run;

Essentially, what I want to get to with the above example data would be something that looks like this: 

data want;
  input item $ g1 g2 g3 g4;
datalines;
g1 3 2 2 1
g2 2 3 2 1
g3 2 2 4 1
g4 1 1 1 3
;
run;

I am sorry if I have not included data in the right format - this is my first time using this community. Please let me know if there is anything I can add to make this more clear.

 

Any help would be greatly appreciated!

 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
  input basket g1 g2 g3 g4;
datalines;
1 1 0 1 0
2 0 0 1 1
3 1 1 1 0
4 1 1 0 1
5 0 1 1 0
6 0 0 0 1
;
run;
proc corr data=have out=want(where=(_type_='SSCP'))     sscp noprint ;
var g1-g4;
run;
proc print data=want;run;

View solution in original post

6 REPLIES 6
ed_sas_member
Meteorite | Level 14

Hi @Marc_y 

 

Here is an attempt to achieve this.

 

data have;
  input basket g1 g2 g3 g4;
datalines;
1 1 0 1 0
2 0 0 1 1
3 1 1 1 0
4 1 1 0 1
5 0 1 1 0
6 0 0 0 1
;
run;

/* Retrieve the list of variables (except basket): g1 g2 g3 g4 ... in macrovariable &list_letters */

proc contents data=have out=have_contents noprint;
run;

proc sql noprint;
	select distinct name
	into: list_letters separated by " "
	from have_contents
	where name ne "basket";
quit;

/* Identify the frequency of each couple (e.g. g1-g2, ...) */

data have_tr;
	set have;
	
	array _letter(*) &list_letters;

	total = sum(of _letter(*));

	do i=1 to dim(_letter);

		do j=i to dim(_letter);

			if _letter(i)>0 and _letter(j)>0 then
				do;
					couple_1=vname(_letter(i));
					couple_2=vname(_letter(j));
					if total > 0 then do;
						couple=compress(catx("_", couple_1, couple_2));
						output;
					end;
				end;
		end;
	end;
	keep basket couple;
run;

proc freq data=have_tr;
	table couple / noprint out=have_freq (keep=couple count);
run;

/* Retrieve the list of distinct couples (g1g1, ...) in macrovariable &list_couple */

proc sql noprint;
	select distinct couple into: list_couple separated by " " from have_freq;
quit;

/* Create the matrix table */

proc transpose data=have_freq out=have_tr2 (drop=_name_ _label_);
	var count;
	ID couple;
run;

proc transpose data=have(drop=basket) out=structure (keep=_name_ rename=(_name_=V1));
	var _numeric_;
run;

data want;
	set structure;
	if _n_ = 1 then set have_tr2;
	array _matrix(*) &list_letters; /* g1 g2 g3 g4 */
	array _ref(*) &list_couple; 	/* g1_g1 g1_g2 g1_g3 g1_g4 g2_g2 g2_g3 g2_g4 g3_g3 g3_g4 g4_g4 */
	do i=1 to dim(_matrix);
		do j=1 to dim(_ref);
			if (scan(vname(_ref(j)),1,"_") = V1 
			   and scan(vname(_ref(j)),2,"_") = vname(_matrix(i)))
			  or
			   (scan(vname(_ref(j)),2,"_") = V1 
			   and scan(vname(_ref(j)),1,"_") = vname(_matrix(i)))
			   then _matrix(i) = _ref(j);
			if _matrix(i) = . then _matrix(i) = 0;
		end;
	end;
	keep V1 &list_letters;
run;

proc print;

 

Capture d’écran 2020-03-12 à 13.18.02.png

 

Best,

Marc_y
Fluorite | Level 6
Thanks a lot for this detailed answer. I tried Ksharp's answer and it works with a lot less code - so will go ahead with that.

Really appreciate the help!
Ksharp
Super User
data have;
  input basket g1 g2 g3 g4;
datalines;
1 1 0 1 0
2 0 0 1 1
3 1 1 1 0
4 1 1 0 1
5 0 1 1 0
6 0 0 0 1
;
run;
proc corr data=have out=want(where=(_type_='SSCP'))     sscp noprint ;
var g1-g4;
run;
proc print data=want;run;
Marc_y
Fluorite | Level 6
This is perfect - thank you! You have helped me a ton!
novinosrin
Tourmaline | Level 20

Hi @Marc_y  Great precise answer by @Ksharp  and a nice detailed thought out by @ed_sas_member . To my mind, a way to understand the problem as we break into pieces with your initial thought of arrays would perhaps be to devise an algorithm in steps like

 

1. Look at pairs

2. Check for even outcome

3. Roll up 

4. Transpose to wide to meet the stated requirement

 

i.e in SAS syntax, the possibility is

 



data have;
  input basket g1 g2 g3 g4;
datalines;
1 1 0 1 0
2 0 0 1 1
3 1 1 1 0
4 1 1 0 1
5 0 1 1 0
6 0 0 0 1
;
run;
/*Get/compare pairs*/
data temp;
 set have;
 array t g1-g4;
 length item $5;
 do g=1 to dim(t);
  do j=1 to dim(t);
   item=cats('g',g);
   p1=t(g);
   p2=t(j);
   if sum(p1,p2)=2 then output;
  end;
 end;
 keep item j p:;
run;
/*Roll up eligible pairs*/
proc sql;
create table temp1 as
select item,j,sum(p1) as s
from temp
group by item,j;
quit;
/*To get the wide structure*/
proc transpose data=temp1 out=want(drop=_:) prefix=g;
by item;
var s;
id j;
run;

Of course the above may require a slight adjustments to get the appropriate variable names in place as is in your original dataset. The idea is to validate the thought process/approach 🙂 

 

 

For someone who mentions " but my knowledge in SAS is too limited to work out how exactly this is done", I really do appreciate your thought process is very noteworthy

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1041 views
  • 12 likes
  • 5 in conversation