Solved: Re: get combination count

sasecn · Posted 08-23-2021 09:38 PM

I have a table of items that were purchased by customers listed by customer IDs. There are 300 items and 2000 customers. The table looks like:

c_id	Item1	Item2	Item3	........	Item300
c1	1	0	0	......	1
c2	0	1	1	.......	0
...	...	...	...	...	...
c2000	1	0	0	...	0

The 0s and 1s indicate which items were purchased by certain customer. e.g. c1 purchased items 1, (may other items between 4 and 299), and 300. My goal is to get the counts of customers that have different item purchasing combinations. e.g. a result table like:

Item_purchased	Customer_count
Item1; Item5; Item6; Item300	45
Item3; Item200	20
...	...
Item4; Item100; Item299	5

So, the total of column Customer_count should be 2000. Since there are too many items, i cannot find a easy way to find those item combinations and put them in a table.

Not sure my question is clear or not. Can anyone help me with some ideas?

Thanks,

andreas_lds · Posted 08-24-2021 02:01 AM

This should get you close to the expected table:

data combined;
   set have;
   
   length item_purchased $ 1000;
   array items laptop -- keyboard;
   
   do i = 1 to dim(items);
      if items[i] then do;
         item_purchased = catx(';', item_purchased, vname(items[i]));
      end;
   end;
   
   keep item_purchased;
run;


proc freq data=combined;
   table item_purchased / nocum nofreq nopercent;
run;

EDIT: should have reloaded the topic before posting, to avoid cloning what @ballardw already posted.

View solution in original post

sasecn · Posted 08-23-2021 09:59 PM

To make my question clear. Consider the following small size sample with 5 items and 10 customers:

data have;
input c_id $ laptop TV cellphone monitor keyboard;
cards;
c1 1 0 0 0 1
c2 1 0 0 0 0
c3 1 0 0 0 0
c4 0 0 0 0 1
c5 1 0 0 0 1
c6 1 0 0 0 0
c7 0 0 0 0 1
c8 1 0 1 1 0
c9 1 0 1 1 0
c10 1 0 0 0 0
;
run;

So the result will be similar to the table below. It can be in other format as long as I can easily get the count of customer and items purchased begin identified.

item_purchased	Customer_count
laptop	4
laptop; keyboard	2
keyboard	2
laptop; cellphone; monitor	2

andreas_lds · Posted 08-24-2021 02:01 AM

This should get you close to the expected table:

data combined;
   set have;
   
   length item_purchased $ 1000;
   array items laptop -- keyboard;
   
   do i = 1 to dim(items);
      if items[i] then do;
         item_purchased = catx(';', item_purchased, vname(items[i]));
      end;
   end;
   
   keep item_purchased;
run;


proc freq data=combined;
   table item_purchased / nocum nofreq nopercent;
run;

EDIT: should have reloaded the topic before posting, to avoid cloning what @ballardw already posted.

sasecn · Posted 08-24-2021 09:25 AM

Thank you! That works!

ChrisNZ · Posted 08-24-2021 12:09 AM

Like this?

data KEY;
  length KEY $300;
  set HAVE;
  KEY=cats(of ITEM:);
run;
proc freq data=KEY;
  table KEY;
run;

This gives you the count by unique combination.

You can parse the KEY variable to replace the value by a list of item numbers if you want.

High-Performance SAS Coding - Third Edition

ballardw · Posted 08-24-2021 01:52 AM

Just for fun, what will you do if NONE of the combinations are duplicated? You have 2 to the 300th power possible combinations, that is roughly 2E90, or 2 followed by 90 zeroes, and only 2000 customers. That would make it not uncommon to have no duplicates.

Slight modification of @ChrisNZ to get the variable names into the key.

data want;
   set have;
   array v(*) laptop -- keyboard;
   length key $10000; /* for your full job*/
   do i=1 to dim(v);
      if v[i]=1 then key=catx(', ',key,vname(v[i]));
   end;
run;

proc freq data=want;
   tables key;
run;

Vname is a function that will return the name of a variable (if used correctly).

sasecn · Posted 08-24-2021 09:26 AM

Thank you!

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away