BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasecn
Quartz | Level 8

I have a table of items that were purchased by customers listed by customer IDs. There are 300 items and 2000 customers. The table looks like:

c_idItem1Item2 Item3........Item300
c110   0......1
c2011.......0
..................
c2000100...0

The 0s and 1s indicate which items were purchased by certain customer. e.g. c1 purchased items 1, (may other items between 4 and 299), and 300. My goal is to get the counts of customers that have different item purchasing combinations. e.g. a result table like:

Item_purchasedCustomer_count
Item1; Item5; Item6; Item30045
Item3; Item20020
......
Item4; Item100; Item2995

 

So, the total of column Customer_count should be 2000. Since there are too many items, i cannot find a easy way to find those item combinations and put them in a table. 

Not sure my question is clear or not. Can anyone help me with some ideas?

Thanks, 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

This should get you close to the expected table:

data combined;
   set have;
   
   length item_purchased $ 1000;
   array items laptop -- keyboard;
   
   do i = 1 to dim(items);
      if items[i] then do;
         item_purchased = catx(';', item_purchased, vname(items[i]));
      end;
   end;
   
   keep item_purchased;
run;


proc freq data=combined;
   table item_purchased / nocum nofreq nopercent;
run;

 

EDIT: should have reloaded the topic before posting, to avoid cloning what @ballardw already posted.

View solution in original post

6 REPLIES 6
sasecn
Quartz | Level 8

To make my question clear. Consider the following small size sample with 5 items and 10 customers:

data have;
input c_id $ laptop TV cellphone monitor keyboard;
cards;
c1 1 0 0 0 1
c2 1 0 0 0 0
c3 1 0 0 0 0
c4 0 0 0 0 1
c5 1 0 0 0 1
c6 1 0 0 0 0
c7 0 0 0 0 1
c8 1 0 1 1 0
c9 1 0 1 1 0
c10 1 0 0 0 0
;
run;

So the result will be similar to the table below. It can be in other format as long as I can easily get the count of customer and items purchased begin identified.

item_purchasedCustomer_count
laptop4
laptop; keyboard2
keyboard2
laptop; cellphone; monitor2
andreas_lds
Jade | Level 19

This should get you close to the expected table:

data combined;
   set have;
   
   length item_purchased $ 1000;
   array items laptop -- keyboard;
   
   do i = 1 to dim(items);
      if items[i] then do;
         item_purchased = catx(';', item_purchased, vname(items[i]));
      end;
   end;
   
   keep item_purchased;
run;


proc freq data=combined;
   table item_purchased / nocum nofreq nopercent;
run;

 

EDIT: should have reloaded the topic before posting, to avoid cloning what @ballardw already posted.

sasecn
Quartz | Level 8

Thank you! That works!

ChrisNZ
Tourmaline | Level 20

Like this?

data KEY;
  length KEY $300;
  set HAVE;
  KEY=cats(of ITEM:);
run;
proc freq data=KEY;
  table KEY;
run;

This gives you the count by unique combination.

You can parse the KEY variable to replace the value by a list of item numbers if you want.

ballardw
Super User

Just for fun, what will you do if NONE of the combinations are duplicated? You have 2 to the 300th power possible combinations, that is roughly 2E90, or 2 followed by 90 zeroes, and only 2000 customers. That would make it not uncommon to have no duplicates.

 

Slight modification of @ChrisNZ to get the variable names into the key.

data want;
   set have;
   array v(*) laptop -- keyboard;
   length key $10000; /* for your full job*/
   do i=1 to dim(v);
      if v[i]=1 then key=catx(', ',key,vname(v[i]));
   end;
run;

proc freq data=want;
   tables key;
run;

Vname is a function that will return the name of a variable (if used correctly).

sasecn
Quartz | Level 8

Thank you!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2031 views
  • 0 likes
  • 4 in conversation