BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasecn
Quartz | Level 8

I have a table of items that were purchased by customers listed by customer IDs. There are 300 items and 2000 customers. The table looks like:

c_idItem1Item2 Item3........Item300
c110   0......1
c2011.......0
..................
c2000100...0

The 0s and 1s indicate which items were purchased by certain customer. e.g. c1 purchased items 1, (may other items between 4 and 299), and 300. My goal is to get the counts of customers that have different item purchasing combinations. e.g. a result table like:

Item_purchasedCustomer_count
Item1; Item5; Item6; Item30045
Item3; Item20020
......
Item4; Item100; Item2995

 

So, the total of column Customer_count should be 2000. Since there are too many items, i cannot find a easy way to find those item combinations and put them in a table. 

Not sure my question is clear or not. Can anyone help me with some ideas?

Thanks, 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

This should get you close to the expected table:

data combined;
   set have;
   
   length item_purchased $ 1000;
   array items laptop -- keyboard;
   
   do i = 1 to dim(items);
      if items[i] then do;
         item_purchased = catx(';', item_purchased, vname(items[i]));
      end;
   end;
   
   keep item_purchased;
run;


proc freq data=combined;
   table item_purchased / nocum nofreq nopercent;
run;

 

EDIT: should have reloaded the topic before posting, to avoid cloning what @ballardw already posted.

View solution in original post

6 REPLIES 6
sasecn
Quartz | Level 8

To make my question clear. Consider the following small size sample with 5 items and 10 customers:

data have;
input c_id $ laptop TV cellphone monitor keyboard;
cards;
c1 1 0 0 0 1
c2 1 0 0 0 0
c3 1 0 0 0 0
c4 0 0 0 0 1
c5 1 0 0 0 1
c6 1 0 0 0 0
c7 0 0 0 0 1
c8 1 0 1 1 0
c9 1 0 1 1 0
c10 1 0 0 0 0
;
run;

So the result will be similar to the table below. It can be in other format as long as I can easily get the count of customer and items purchased begin identified.

item_purchasedCustomer_count
laptop4
laptop; keyboard2
keyboard2
laptop; cellphone; monitor2
andreas_lds
Jade | Level 19

This should get you close to the expected table:

data combined;
   set have;
   
   length item_purchased $ 1000;
   array items laptop -- keyboard;
   
   do i = 1 to dim(items);
      if items[i] then do;
         item_purchased = catx(';', item_purchased, vname(items[i]));
      end;
   end;
   
   keep item_purchased;
run;


proc freq data=combined;
   table item_purchased / nocum nofreq nopercent;
run;

 

EDIT: should have reloaded the topic before posting, to avoid cloning what @ballardw already posted.

sasecn
Quartz | Level 8

Thank you! That works!

ChrisNZ
Tourmaline | Level 20

Like this?

data KEY;
  length KEY $300;
  set HAVE;
  KEY=cats(of ITEM:);
run;
proc freq data=KEY;
  table KEY;
run;

This gives you the count by unique combination.

You can parse the KEY variable to replace the value by a list of item numbers if you want.

ballardw
Super User

Just for fun, what will you do if NONE of the combinations are duplicated? You have 2 to the 300th power possible combinations, that is roughly 2E90, or 2 followed by 90 zeroes, and only 2000 customers. That would make it not uncommon to have no duplicates.

 

Slight modification of @ChrisNZ to get the variable names into the key.

data want;
   set have;
   array v(*) laptop -- keyboard;
   length key $10000; /* for your full job*/
   do i=1 to dim(v);
      if v[i]=1 then key=catx(', ',key,vname(v[i]));
   end;
run;

proc freq data=want;
   tables key;
run;

Vname is a function that will return the name of a variable (if used correctly).

sasecn
Quartz | Level 8

Thank you!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 948 views
  • 0 likes
  • 4 in conversation