I have a table of items that were purchased by customers listed by customer IDs. There are 300 items and 2000 customers. The table looks like:
| c_id | Item1 | Item2 | Item3 | ........ | Item300 |
| c1 | 1 | 0 | 0 | ...... | 1 |
| c2 | 0 | 1 | 1 | ....... | 0 |
| ... | ... | ... | ... | ... | ... |
| c2000 | 1 | 0 | 0 | ... | 0 |
The 0s and 1s indicate which items were purchased by certain customer. e.g. c1 purchased items 1, (may other items between 4 and 299), and 300. My goal is to get the counts of customers that have different item purchasing combinations. e.g. a result table like:
| Item_purchased | Customer_count |
| Item1; Item5; Item6; Item300 | 45 |
| Item3; Item200 | 20 |
| ... | ... |
| Item4; Item100; Item299 | 5 |
So, the total of column Customer_count should be 2000. Since there are too many items, i cannot find a easy way to find those item combinations and put them in a table.
Not sure my question is clear or not. Can anyone help me with some ideas?
Thanks,
This should get you close to the expected table:
data combined;
set have;
length item_purchased $ 1000;
array items laptop -- keyboard;
do i = 1 to dim(items);
if items[i] then do;
item_purchased = catx(';', item_purchased, vname(items[i]));
end;
end;
keep item_purchased;
run;
proc freq data=combined;
table item_purchased / nocum nofreq nopercent;
run;
EDIT: should have reloaded the topic before posting, to avoid cloning what @ballardw already posted.
To make my question clear. Consider the following small size sample with 5 items and 10 customers:
data have;
input c_id $ laptop TV cellphone monitor keyboard;
cards;
c1 1 0 0 0 1
c2 1 0 0 0 0
c3 1 0 0 0 0
c4 0 0 0 0 1
c5 1 0 0 0 1
c6 1 0 0 0 0
c7 0 0 0 0 1
c8 1 0 1 1 0
c9 1 0 1 1 0
c10 1 0 0 0 0
;
run;So the result will be similar to the table below. It can be in other format as long as I can easily get the count of customer and items purchased begin identified.
| item_purchased | Customer_count |
| laptop | 4 |
| laptop; keyboard | 2 |
| keyboard | 2 |
| laptop; cellphone; monitor | 2 |
This should get you close to the expected table:
data combined;
set have;
length item_purchased $ 1000;
array items laptop -- keyboard;
do i = 1 to dim(items);
if items[i] then do;
item_purchased = catx(';', item_purchased, vname(items[i]));
end;
end;
keep item_purchased;
run;
proc freq data=combined;
table item_purchased / nocum nofreq nopercent;
run;
EDIT: should have reloaded the topic before posting, to avoid cloning what @ballardw already posted.
Thank you! That works!
Like this?
data KEY;
length KEY $300;
set HAVE;
KEY=cats(of ITEM:);
run;
proc freq data=KEY;
table KEY;
run;
This gives you the count by unique combination.
You can parse the KEY variable to replace the value by a list of item numbers if you want.
Just for fun, what will you do if NONE of the combinations are duplicated? You have 2 to the 300th power possible combinations, that is roughly 2E90, or 2 followed by 90 zeroes, and only 2000 customers. That would make it not uncommon to have no duplicates.
Slight modification of @ChrisNZ to get the variable names into the key.
data want;
set have;
array v(*) laptop -- keyboard;
length key $10000; /* for your full job*/
do i=1 to dim(v);
if v[i]=1 then key=catx(', ',key,vname(v[i]));
end;
run;
proc freq data=want;
tables key;
run;
Vname is a function that will return the name of a variable (if used correctly).
Thank you!
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.