Hi, I'm trying to generate a market product penetration report to see what customers in my data who principally buy a product and also buy other products along with it. For example, below is report I want to generate and I want to establish the item penetration of a customer. In the given example, out of 20 customers who bought beer, 5 bought crisps, 3 bought fries, 1 bought soda, 1 bought mineral water and 1 bought fast food. Can anybody please suggest me a procedure or a way to generate this kind of a report?
Customers | Beer | Crisps | Fries | Soda | Mineral Water | Fast food | |
Beer | 20 | 5 | 3 | 1 | 1 | 1 | |
Crisps | 35 | 5 | 1 | 8 | 7 | 4 | |
Fries | 32 | 9 | 2 | 20 | 2 | 15 | |
Soda | 50 | 15 | 8 | 5 | 2 | 20 | |
Mineral Water | 40 | 1 | 10 | 3 | 5 | 6 | |
Fast food | 30 | 5 | 18 | 12 | 24 | 4 |
Please show what the input data looks like.
Hi, Please ignore my previous post as it was not explained properly, here are the details:
Here is my input data:
Product_code | Product_name | Allied_product | Customer_penetration |
1 | Beer | Crisps | 8872 |
1 | Beer | Fries | 337 |
1 | Beer | Soda | 25503 |
1 | Beer | Mineral water | 56915 |
2 | Crisps | Fries | 599 |
2 | Crisps | Soda | 25230 |
2 | Crisps | Mineral water | 24801 |
3 | Fries | Soda | 983 |
3 | Fries | Mineral water | 1244 |
4 | Soda | Mineral water | 98661 |
And here is my WANTED OUTPUT:
Product_code | Product_Name | Beer | Crisps | Fries | Soda | Mineral_water |
1 | Beer | 8872 | 337 | 25503 | 56915 | |
2 | Crisps | 8872 | 599 | 25230 | 24801 | |
3 | Fries | 337 | 599 | 983 | 1244 | |
4 | Soda | 25503 | 25230 | 983 | 98661 |
Can anybody help?
Hi, PROC TRANSPOSE gets you close, but TRANSPOSE will only put each value of the variable "Customer_penetration" into one cell. There are most likely shorter ways, but this works (shortened the variable names, I can't type) ...
data x;
input pc pn :$15. ap &$15. cp;
datalines;
1 Beer Crisps 8872
1 Beer Fries 337
1 Beer Soda 25503
1 Beer Mineral water 56915
2 Crisps Fries 599
2 Crisps Soda 25230
2 Crisps Mineral water 24801
3 Fries Soda 983
3 Fries Mineral water 1244
4 Soda Mineral water 98661
;
data y (keep=product_: BEER CRISPS FRIES SODA MINERAL_WATER);
array y(5,5) _temporary_;
array p(5) $15 ('BEER' 'CRISPS' 'FRIES' 'SODA' 'MINERAL WATER');
array c(5) BEER CRISPS FRIES SODA MINERAL_WATER;
set x end=last;
i = whichc(upcase(pn), of p(*)); j = whichc(upcase(ap), of p(*));
y(i,j) = cp; y(j,i) = cp;
if last then
do product_number=1 to 4;
product_name = p(product_number);
do j=1 to 5;
c(j) = y(product_number,j);
end;
output;
end;
run;
data set Y ...
product_ product_ MINERAL_
number name BEER CRISPS FRIES SODA WATER
1 BEER . 8872 337 25503 56915
2 CRISPS 8872 . 599 25230 24801
3 FRIES 337 599 . 983 1244
4 SODA 25503 25230 983 . 98661
PROC TRANSPOSE gets you close if you muck around with your data ...
data xx (drop=zz);
set x;
output;
zz = ap; ap = pn; pn = zz;
output;
run;
proc sort data=xx;
by pn;
run;
proc transpose data=xx out=yy (drop=_name_);
by pn;
id ap;
var cp;
run;
data set YY ...
Mineral_
Obs pn Crisps Fries Soda water Beer
1 Beer 8872 337 25503 56915 .
2 Crisps . 599 25230 24801 8872
3 Fries 599 . 983 1244 337
4 Mineral water 24801 1244 98661 . 56915
5 Soda 25230 983 . 98661 25503
Like this?
data HAVE(index=(PN));
input PC PN :$15. AP &$15. CP;
cards;
1 Beer Crisps 8872
1 Beer Fries 337
1 Beer Soda 25503
1 Beer Mineral water 56915
2 Crisps Fries 599
2 Crisps Soda 25230
2 Crisps Mineral water 24801
3 Fries Soda 983
3 Fries Mineral water 1244
4 Soda Mineral water 98661
run;
data X;
set HAVE
HAVE(rename=(AP=PN PN=AP)); %* get 2nd set of data for matrix bottom;
set HAVE(keep=PN PC) key=PN; %* get correct PC ;
if _IORC_ then do; _ERROR_=0; delete; end; %* delete if no PC;
run;
proc tabulate;
class PC PN AP;
var CP;
table PC=''*PN='', AP=''*CP=''*sum=''*f=8.0;
run;
Beer | Crisps | Fries | Mineral water | Soda | ||
1 | Beer | . | 8872 | 337 | 56915 | 25503 |
2 | Crisps | 8872 | . | 599 | 24801 | 25230 |
3 | Fries | 337 | 599 | . | 1244 | 983 |
4 | Soda | 25503 | 25230 | 983 | 98661 | . |
Ironic than pasting from the SAS output window yields:
Your post has been changed because invalid HTML was found in the message body. The invalid HTML has been removed. Please review the message and submit the message when you are satisfied.
Neat ideas ... index, have+have.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.