I have the below dataset at a customer level-
data have;
infile datalines;
input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.;
datalines;
1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+
8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75
9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
;
run;
I need the output as follows-
data want;
infile datalines;
input Amount : $15. Customer_Count : $5. Brand : $15. Channel : $10.;
datalines;
Missing 100 Brand1 W
<$0 200 Brand1 W
$.01-$25 300 Brand1 W
$25.01-$50 400 Brand1 W
$50.01-$75 500 Brand1 W
$75.01-$100 600 Brand1 W
$100.01-$150 700 Brand1 W
$150.01-$200 800 Brand1 W
$200+ 900 Brand1 W
Missing 100 Brand2 R
<$0 200 Brand2 R
$0.01-$25 300 Brand2 R
$25.01-$50 400 Brand2 R
$50.01-$75 500 Brand2 R
$75.01-$100 600 Brand2 R
$100.01-$150 700 Brand2 R
$150.01-$200 800 Brand2 R
$200+ 900 Brand2 R
;
run;
I tried using PROC Transpose but since the dataset is very large, it takes a huge amount of time..
Sorry, only meant the Have dataset as a datastep. Anyways, this goes someway to what you want, not sure how you get the customer_count, as it doesn't match the test data you provide, maybe clarify.
data have; infile datalines; input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.; datalines; 1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+ 2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75 3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+ 4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75 5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+ 6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75 7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+ 8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75 9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+ 10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75 ; run; data inter (keep=customer_id amount channel brand); set have; length amount channel brand $200; array channels{3} channel_r--channel_w; array brands{2} brand_brand1 brand_brand2; do j=1 to 3; amount=channels{j}; channel=vname(channels{j}); do i=1 to 2; brand=vname(brands{i}); output; end; end; run; proc sql; create table WANT as select AMOUNT, count(distinct CUSTOMER_ID) as CUSTOMER_COUNT, CHANNEL, BRAND from INTER group by AMOUNT, CHANNEL, BRAND; quit;
Post test data in the form of a datastep!!
As such I am not typing that in. If you have a large amount of data, it will take a large amount of time, irrespective of what code you use. Although from your output want data I don't see how transpose could get your there unless you have multiple procs. A datastep, which loops of an array of the variables and outputs a row for each item, then left join a count onto that, may be a bit quicker.
@RW9: Sorry. Below my restructured question. It's a bit challenging for me hence I was looking for a proper code:
data have;
infile datalines;
input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.;
datalines;
1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+
8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75
9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
;
run;
data want;
infile datalines;
input Amount : $15. Customer_Count : $5. Brand : $15. Channel : $10.;
datalines;
Missing 100 Brand1 W
<$0 200 Brand1 W
$.01-$25 300 Brand1 W
$25.01-$50 400 Brand1 W
$50.01-$75 500 Brand1 W
$75.01-$100 600 Brand1 W
$100.01-$150 700 Brand1 W
$150.01-$200 800 Brand1 W
$200+ 900 Brand1 W
Missing 100 Brand2 R
<$0 200 Brand2 R
$0.01-$25 300 Brand2 R
$25.01-$50 400 Brand2 R
$50.01-$75 500 Brand2 R
$75.01-$100 600 Brand2 R
$100.01-$150 700 Brand2 R
$150.01-$200 800 Brand2 R
$200+ 900 Brand2 R
;
run;
Sorry, only meant the Have dataset as a datastep. Anyways, this goes someway to what you want, not sure how you get the customer_count, as it doesn't match the test data you provide, maybe clarify.
data have; infile datalines; input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.; datalines; 1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+ 2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75 3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+ 4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75 5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+ 6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75 7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+ 8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75 9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+ 10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75 ; run; data inter (keep=customer_id amount channel brand); set have; length amount channel brand $200; array channels{3} channel_r--channel_w; array brands{2} brand_brand1 brand_brand2; do j=1 to 3; amount=channels{j}; channel=vname(channels{j}); do i=1 to 2; brand=vname(brands{i}); output; end; end; run; proc sql; create table WANT as select AMOUNT, count(distinct CUSTOMER_ID) as CUSTOMER_COUNT, CHANNEL, BRAND from INTER group by AMOUNT, CHANNEL, BRAND; quit;
Hi @RW9 Thanks for the effort. I think this works. The customer count in output data shown was just for reference and not exact numbers that we would get. What I am trying to achieve is to get the count of customers unique across an amount-channel-brand group. and I think your solutions addresses that.
Regards,
Please describe what (and how as a rule) to get the "amount" for:
Channel_R and Brand_Brand1
Channel_R and Brand_Brand2
Channel_O and Brand_Brand1
Channel_O and Brand_Brand2
Channel_W and Brand_Brand1
Channel_W and Brand_Brand2
from the data in your first row.
I do NOT see any obvious way to accomplish that.
I thought it might be possibly to generate a count data set first to count customers and transpose that set but the issue is the intersection of "channel" and "brand' to generate a single amount is not intuitively obvious to me.
Also, are the values shown actually text or the display of a numeric value using a format?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.