I have the below dataset at a customer level-
data have;
infile datalines;
input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.;
datalines;
1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+
8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75
9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
;
run;
I need the output as follows-
data want;
infile datalines;
input Amount : $15. Customer_Count : $5. Brand : $15. Channel : $10.;
datalines;
Missing 100 Brand1 W
<$0 200 Brand1 W
$.01-$25 300 Brand1 W
$25.01-$50 400 Brand1 W
$50.01-$75 500 Brand1 W
$75.01-$100 600 Brand1 W
$100.01-$150 700 Brand1 W
$150.01-$200 800 Brand1 W
$200+ 900 Brand1 W
Missing 100 Brand2 R
<$0 200 Brand2 R
$0.01-$25 300 Brand2 R
$25.01-$50 400 Brand2 R
$50.01-$75 500 Brand2 R
$75.01-$100 600 Brand2 R
$100.01-$150 700 Brand2 R
$150.01-$200 800 Brand2 R
$200+ 900 Brand2 R
;
run;
I tried using PROC Transpose but since the dataset is very large, it takes a huge amount of time..
Sorry, only meant the Have dataset as a datastep. Anyways, this goes someway to what you want, not sure how you get the customer_count, as it doesn't match the test data you provide, maybe clarify.
data have;
infile datalines;
input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.;
datalines;
1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+
8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75
9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
;
run;
data inter (keep=customer_id amount channel brand);
set have;
length amount channel brand $200;
array channels{3} channel_r--channel_w;
array brands{2} brand_brand1 brand_brand2;
do j=1 to 3;
amount=channels{j};
channel=vname(channels{j});
do i=1 to 2;
brand=vname(brands{i});
output;
end;
end;
run;
proc sql;
create table WANT as
select AMOUNT,
count(distinct CUSTOMER_ID) as CUSTOMER_COUNT,
CHANNEL,
BRAND
from INTER
group by AMOUNT,
CHANNEL,
BRAND;
quit;
Post test data in the form of a datastep!!
As such I am not typing that in. If you have a large amount of data, it will take a large amount of time, irrespective of what code you use. Although from your output want data I don't see how transpose could get your there unless you have multiple procs. A datastep, which loops of an array of the variables and outputs a row for each item, then left join a count onto that, may be a bit quicker.
@RW9: Sorry. Below my restructured question. It's a bit challenging for me hence I was looking for a proper code:
data have;
infile datalines;
input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.;
datalines;
1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+
8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75
9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
;
run;
data want;
infile datalines;
input Amount : $15. Customer_Count : $5. Brand : $15. Channel : $10.;
datalines;
Missing 100 Brand1 W
<$0 200 Brand1 W
$.01-$25 300 Brand1 W
$25.01-$50 400 Brand1 W
$50.01-$75 500 Brand1 W
$75.01-$100 600 Brand1 W
$100.01-$150 700 Brand1 W
$150.01-$200 800 Brand1 W
$200+ 900 Brand1 W
Missing 100 Brand2 R
<$0 200 Brand2 R
$0.01-$25 300 Brand2 R
$25.01-$50 400 Brand2 R
$50.01-$75 500 Brand2 R
$75.01-$100 600 Brand2 R
$100.01-$150 700 Brand2 R
$150.01-$200 800 Brand2 R
$200+ 900 Brand2 R
;
run;
Sorry, only meant the Have dataset as a datastep. Anyways, this goes someway to what you want, not sure how you get the customer_count, as it doesn't match the test data you provide, maybe clarify.
data have;
infile datalines;
input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.;
datalines;
1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+
8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75
9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
;
run;
data inter (keep=customer_id amount channel brand);
set have;
length amount channel brand $200;
array channels{3} channel_r--channel_w;
array brands{2} brand_brand1 brand_brand2;
do j=1 to 3;
amount=channels{j};
channel=vname(channels{j});
do i=1 to 2;
brand=vname(brands{i});
output;
end;
end;
run;
proc sql;
create table WANT as
select AMOUNT,
count(distinct CUSTOMER_ID) as CUSTOMER_COUNT,
CHANNEL,
BRAND
from INTER
group by AMOUNT,
CHANNEL,
BRAND;
quit;
Hi @RW9 Thanks for the effort. I think this works. The customer count in output data shown was just for reference and not exact numbers that we would get. What I am trying to achieve is to get the count of customers unique across an amount-channel-brand group. and I think your solutions addresses that.
Regards,
Please describe what (and how as a rule) to get the "amount" for:
Channel_R and Brand_Brand1
Channel_R and Brand_Brand2
Channel_O and Brand_Brand1
Channel_O and Brand_Brand2
Channel_W and Brand_Brand1
Channel_W and Brand_Brand2
from the data in your first row.
I do NOT see any obvious way to accomplish that.
I thought it might be possibly to generate a count data set first to count customers and transpose that set but the issue is the intersection of "channel" and "brand' to generate a single amount is not intuitively obvious to me.
Also, are the values shown actually text or the display of a numeric value using a format?
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.