Below is the dataset I have
What I have | ||||||
Customer ID | Ord_Month | Channel | Brand 1 Flag | Brand 2 Flag | Transaction Flag | Amount |
1 | 2016-02 | Web | 1 | 0 | 1 | 100 |
1 | 2016-02 | Retail | 0 | 1 | 1 | 200 |
2 | 2016-03 | Web | 1 | 1 | 1 | 50 |
2 | 2016-04 | Others | 0 | 1 | 1 | 90 |
2 | 2016-04 | Others | 0 | 1 | 1 | 300 |
I want to summarize the amount, transactions and customers by brand and channel. Transaction is valid if transaction flag is 1. Below is the output I expect
What I want | |||||
Fiscal Period | Total Amount | Total Customers | Total Transactions | Brand | Channel |
2016-02 | 100 | 1 | 1 | Brand 1 | Web |
2016-02 | 200 | 1 | 1 | Brand 2 | Retail |
2016-03 | 50 | 1 | 1 | Both | Web |
2016-04 | 390 | 1 | 2 | Brand 2 | Others |
For each distinct combination of Brand and Channel for a given fiscal period I want to calculate the total customers (count of customer ID), total transactions (count of transaction flag) and total amount (sum of amount).
Please help.
This would be so much easier if your data was modelled more normalized. So my first step would be to do create a new variable Brand. After that a simple SQL does the trick (as would proc means, tabulate, report etc):
data have;
input customer_id ord_month $ channel $ brand_1_flag brand_2_flag transaction_flag amount;
drop brand_1_flag brand_2_flag;
length brand $8;
if transaction_flag = 1;
if brand_1_flag = 1 and brand_2_flag = 1 then do;
brand = 'Both';
output;
end;
else do;
if brand_1_flag = 1 then do;
brand = 'Brand 1';
output;
end;
if brand_2_flag = 1 then do;
brand = 'Brand 2';
output;
end;
end;
cards;
1 2016-02 Web 1 0 1 100
1 2016-02 Retail 0 1 1 200
2 2016-03 Web 1 1 1 50
2 2016-04 Others 0 1 1 90
2 2016-04 Others 0 1 1 300
;
proc sql;
select ord_month, sum(amount) as total_amount, count(transaction_flag) as total_transactions, brand, channel
from have
group by ord_month, channel, brand;
quit;
The subsetting IF can also be delayed to a WHERE statement in the SQL.
Hope this helps,
-- Jan
This would be so much easier if your data was modelled more normalized. So my first step would be to do create a new variable Brand. After that a simple SQL does the trick (as would proc means, tabulate, report etc):
data have;
input customer_id ord_month $ channel $ brand_1_flag brand_2_flag transaction_flag amount;
drop brand_1_flag brand_2_flag;
length brand $8;
if transaction_flag = 1;
if brand_1_flag = 1 and brand_2_flag = 1 then do;
brand = 'Both';
output;
end;
else do;
if brand_1_flag = 1 then do;
brand = 'Brand 1';
output;
end;
if brand_2_flag = 1 then do;
brand = 'Brand 2';
output;
end;
end;
cards;
1 2016-02 Web 1 0 1 100
1 2016-02 Retail 0 1 1 200
2 2016-03 Web 1 1 1 50
2 2016-04 Others 0 1 1 90
2 2016-04 Others 0 1 1 300
;
proc sql;
select ord_month, sum(amount) as total_amount, count(transaction_flag) as total_transactions, brand, channel
from have
group by ord_month, channel, brand;
quit;
The subsetting IF can also be delayed to a WHERE statement in the SQL.
Hope this helps,
-- Jan
Thanks a lot. Works like a charm.
It is sooo much simpler to do this in PROC MEANS or PROC SUMMARY, the idea of writing your own data step code and then PROC SQL to get the statistics seems inefficient.
This is trivial to do in PROC MEANS or PROC SUMMARY. Give it a try.
I dare you with this denormalized model. Remember there are brand_1 and brand_2 variables that need to be combined. I look forward to your mean MEANS. And don't make it non-trivial!
Regards,
-- Jan.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.