Solved
Contributor
Posts: 69

# Summarize by Multiple Variables in Data Step

Below is the dataset I have

 What I have Customer ID Ord_Month Channel Brand 1 Flag Brand 2 Flag Transaction Flag Amount 1 2016-02 Web 1 0 1 100 1 2016-02 Retail 0 1 1 200 2 2016-03 Web 1 1 1 50 2 2016-04 Others 0 1 1 90 2 2016-04 Others 0 1 1 300

I want to summarize the amount, transactions and customers by brand and channel. Transaction is valid if transaction flag is 1. Below is the output I expect

 What I want Fiscal Period Total Amount Total Customers Total Transactions Brand Channel 2016-02 100 1 1 Brand 1 Web 2016-02 200 1 1 Brand 2 Retail 2016-03 50 1 1 Both Web 2016-04 390 1 2 Brand 2 Others

For each distinct combination of Brand and Channel for a given fiscal period I want to calculate the total customers (count of customer ID), total transactions (count of transaction flag) and total amount (sum of amount).

Accepted Solutions
Solution
‎12-29-2017 02:31 AM
Valued Guide
Posts: 533

## Re: Summarize by Multiple Variables in Data Step

This would be so much easier if your data was modelled more normalized. So my first step would be to do create a new variable Brand. After that a simple SQL does the trick (as would proc means, tabulate, report etc):

``````data have;
input customer_id ord_month \$ channel \$ brand_1_flag brand_2_flag transaction_flag amount;
drop brand_1_flag brand_2_flag;        length brand \$8;

if transaction_flag = 1;

if brand_1_flag = 1 and brand_2_flag = 1 then do;
brand = 'Both';
output;
end;
else do;
if brand_1_flag = 1 then do;
brand = 'Brand 1';
output;
end;

if brand_2_flag = 1 then do;
brand = 'Brand 2';
output;
end;
end;
cards;
1 2016-02 Web 1 0 1 100
1 2016-02 Retail 0 1 1 200
2 2016-03 Web 1 1 	1 50
2 2016-04 Others 0 1 1 90
2 2016-04 Others 0 1 1 300
;

proc sql;
select ord_month, sum(amount) as total_amount, count(transaction_flag) as total_transactions, brand, channel
from have
group by  ord_month, channel, brand;
quit;``````

The subsetting IF can also be delayed to a WHERE statement in the SQL.

Hope this helps,

-- Jan

All Replies
Solution
‎12-29-2017 02:31 AM
Valued Guide
Posts: 533

## Re: Summarize by Multiple Variables in Data Step

This would be so much easier if your data was modelled more normalized. So my first step would be to do create a new variable Brand. After that a simple SQL does the trick (as would proc means, tabulate, report etc):

``````data have;
input customer_id ord_month \$ channel \$ brand_1_flag brand_2_flag transaction_flag amount;
drop brand_1_flag brand_2_flag;        length brand \$8;

if transaction_flag = 1;

if brand_1_flag = 1 and brand_2_flag = 1 then do;
brand = 'Both';
output;
end;
else do;
if brand_1_flag = 1 then do;
brand = 'Brand 1';
output;
end;

if brand_2_flag = 1 then do;
brand = 'Brand 2';
output;
end;
end;
cards;
1 2016-02 Web 1 0 1 100
1 2016-02 Retail 0 1 1 200
2 2016-03 Web 1 1 	1 50
2 2016-04 Others 0 1 1 90
2 2016-04 Others 0 1 1 300
;

proc sql;
select ord_month, sum(amount) as total_amount, count(transaction_flag) as total_transactions, brand, channel
from have
group by  ord_month, channel, brand;
quit;``````

The subsetting IF can also be delayed to a WHERE statement in the SQL.

Hope this helps,

-- Jan

Contributor
Posts: 69

## Re: Summarize by Multiple Variables in Data Step

Thanks a lot. Works like a charm.

Posts: 3,018

## Re: Summarize by Multiple Variables in Data Step

It is sooo much simpler to do this in PROC MEANS or PROC SUMMARY, the idea of writing your own data step code and then PROC SQL to get the statistics seems inefficient.

--
Paige Miller
Posts: 3,018

## Re: Summarize by Multiple Variables in Data Step

This is trivial to do in PROC MEANS or PROC SUMMARY. Give it a try.

--
Paige Miller
Valued Guide
Posts: 533

## Re: Summarize by Multiple Variables in Data Step

I dare you with this denormalized model. Remember there are brand_1 and brand_2 variables that need to be combined. I look forward to your mean MEANS. And don't make it non-trivial!

Regards,

-- Jan.

☑ This topic is solved.