DATA Step, Macro, Functions and more

Summarize by Multiple Variables in Data Step

Accepted Solution Solved
Reply
Contributor
Posts: 69
Accepted Solution

Summarize by Multiple Variables in Data Step

Below is the dataset I have 

What I have             
Customer ID Ord_Month Channel Brand 1 Flag Brand 2 Flag Transaction Flag Amount
1 2016-02 Web 1 0 1 100
1 2016-02 Retail 0 1 1 200
2 2016-03 Web 1 1 1 50
2 2016-04 Others 0 1 1 90
2 2016-04 Others 0 1 1 300

 

I want to summarize the amount, transactions and customers by brand and channel. Transaction is valid if transaction flag is 1. Below is the output I expect

 

What I want          
Fiscal Period Total Amount Total Customers Total Transactions Brand  Channel
2016-02 100 1 1 Brand 1 Web
2016-02 200 1 1 Brand 2 Retail
2016-03 50 1 1 Both Web
2016-04 390 1 2 Brand 2 Others

 

For each distinct combination of Brand and Channel for a given fiscal period I want to calculate the total customers (count of customer ID), total transactions (count of transaction flag) and total amount (sum of amount). 

 

Please help.


Accepted Solutions
Solution
‎12-29-2017 02:31 AM
Valued Guide
Posts: 533

Re: Summarize by Multiple Variables in Data Step

Posted in reply to sasmaverick

This would be so much easier if your data was modelled more normalized. So my first step would be to do create a new variable Brand. After that a simple SQL does the trick (as would proc means, tabulate, report etc):

 

 

data have;
	input customer_id ord_month $ channel $ brand_1_flag brand_2_flag transaction_flag amount;
	drop brand_1_flag brand_2_flag;
    length brand $8; if transaction_flag = 1; if brand_1_flag = 1 and brand_2_flag = 1 then do; brand = 'Both'; output; end; else do; if brand_1_flag = 1 then do; brand = 'Brand 1'; output; end; if brand_2_flag = 1 then do; brand = 'Brand 2'; output; end; end; cards; 1 2016-02 Web 1 0 1 100 1 2016-02 Retail 0 1 1 200 2 2016-03 Web 1 1 1 50 2 2016-04 Others 0 1 1 90 2 2016-04 Others 0 1 1 300 ; proc sql; select ord_month, sum(amount) as total_amount, count(transaction_flag) as total_transactions, brand, channel from have group by ord_month, channel, brand; quit;

The subsetting IF can also be delayed to a WHERE statement in the SQL.

 

Hope this helps,

-- Jan

 

View solution in original post


All Replies
Solution
‎12-29-2017 02:31 AM
Valued Guide
Posts: 533

Re: Summarize by Multiple Variables in Data Step

Posted in reply to sasmaverick

This would be so much easier if your data was modelled more normalized. So my first step would be to do create a new variable Brand. After that a simple SQL does the trick (as would proc means, tabulate, report etc):

 

 

data have;
	input customer_id ord_month $ channel $ brand_1_flag brand_2_flag transaction_flag amount;
	drop brand_1_flag brand_2_flag;
    length brand $8; if transaction_flag = 1; if brand_1_flag = 1 and brand_2_flag = 1 then do; brand = 'Both'; output; end; else do; if brand_1_flag = 1 then do; brand = 'Brand 1'; output; end; if brand_2_flag = 1 then do; brand = 'Brand 2'; output; end; end; cards; 1 2016-02 Web 1 0 1 100 1 2016-02 Retail 0 1 1 200 2 2016-03 Web 1 1 1 50 2 2016-04 Others 0 1 1 90 2 2016-04 Others 0 1 1 300 ; proc sql; select ord_month, sum(amount) as total_amount, count(transaction_flag) as total_transactions, brand, channel from have group by ord_month, channel, brand; quit;

The subsetting IF can also be delayed to a WHERE statement in the SQL.

 

Hope this helps,

-- Jan

 

Contributor
Posts: 69

Re: Summarize by Multiple Variables in Data Step

Posted in reply to jklaverstijn

Thanks a lot. Works like a charm.

Respected Advisor
Posts: 3,018

Re: Summarize by Multiple Variables in Data Step

Posted in reply to sasmaverick

It is sooo much simpler to do this in PROC MEANS or PROC SUMMARY, the idea of writing your own data step code and then PROC SQL to get the statistics seems inefficient.

--
Paige Miller
Respected Advisor
Posts: 3,018

Re: Summarize by Multiple Variables in Data Step

Posted in reply to sasmaverick

This is trivial to do in PROC MEANS or PROC SUMMARY. Give it a try.

--
Paige Miller
Valued Guide
Posts: 533

Re: Summarize by Multiple Variables in Data Step

Posted in reply to PaigeMiller

I dare you with this denormalized model. Remember there are brand_1 and brand_2 variables that need to be combined. I look forward to your mean MEANS. And don't make it non-trivial!

 

Regards,

-- Jan.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 324 views
  • 1 like
  • 3 in conversation