Hi all,
I’m having trouble figuring this out. Any help is greatly appreciated
I have a data set which contains customer sales data by store and online channels. I need to take this data set and calculate:
Here is the table I start with:
CHANNEL_UNIT | CUSTOMER_ID | prod_type | prod_sales |
1 | 17131 | STORE_HOME | 0 |
1 | 17131 | STORE_CORE_APP_M | 526.8 |
1 | 17131 | STORE_CHILDREN | 0 |
1 | 17131 | STORE_CORE_APP_W | 0 |
3 | 17131 | ONLINE_HOME | 0 |
3 | 17131 | ONLINE_CHILDREN | 196.99 |
3 | 17131 | ONLINE_CORE_APP_M | 119.96 |
3 | 17131 | ONLINE_CORE_APP_W | 0 |
1 | 4687704 | STORE_HOME | 67.5 |
1 | 4687704 | STORE_CORE_APP_M | 0 |
1 | 4687704 | STORE_CHILDREN | 1518 |
1 | 4687704 | STORE_CORE_APP_W | 0 |
3 | 4687704 | ONLINE_HOME | 0 |
3 | 4687704 | ONLINE_CHILDREN | 120 |
3 | 4687704 | ONLINE_CORE_APP_M | 223 |
3 | 4687704 | ONLINE_CORE_APP_W | 0 |
This is the output table I'm trying to achieve:
CUSTOMER_ID | Total_Sales | Store_Sales | Online_Sales | Percent_Store_Sales | Percent_Online_Sales | TTL_HOME | TTL_CORE_APP_M | TTL_CORE_APP_W | TTL_STORE_CHILDREN | Per_HOME | Per_CORE_APP_M | Per_CORE_APP_W | Per_STORE_CHILDREN |
17131 | $843.75 | $526.80 | $316.95 | 62.4% | 60.2% | $0.00 | $646.76 | $0.00 | $196.99 | 0.0% | 76.7% | 0.0% | 23.3% |
4687704 | $1,928.50 | $1,585.50 | $343.00 | 82.2% | 21.6% | $67.50 | $223.00 | $0.00 | $1,638.00 | 3.5% | 11.6% | 0.0% | 84.9% |
Take a look at PROC TABULATE or PROC REPORT.
Hi,
I need an actual table, not a report output.
Proc tabulate does generate an OUTPUT dataset that can be manipulated into that format.
Another option is combining proc freqs or proc sqls. Because you're combining at different levels, its probably easiest to do multiple tables and combine them for your final results unless you're super confident in your coding skills.
@RobertNYC wrote:
Hi,
I need an actual table, not a report output.
If you need a data set then the question becomes what will you do next with the result? In a large number of cases making data wide like that increases coding complexity.
Hi ballardw,
So, the reason way I have to restructure the data is this way is because I am building a logistic model and I need to have the input variables transformed appropriately. Thanks.
It is easy for SQL. data have; infile cards truncover expandtabs; input CHANNEL_UNIT CUSTOMER_ID prod_type : $40. prod_sales; flag=scan(prod_type,1,'_'); cards; 1 17131 STORE_HOME 0 1 17131 STORE_CORE_APP_M 526.8 1 17131 STORE_CHILDREN 0 1 17131 STORE_CORE_APP_W 0 3 17131 ONLINE_HOME 0 3 17131 ONLINE_CHILDREN 196.99 3 17131 ONLINE_CORE_APP_M 119.96 3 17131 ONLINE_CORE_APP_W 0 1 4687704 STORE_HOME 67.5 1 4687704 STORE_CORE_APP_M 0 1 4687704 STORE_CHILDREN 1518 1 4687704 STORE_CORE_APP_W 0 3 4687704 ONLINE_HOME 0 3 4687704 ONLINE_CHILDREN 120 3 4687704 ONLINE_CORE_APP_M 223 3 4687704 ONLINE_CORE_APP_W 0 ; run; proc sql; create table want as select customer_id, sum(prod_sales) as total_sales, (select sum(prod_sales) from have where customer_id=a.customer_id and flag='STORE') as store_sales, calculated total_sales/(select sum(prod_sales) from have) as percent_total, calculated store_sales/(select sum(prod_sales) from have where flag='STORE') as percent_store from have as a group by customer_id; quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.