- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, I have a data which looks like this:
Year Month Office Sales_type Sales
2018 Jan Dallas A 10
2018 Jan Dallas B 13
2018 Jan Dallas C 15
2018 Jan Dallas D 20
2018 Jan NY A 5
2018 Jan NY B 9
2018 Jan NY C 7
2018 Jan NY D 17
2018 Jan DC A 15
2018 Jan DC B 19
2018 Jan DC C 17
2018 Jan DC D 19
2018 Feb Dallas A 11
2018 Feb Dallas B 14
2018 Feb Dallas C 16
2018 Feb Dallas D 21
2018 Feb NY A 6
2018 Feb NY B 10
2018 Feb NY C 8
2018 Feb NY D 18
2018 Feb DC A 16
2018 Feb DC B 20
2018 Feb DC C 18
2018 Feb DC D 20
If the office is Dallas and NY and the sales_type is A and B then it belongs to Tim's department
If the office is Dallas and the sales_type is C and D then it belongs to Sam's department
If the office is NY and DC and the sales_type is C and D then it belongs to Henry's department
If the office is DC and the sales_type is A and B then it belongs to Rick's department
Now I want the data to be summarized by year, month and department which would look like this:
Year Month Dept Sales
2018 Jan Tim 37
2018 Jan Sam 35
2018 Jan Henry 60
2018 Jan Rick 34
2018 Feb Tim 41
2018 Feb Sam 37
2018 Feb Henry 64
2018 Feb Rick 36
2019 Jan Tim ----
2019 Jan Sam ----
----- ------ ---- -----
Here is what I have tried:
PROC SQL;
CREATE TABLE WORK.Sales_Summary AS
SELECT DISTINCT Month,
(CASE WHEN Sales_type IN ('A','B') AND Office IN ('Dallas','NY') THEN SUM(Sales) END) as Tim,
(CASE WHEN Sales_type IN ('C','D') AND Office IN ('Dallas') THEN SUM(Sales) END) as Sam,
(CASE WHEN Sales_type IN ('C','D') AND Office IN ('NY','DC') THEN SUM(Sales) END) as Henry,
(CASE WHEN Sales_type IN ('A','B') AND Office IN ('DC') THEN SUM(Sales) END) as Rick,
FROM WORK.Sales_Data t1
GROUP BY Year, Month;
QUIT;
After I get this I was hoping I could transpose the columns Tim, Sam, Henry and Rick. But the problem is I couldn't even get to summarize the data in those columns correctly.
Could you please suggest me the way to get this data summarized in this manner. I use SAS EG 7.15 for making queries. Thank you in Advance!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Lekhnath
I have updated the code to take into account this new requirement.
In my opinion, the easiest way to handle "overlapping" departments is to use a specific format .
In the below code, I have defined the format Dept. as a multilabel one (for example, 1 can be formatted either as "Tim" or "Chuck").
Then, you can invoke this format in the proc means and specify the option "MLF" to get summary statistics for each format category.
Best,
data have;
infile datalines dlm=" ";
input Year Month $ Office $ Sales_type $ Sales;
datalines;
2018 Jan Dallas A 10
2018 Jan Dallas B 13
2018 Jan Dallas C 15
2018 Jan Dallas D 20
2018 Jan NY A 5
2018 Jan NY B 9
2018 Jan NY C 7
2018 Jan NY D 17
2018 Jan DC A 15
2018 Jan DC B 19
2018 Jan DC C 17
2018 Jan DC D 19
2018 Feb Dallas A 11
2018 Feb Dallas B 14
2018 Feb Dallas C 16
2018 Feb Dallas D 21
2018 Feb NY A 6
2018 Feb NY B 10
2018 Feb NY C 8
2018 Feb NY D 18
2018 Feb DC A 16
2018 Feb DC B 20
2018 Feb DC C 18
2018 Feb DC D 20
;
run;
data have2;
set have;
if Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept=1;
else if Office = "Dallas" and Sales_type in ("C","D") then Dept=2;
else if Office in ("NY", "DC") and Sales_type in ("C","D") then Dept=3;
else if Office="DC" and Sales_type in ("A","B") then Dept=4;
run;
proc sort data=have2 (drop= Office Sales_type);
by Year Month Dept;
run;
proc format fmtlib;
value Dept (multilabel)
1 = "Tim"
2 = "Sam"
3 = "Henry"
4 = "Rick"
1,4 = "Chuck";
run;
proc means data=have2 sum maxdec=0;
var Sales;
class Year Month Dept / mlf;
ways 3;
output out=want (drop=_type_ _freq_) sum=Sales;
format Dept Dept.;
run;
- Tags:
- multilabel format
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Lekhnath
You can try the following code, using a proc means to get the summary statistics (sum of sales).
Ideally, if you want to display your table in a chronological order (years and months), I recommend that you define a format for months (e.g. 1 = "Jan", etc.)
Best,
data have;
infile datalines dlm=" ";
input Year Month $ Office $ Sales_type $ Sales;
datalines;
2018 Jan Dallas A 10
2018 Jan Dallas B 13
2018 Jan Dallas C 15
2018 Jan Dallas D 20
2018 Jan NY A 5
2018 Jan NY B 9
2018 Jan NY C 7
2018 Jan NY D 17
2018 Jan DC A 15
2018 Jan DC B 19
2018 Jan DC C 17
2018 Jan DC D 19
2018 Feb Dallas A 11
2018 Feb Dallas B 14
2018 Feb Dallas C 16
2018 Feb Dallas D 21
2018 Feb NY A 6
2018 Feb NY B 10
2018 Feb NY C 8
2018 Feb NY D 18
2018 Feb DC A 16
2018 Feb DC B 20
2018 Feb DC C 18
2018 Feb DC D 20
;
run;
data have2;
set have;
length dept $10.;
if Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept="Tim";
else if Office = "Dallas" and Sales_type in ("C","D") then Dept="Sam";
else if Office in ("NY", "DC") and Sales_type in ("C","D") then Dept="Henry";
else if Office="DC" and Sales_type in ("A","B") then Dept="Rick";
run;
proc sort data=have2 (drop= Office Sales_type);
by Year Month Dept;
run;
proc means data=have2 sum maxdec=0;
var Sales;
class Year Month Dept;
ways 3;
output out=want (drop=_type_ _freq_) sum=Sales;
run;
- Tags:
- if-then
- PROC MEANS
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
An alternate approach to consider with multiple nested If/then/else type coding is the use of the SELECT statement.
Consider:
data have; infile datalines dlm=" "; input Year Month $ Office $ Sales_type $ Sales; datalines; 2018 Jan Dallas A 10 2018 Jan Dallas B 13 2018 Jan Dallas C 15 2018 Jan Dallas D 20 2018 Jan NY A 5 2018 Jan NY B 9 2018 Jan NY C 7 2018 Jan NY D 17 2018 Jan DC A 15 2018 Jan DC B 19 2018 Jan DC C 17 2018 Jan DC D 19 2018 Feb Dallas A 11 2018 Feb Dallas B 14 2018 Feb Dallas C 16 2018 Feb Dallas D 21 2018 Feb NY A 6 2018 Feb NY B 10 2018 Feb NY C 8 2018 Feb NY D 18 2018 Feb DC A 16 2018 Feb DC B 20 2018 Feb DC C 18 2018 Feb DC D 20 ; run; data work.have2; set have; length dept $10.; if Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept="Tim"; else if Office = "Dallas" and Sales_type in ("C","D") then Dept="Sam"; else if Office in ("NY", "DC") and Sales_type in ("C","D") then Dept="Henry"; else if Office="DC" and Sales_type in ("A","B") then Dept="Rick"; run; data work.have3; set have; length dept $10.; select (office); when ('Dallas') select (sales_type) ; when ('A','B') Dept='Tim'; when ('C','D') Dept='Sam'; otherwise put "WARNING: Unexpected Sales type of " sales_type "for " office=; end; when ('NY') select (sales_type) ; when ('A','B') Dept='Tim'; when ('C','D') Dept='Henry'; otherwise put "WARNING: Unexpected Sales type of " sales_type "for " office=; end; when ('DC') select (sales_type) ; when ('A','B') Dept='Rick'; when ('C','D') Dept='Henry'; otherwise put "WARNING: Unexpected Sales type of " sales_type "for " office=; end; otherwise put "WARNING: Unexpected office value of " office; end; run;
This may take a few more lines of code but it is a bit easier to determine what is done for each office and each sales_type code within an office. The OTHERWISE statement in the select is what would get executed if nothing matches any of the WHEN values. In this case messages in the log of unexpected values. If there is any likelihood of the assignments changing frequently this may be easier to maintain.
Better might be a data set with the values of Office, Sales_type and dept and do an sql join to bring in the dept. This will be a much better way if you have LOTS of offices and Sales_types.
data work.departments; input office $ sales_type $ dept $; datalines; Dallas A Tim Dallas B Tim Dallas C Sam Dallas D Sam NY A Tim NY B Tim NY C Henry NY D Henry DC A Rick DC B Rick DC C Henry DC D Henry run; proc sql; create table work.have4 as select a.*, b.dept from work.have as a left join work.departments as b on a.office=b.office and a.sales_type=b.sales_type ; quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @ballardw
Thank you for your reply. I forgot one very important piece in the question, sorry about that. That is: There is another department "Chuck" which satisfies same criteria of Tim's department plus Rick's department. So, basically the result of Chuck's sales should be sum of Tim's and Rick's. But the data should still be summarized for all Tim, Sam, Henry, Rick and Chuck. Thank you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for your reply. I forgot one very important piece in the question, sorry about that. That is: There is another department "Chuck" which satisfies same criteria of Tim's department plus Rick's department. So, basically the result of Chuck's sales should be sum of Tim's and Rick's. But the data should still be summarized for all Tim, Sam, Henry, Rick and Chuck.
Also, I would like the data as a data table rather than a report. Thank you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Lekhnath
I have updated the code to take into account this new requirement.
In my opinion, the easiest way to handle "overlapping" departments is to use a specific format .
In the below code, I have defined the format Dept. as a multilabel one (for example, 1 can be formatted either as "Tim" or "Chuck").
Then, you can invoke this format in the proc means and specify the option "MLF" to get summary statistics for each format category.
Best,
data have;
infile datalines dlm=" ";
input Year Month $ Office $ Sales_type $ Sales;
datalines;
2018 Jan Dallas A 10
2018 Jan Dallas B 13
2018 Jan Dallas C 15
2018 Jan Dallas D 20
2018 Jan NY A 5
2018 Jan NY B 9
2018 Jan NY C 7
2018 Jan NY D 17
2018 Jan DC A 15
2018 Jan DC B 19
2018 Jan DC C 17
2018 Jan DC D 19
2018 Feb Dallas A 11
2018 Feb Dallas B 14
2018 Feb Dallas C 16
2018 Feb Dallas D 21
2018 Feb NY A 6
2018 Feb NY B 10
2018 Feb NY C 8
2018 Feb NY D 18
2018 Feb DC A 16
2018 Feb DC B 20
2018 Feb DC C 18
2018 Feb DC D 20
;
run;
data have2;
set have;
if Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept=1;
else if Office = "Dallas" and Sales_type in ("C","D") then Dept=2;
else if Office in ("NY", "DC") and Sales_type in ("C","D") then Dept=3;
else if Office="DC" and Sales_type in ("A","B") then Dept=4;
run;
proc sort data=have2 (drop= Office Sales_type);
by Year Month Dept;
run;
proc format fmtlib;
value Dept (multilabel)
1 = "Tim"
2 = "Sam"
3 = "Henry"
4 = "Rick"
1,4 = "Chuck";
run;
proc means data=have2 sum maxdec=0;
var Sales;
class Year Month Dept / mlf;
ways 3;
output out=want (drop=_type_ _freq_) sum=Sales;
format Dept Dept.;
run;
- Tags:
- multilabel format