Hi, I have a data which looks like this:
Year Month Office Sales_type Sales
2018 Jan Dallas A 10
2018 Jan Dallas B 13
2018 Jan Dallas C 15
2018 Jan Dallas D 20
2018 Jan NY A 5
2018 Jan NY B 9
2018 Jan NY C 7
2018 Jan NY D 17
2018 Jan DC A 15
2018 Jan DC B 19
2018 Jan DC C 17
2018 Jan DC D 19
2018 Feb Dallas A 11
2018 Feb Dallas B 14
2018 Feb Dallas C 16
2018 Feb Dallas D 21
2018 Feb NY A 6
2018 Feb NY B 10
2018 Feb NY C 8
2018 Feb NY D 18
2018 Feb DC A 16
2018 Feb DC B 20
2018 Feb DC C 18
2018 Feb DC D 20
If the office is Dallas and NY and the sales_type is A and B then it belongs to Tim's department
If the office is Dallas and the sales_type is C and D then it belongs to Sam's department
If the office is NY and DC and the sales_type is C and D then it belongs to Henry's department
If the office is DC and the sales_type is A and B then it belongs to Rick's department
Now I want the data to be summarized by year, month and department which would look like this:
Year Month Dept Sales
2018 Jan Tim 37
2018 Jan Sam 35
2018 Jan Henry 60
2018 Jan Rick 34
2018 Feb Tim 41
2018 Feb Sam 37
2018 Feb Henry 64
2018 Feb Rick 36
2019 Jan Tim ----
2019 Jan Sam ----
----- ------ ---- -----
Here is what I have tried:
PROC SQL;
   CREATE TABLE WORK.Sales_Summary AS 
   SELECT DISTINCT Month,
		(CASE WHEN Sales_type IN ('A','B') AND Office IN ('Dallas','NY') THEN SUM(Sales) END) as Tim,
		(CASE WHEN Sales_type IN ('C','D') AND Office IN ('Dallas') THEN SUM(Sales) END) as Sam,
                (CASE WHEN Sales_type IN ('C','D') AND Office IN ('NY','DC') THEN SUM(Sales) END) as Henry,	
                (CASE WHEN Sales_type IN ('A','B') AND Office IN ('DC') THEN SUM(Sales) END) as Rick,	
      FROM WORK.Sales_Data t1
             GROUP BY Year, Month;
QUIT;After I get this I was hoping I could transpose the columns Tim, Sam, Henry and Rick. But the problem is I couldn't even get to summarize the data in those columns correctly.
Could you please suggest me the way to get this data summarized in this manner. I use SAS EG 7.15 for making queries. Thank you in Advance!
Hi @Lekhnath
I have updated the code to take into account this new requirement.
In my opinion, the easiest way to handle "overlapping" departments is to use a specific format .
In the below code, I have defined the format Dept. as a multilabel one (for example, 1 can be formatted either as "Tim" or "Chuck").
Then, you can invoke this format in the proc means and specify the option "MLF" to get summary statistics for each format category.
Best,
data have;
	infile datalines dlm=" ";
	input Year Month $ Office $ Sales_type $ Sales;
	datalines;
2018     Jan           Dallas      A                      10
2018     Jan           Dallas      B                      13
2018     Jan           Dallas      C                      15
2018     Jan           Dallas      D                      20
2018     Jan           NY           A                       5
2018     Jan           NY           B                       9
2018     Jan           NY           C                       7
2018     Jan           NY           D                      17
2018     Jan           DC           A                      15
2018     Jan           DC           B                      19
2018     Jan           DC           C                      17
2018     Jan           DC           D                      19
2018     Feb           Dallas      A                      11
2018     Feb           Dallas      B                      14
2018     Feb           Dallas      C                      16
2018     Feb           Dallas      D                      21
2018     Feb           NY           A                       6
2018     Feb           NY           B                       10
2018     Feb           NY           C                       8
2018     Feb           NY           D                      18
2018     Feb           DC           A                      16
2018     Feb           DC           B                      20
2018     Feb           DC           C                      18
2018     Feb           DC           D                      20
;
run;
data have2;
	set have;
	if 		Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept=1;
	else if Office = "Dallas"		  and Sales_type in ("C","D") then Dept=2;
	else if Office in ("NY", "DC") 	  and Sales_type in ("C","D") then Dept=3;
	else if Office="DC"			      and Sales_type in ("A","B") then Dept=4;
run;
proc sort data=have2 (drop= Office Sales_type);
	by Year Month Dept;
run;
proc format fmtlib;
	value Dept (multilabel)
		1 = "Tim"
		2 = "Sam"
		3 = "Henry"
		4 = "Rick"
		1,4 = "Chuck";
run;
proc means data=have2 sum maxdec=0;
	var Sales;
	class Year Month Dept / mlf;
	ways 3; 
	output out=want (drop=_type_ _freq_) sum=Sales;
	format Dept Dept.;
run;
Hi @Lekhnath
You can try the following code, using a proc means to get the summary statistics (sum of sales).
Ideally, if you want to display your table in a chronological order (years and months), I recommend that you define a format for months (e.g. 1 = "Jan", etc.)
Best,
data have;
	infile datalines dlm=" ";
	input Year Month $ Office $ Sales_type $ Sales;
	datalines;
2018     Jan           Dallas      A                      10
2018     Jan           Dallas      B                      13
2018     Jan           Dallas      C                      15
2018     Jan           Dallas      D                      20
2018     Jan           NY           A                       5
2018     Jan           NY           B                       9
2018     Jan           NY           C                       7
2018     Jan           NY           D                      17
2018     Jan           DC           A                      15
2018     Jan           DC           B                      19
2018     Jan           DC           C                      17
2018     Jan           DC           D                      19
2018     Feb           Dallas      A                      11
2018     Feb           Dallas      B                      14
2018     Feb           Dallas      C                      16
2018     Feb           Dallas      D                      21
2018     Feb           NY           A                       6
2018     Feb           NY           B                       10
2018     Feb           NY           C                       8
2018     Feb           NY           D                      18
2018     Feb           DC           A                      16
2018     Feb           DC           B                      20
2018     Feb           DC           C                      18
2018     Feb           DC           D                      20
;
run;
data have2;
	set have;
	length dept $10.;
	if 		Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept="Tim";
	else if Office = "Dallas"		  and Sales_type in ("C","D") then Dept="Sam";
	else if Office in ("NY", "DC") 	  and Sales_type in ("C","D") then Dept="Henry";
	else if Office="DC"			      and Sales_type in ("A","B") then Dept="Rick";
run;
proc sort data=have2 (drop= Office Sales_type);
	by Year Month Dept;
run;
proc means data=have2 sum maxdec=0;
	var Sales;
	class Year Month Dept;
	ways 3; 
	output out=want (drop=_type_ _freq_) sum=Sales;
run;
An alternate approach to consider with multiple nested If/then/else type coding is the use of the SELECT statement.
Consider:
data have;
	infile datalines dlm=" ";
	input Year Month $ Office $ Sales_type $ Sales;
	datalines;
2018     Jan           Dallas      A                      10
2018     Jan           Dallas      B                      13
2018     Jan           Dallas      C                      15
2018     Jan           Dallas      D                      20
2018     Jan           NY           A                       5
2018     Jan           NY           B                       9
2018     Jan           NY           C                       7
2018     Jan           NY           D                      17
2018     Jan           DC           A                      15
2018     Jan           DC           B                      19
2018     Jan           DC           C                      17
2018     Jan           DC           D                      19
2018     Feb           Dallas      A                      11
2018     Feb           Dallas      B                      14
2018     Feb           Dallas      C                      16
2018     Feb           Dallas      D                      21
2018     Feb           NY           A                       6
2018     Feb           NY           B                       10
2018     Feb           NY           C                       8
2018     Feb           NY           D                      18
2018     Feb           DC           A                      16
2018     Feb           DC           B                      20
2018     Feb           DC           C                      18
2018     Feb           DC           D                      20
;
run;
data work.have2;
	set have;
	length dept $10.;
	if 		Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept="Tim";
	else if Office = "Dallas"		  and Sales_type in ("C","D") then Dept="Sam";
	else if Office in ("NY", "DC") 	  and Sales_type in ("C","D") then Dept="Henry";
	else if Office="DC"			      and Sales_type in ("A","B") then Dept="Rick";
run;
data work.have3;
	set have;
	length dept $10.;
   select (office);
      when ('Dallas') select (sales_type) ;
                        when ('A','B')  Dept='Tim';
                        when ('C','D')  Dept='Sam';
                        otherwise put "WARNING: Unexpected Sales type of " sales_type "for " office=;
                      end;
      when ('NY')     select (sales_type) ;
                        when ('A','B')  Dept='Tim';
                        when ('C','D')  Dept='Henry';
                        otherwise put "WARNING: Unexpected Sales type of " sales_type "for " office=;
                      end;
      when ('DC')     select (sales_type) ;
                        when ('A','B')  Dept='Rick';
                        when ('C','D')  Dept='Henry';
                        otherwise put "WARNING: Unexpected Sales type of " sales_type "for " office=;
                     end;
      otherwise put "WARNING: Unexpected office value of " office;
   end;
run;
This may take a few more lines of code but it is a bit easier to determine what is done for each office and each sales_type code within an office. The OTHERWISE statement in the select is what would get executed if nothing matches any of the WHEN values. In this case messages in the log of unexpected values. If there is any likelihood of the assignments changing frequently this may be easier to maintain.
Better might be a data set with the values of Office, Sales_type and dept and do an sql join to bring in the dept. This will be a much better way if you have LOTS of offices and Sales_types.
data work.departments;
   input office $ sales_type $ dept $;
datalines;
Dallas A Tim
Dallas B Tim
Dallas C Sam
Dallas D Sam
NY     A Tim
NY     B Tim
NY     C Henry
NY     D Henry
DC     A Rick
DC     B Rick
DC     C Henry
DC     D Henry
run;
proc sql;
   create table work.have4 as
   select a.*, b.dept
   from work.have as a
        left join
        work.departments as b
        on a.office=b.office
        and a.sales_type=b.sales_type
   ;
quit;
					
				
			
			
				
			
			
			
			
			
			
			
		Hi @ballardw
Thank you for your reply. I forgot one very important piece in the question, sorry about that. That is: There is another department "Chuck" which satisfies same criteria of Tim's department plus Rick's department. So, basically the result of Chuck's sales should be sum of Tim's and Rick's. But the data should still be summarized for all Tim, Sam, Henry, Rick and Chuck. Thank you.
Thank you for your reply. I forgot one very important piece in the question, sorry about that. That is: There is another department "Chuck" which satisfies same criteria of Tim's department plus Rick's department. So, basically the result of Chuck's sales should be sum of Tim's and Rick's. But the data should still be summarized for all Tim, Sam, Henry, Rick and Chuck.
Also, I would like the data as a data table rather than a report. Thank you.
Hi @Lekhnath
I have updated the code to take into account this new requirement.
In my opinion, the easiest way to handle "overlapping" departments is to use a specific format .
In the below code, I have defined the format Dept. as a multilabel one (for example, 1 can be formatted either as "Tim" or "Chuck").
Then, you can invoke this format in the proc means and specify the option "MLF" to get summary statistics for each format category.
Best,
data have;
	infile datalines dlm=" ";
	input Year Month $ Office $ Sales_type $ Sales;
	datalines;
2018     Jan           Dallas      A                      10
2018     Jan           Dallas      B                      13
2018     Jan           Dallas      C                      15
2018     Jan           Dallas      D                      20
2018     Jan           NY           A                       5
2018     Jan           NY           B                       9
2018     Jan           NY           C                       7
2018     Jan           NY           D                      17
2018     Jan           DC           A                      15
2018     Jan           DC           B                      19
2018     Jan           DC           C                      17
2018     Jan           DC           D                      19
2018     Feb           Dallas      A                      11
2018     Feb           Dallas      B                      14
2018     Feb           Dallas      C                      16
2018     Feb           Dallas      D                      21
2018     Feb           NY           A                       6
2018     Feb           NY           B                       10
2018     Feb           NY           C                       8
2018     Feb           NY           D                      18
2018     Feb           DC           A                      16
2018     Feb           DC           B                      20
2018     Feb           DC           C                      18
2018     Feb           DC           D                      20
;
run;
data have2;
	set have;
	if 		Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept=1;
	else if Office = "Dallas"		  and Sales_type in ("C","D") then Dept=2;
	else if Office in ("NY", "DC") 	  and Sales_type in ("C","D") then Dept=3;
	else if Office="DC"			      and Sales_type in ("A","B") then Dept=4;
run;
proc sort data=have2 (drop= Office Sales_type);
	by Year Month Dept;
run;
proc format fmtlib;
	value Dept (multilabel)
		1 = "Tim"
		2 = "Sam"
		3 = "Henry"
		4 = "Rick"
		1,4 = "Chuck";
run;
proc means data=have2 sum maxdec=0;
	var Sales;
	class Year Month Dept / mlf;
	ways 3; 
	output out=want (drop=_type_ _freq_) sum=Sales;
	format Dept Dept.;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
