Solved: Re: How do I summarizing data with conditional statement and group the...

Lekhnath · Posted 12-02-2019 12:15 PM

Hi, I have a data which looks like this:

Year Month Office Sales_type Sales

2018 Jan Dallas A 10

2018 Jan Dallas B 13

2018 Jan Dallas C 15

2018 Jan Dallas D 20

2018 Jan NY A 5

2018 Jan NY B 9

2018 Jan NY C 7

2018 Jan NY D 17

2018 Jan DC A 15

2018 Jan DC B 19

2018 Jan DC C 17

2018 Jan DC D 19

2018 Feb Dallas A 11

2018 Feb Dallas B 14

2018 Feb Dallas C 16

2018 Feb Dallas D 21

2018 Feb NY A 6

2018 Feb NY B 10

2018 Feb NY C 8

2018 Feb NY D 18

2018 Feb DC A 16

2018 Feb DC B 20

2018 Feb DC C 18

2018 Feb DC D 20

If the office is Dallas and NY and the sales_type is A and B then it belongs to Tim's department

If the office is Dallas and the sales_type is C and D then it belongs to Sam's department

If the office is NY and DC and the sales_type is C and D then it belongs to Henry's department

If the office is DC and the sales_type is A and B then it belongs to Rick's department

Now I want the data to be summarized by year, month and department which would look like this:

Year Month Dept Sales

2018 Jan Tim 37

2018 Jan Sam 35

2018 Jan Henry 60

2018 Jan Rick 34

2018 Feb Tim 41

2018 Feb Sam 37

2018 Feb Henry 64

2018 Feb Rick 36

2019 Jan Tim ----

2019 Jan Sam ----

----- ------ ---- -----

Here is what I have tried:

PROC SQL;
   CREATE TABLE WORK.Sales_Summary AS 
   SELECT DISTINCT Month,
		(CASE WHEN Sales_type IN ('A','B') AND Office IN ('Dallas','NY') THEN SUM(Sales) END) as Tim,
		(CASE WHEN Sales_type IN ('C','D') AND Office IN ('Dallas') THEN SUM(Sales) END) as Sam,
                (CASE WHEN Sales_type IN ('C','D') AND Office IN ('NY','DC') THEN SUM(Sales) END) as Henry,	
                (CASE WHEN Sales_type IN ('A','B') AND Office IN ('DC') THEN SUM(Sales) END) as Rick,	
      FROM WORK.Sales_Data t1
             GROUP BY Year, Month;
QUIT;

After I get this I was hoping I could transpose the columns Tim, Sam, Henry and Rick. But the problem is I couldn't even get to summarize the data in those columns correctly.

Could you please suggest me the way to get this data summarized in this manner. I use SAS EG 7.15 for making queries. Thank you in Advance!

ed_sas_member · Posted 12-02-2019 03:22 PM

Hi @Lekhnath

I have updated the code to take into account this new requirement.

In my opinion, the easiest way to handle "overlapping" departments is to use a specific format .

In the below code, I have defined the format Dept. as a multilabel one (for example, 1 can be formatted either as "Tim" or "Chuck").

Then, you can invoke this format in the proc means and specify the option "MLF" to get summary statistics for each format category.

Best,

data have;
	infile datalines dlm=" ";
	input Year Month $ Office $ Sales_type $ Sales;
	datalines;
2018     Jan           Dallas      A                      10
2018     Jan           Dallas      B                      13
2018     Jan           Dallas      C                      15
2018     Jan           Dallas      D                      20
2018     Jan           NY           A                       5
2018     Jan           NY           B                       9
2018     Jan           NY           C                       7
2018     Jan           NY           D                      17
2018     Jan           DC           A                      15
2018     Jan           DC           B                      19
2018     Jan           DC           C                      17
2018     Jan           DC           D                      19
2018     Feb           Dallas      A                      11
2018     Feb           Dallas      B                      14
2018     Feb           Dallas      C                      16
2018     Feb           Dallas      D                      21
2018     Feb           NY           A                       6
2018     Feb           NY           B                       10
2018     Feb           NY           C                       8
2018     Feb           NY           D                      18
2018     Feb           DC           A                      16
2018     Feb           DC           B                      20
2018     Feb           DC           C                      18
2018     Feb           DC           D                      20
;
run;



data have2;
	set have;
	if 		Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept=1;
	else if Office = "Dallas"		  and Sales_type in ("C","D") then Dept=2;
	else if Office in ("NY", "DC") 	  and Sales_type in ("C","D") then Dept=3;
	else if Office="DC"			      and Sales_type in ("A","B") then Dept=4;
run;

proc sort data=have2 (drop= Office Sales_type);
	by Year Month Dept;
run;

proc format fmtlib;
	value Dept (multilabel)
		1 = "Tim"
		2 = "Sam"
		3 = "Henry"
		4 = "Rick"
		1,4 = "Chuck";
run;

proc means data=have2 sum maxdec=0;
	var Sales;
	class Year Month Dept / mlf;
	ways 3; 
	output out=want (drop=_type_ _freq_) sum=Sales;
	format Dept Dept.;
run;

View solution in original post

ed_sas_member · Posted 12-02-2019 12:48 PM

Hi @Lekhnath

You can try the following code, using a proc means to get the summary statistics (sum of sales).

Ideally, if you want to display your table in a chronological order (years and months), I recommend that you define a format for months (e.g. 1 = "Jan", etc.)

Best,

data have;
	infile datalines dlm=" ";
	input Year Month $ Office $ Sales_type $ Sales;
	datalines;
2018     Jan           Dallas      A                      10
2018     Jan           Dallas      B                      13
2018     Jan           Dallas      C                      15
2018     Jan           Dallas      D                      20
2018     Jan           NY           A                       5
2018     Jan           NY           B                       9
2018     Jan           NY           C                       7
2018     Jan           NY           D                      17
2018     Jan           DC           A                      15
2018     Jan           DC           B                      19
2018     Jan           DC           C                      17
2018     Jan           DC           D                      19
2018     Feb           Dallas      A                      11
2018     Feb           Dallas      B                      14
2018     Feb           Dallas      C                      16
2018     Feb           Dallas      D                      21
2018     Feb           NY           A                       6
2018     Feb           NY           B                       10
2018     Feb           NY           C                       8
2018     Feb           NY           D                      18
2018     Feb           DC           A                      16
2018     Feb           DC           B                      20
2018     Feb           DC           C                      18
2018     Feb           DC           D                      20
;
run;



data have2;
	set have;
	length dept $10.;
	if 		Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept="Tim";
	else if Office = "Dallas"		  and Sales_type in ("C","D") then Dept="Sam";
	else if Office in ("NY", "DC") 	  and Sales_type in ("C","D") then Dept="Henry";
	else if Office="DC"			      and Sales_type in ("A","B") then Dept="Rick";
run;

proc sort data=have2 (drop= Office Sales_type);
	by Year Month Dept;
run;

proc means data=have2 sum maxdec=0;
	var Sales;
	class Year Month Dept;
	ways 3; 
	output out=want (drop=_type_ _freq_) sum=Sales;
run;

ballardw · Posted 12-02-2019 01:41 PM

An alternate approach to consider with multiple nested If/then/else type coding is the use of the SELECT statement.

Consider:

data have;
	infile datalines dlm=" ";
	input Year Month $ Office $ Sales_type $ Sales;
	datalines;
2018     Jan           Dallas      A                      10
2018     Jan           Dallas      B                      13
2018     Jan           Dallas      C                      15
2018     Jan           Dallas      D                      20
2018     Jan           NY           A                       5
2018     Jan           NY           B                       9
2018     Jan           NY           C                       7
2018     Jan           NY           D                      17
2018     Jan           DC           A                      15
2018     Jan           DC           B                      19
2018     Jan           DC           C                      17
2018     Jan           DC           D                      19
2018     Feb           Dallas      A                      11
2018     Feb           Dallas      B                      14
2018     Feb           Dallas      C                      16
2018     Feb           Dallas      D                      21
2018     Feb           NY           A                       6
2018     Feb           NY           B                       10
2018     Feb           NY           C                       8
2018     Feb           NY           D                      18
2018     Feb           DC           A                      16
2018     Feb           DC           B                      20
2018     Feb           DC           C                      18
2018     Feb           DC           D                      20
;
run;

data work.have2;
	set have;
	length dept $10.;
	if 		Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept="Tim";
	else if Office = "Dallas"		  and Sales_type in ("C","D") then Dept="Sam";
	else if Office in ("NY", "DC") 	  and Sales_type in ("C","D") then Dept="Henry";
	else if Office="DC"			      and Sales_type in ("A","B") then Dept="Rick";
run;

data work.have3;
	set have;
	length dept $10.;
   select (office);
      when ('Dallas') select (sales_type) ;
                        when ('A','B')  Dept='Tim';
                        when ('C','D')  Dept='Sam';
                        otherwise put "WARNING: Unexpected Sales type of " sales_type "for " office=;
                      end;
      when ('NY')     select (sales_type) ;
                        when ('A','B')  Dept='Tim';
                        when ('C','D')  Dept='Henry';
                        otherwise put "WARNING: Unexpected Sales type of " sales_type "for " office=;
                      end;

      when ('DC')     select (sales_type) ;
                        when ('A','B')  Dept='Rick';
                        when ('C','D')  Dept='Henry';
                        otherwise put "WARNING: Unexpected Sales type of " sales_type "for " office=;
                     end;

      otherwise put "WARNING: Unexpected office value of " office;
   end;
run;

This may take a few more lines of code but it is a bit easier to determine what is done for each office and each sales_type code within an office. The OTHERWISE statement in the select is what would get executed if nothing matches any of the WHEN values. In this case messages in the log of unexpected values. If there is any likelihood of the assignments changing frequently this may be easier to maintain.

Better might be a data set with the values of Office, Sales_type and dept and do an sql join to bring in the dept. This will be a much better way if you have LOTS of offices and Sales_types.

data work.departments;
   input office $ sales_type $ dept $;
datalines;
Dallas A Tim
Dallas B Tim
Dallas C Sam
Dallas D Sam
NY     A Tim
NY     B Tim
NY     C Henry
NY     D Henry
DC     A Rick
DC     B Rick
DC     C Henry
DC     D Henry
run;

proc sql;
   create table work.have4 as
   select a.*, b.dept
   from work.have as a
        left join
        work.departments as b
        on a.office=b.office
        and a.sales_type=b.sales_type
   ;
quit;

Lekhnath · Posted 12-02-2019 03:01 PM

Hi @ballardw

Thank you for your reply. I forgot one very important piece in the question, sorry about that. That is: There is another department "Chuck" which satisfies same criteria of Tim's department plus Rick's department. So, basically the result of Chuck's sales should be sum of Tim's and Rick's. But the data should still be summarized for all Tim, Sam, Henry, Rick and Chuck. Thank you.

Lekhnath · Posted 12-02-2019 03:00 PM

Hi @ed_sas_member

Thank you for your reply. I forgot one very important piece in the question, sorry about that. That is: There is another department "Chuck" which satisfies same criteria of Tim's department plus Rick's department. So, basically the result of Chuck's sales should be sum of Tim's and Rick's. But the data should still be summarized for all Tim, Sam, Henry, Rick and Chuck.

Also, I would like the data as a data table rather than a report. Thank you.

ed_sas_member · Posted 12-02-2019 03:22 PM

Hi @Lekhnath

I have updated the code to take into account this new requirement.

In my opinion, the easiest way to handle "overlapping" departments is to use a specific format .

In the below code, I have defined the format Dept. as a multilabel one (for example, 1 can be formatted either as "Tim" or "Chuck").

Then, you can invoke this format in the proc means and specify the option "MLF" to get summary statistics for each format category.

Best,

data have;
	infile datalines dlm=" ";
	input Year Month $ Office $ Sales_type $ Sales;
	datalines;
2018     Jan           Dallas      A                      10
2018     Jan           Dallas      B                      13
2018     Jan           Dallas      C                      15
2018     Jan           Dallas      D                      20
2018     Jan           NY           A                       5
2018     Jan           NY           B                       9
2018     Jan           NY           C                       7
2018     Jan           NY           D                      17
2018     Jan           DC           A                      15
2018     Jan           DC           B                      19
2018     Jan           DC           C                      17
2018     Jan           DC           D                      19
2018     Feb           Dallas      A                      11
2018     Feb           Dallas      B                      14
2018     Feb           Dallas      C                      16
2018     Feb           Dallas      D                      21
2018     Feb           NY           A                       6
2018     Feb           NY           B                       10
2018     Feb           NY           C                       8
2018     Feb           NY           D                      18
2018     Feb           DC           A                      16
2018     Feb           DC           B                      20
2018     Feb           DC           C                      18
2018     Feb           DC           D                      20
;
run;



data have2;
	set have;
	if 		Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept=1;
	else if Office = "Dallas"		  and Sales_type in ("C","D") then Dept=2;
	else if Office in ("NY", "DC") 	  and Sales_type in ("C","D") then Dept=3;
	else if Office="DC"			      and Sales_type in ("A","B") then Dept=4;
run;

proc sort data=have2 (drop= Office Sales_type);
	by Year Month Dept;
run;

proc format fmtlib;
	value Dept (multilabel)
		1 = "Tim"
		2 = "Sam"
		3 = "Henry"
		4 = "Rick"
		1,4 = "Chuck";
run;

proc means data=have2 sum maxdec=0;
	var Sales;
	class Year Month Dept / mlf;
	ways 3; 
	output out=want (drop=_type_ _freq_) sum=Sales;
	format Dept Dept.;
run;

How do I summarizing data with conditional statement and group them by a new column

Re: How do I summarizing data with conditional statement and group them by a new column

Re: How do I summarizing data with conditional statement and group them by a new column

Re: How do I summarizing data with conditional statement and group them by a new column

Re: How do I summarizing data with conditional statement and group them by a new column

Re: How do I summarizing data with conditional statement and group them by a new column

Re: How do I summarizing data with conditional statement and group them by a new column

Classroom Training Available!