BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Lekhnath
Calcite | Level 5

Hi, I have a data which looks like this:

 

Year      Month     Office      Sales_type     Sales

2018     Jan           Dallas      A                      10

2018     Jan           Dallas      B                      13

2018     Jan           Dallas      C                      15

2018     Jan           Dallas      D                      20

2018     Jan           NY           A                       5

2018     Jan           NY           B                       9

2018     Jan           NY           C                       7

2018     Jan           NY           D                      17

2018     Jan           DC           A                      15

2018     Jan           DC           B                      19

2018     Jan           DC           C                      17

2018     Jan           DC           D                      19

2018     Feb           Dallas      A                      11

2018     Feb           Dallas      B                      14

2018     Feb           Dallas      C                      16

2018     Feb           Dallas      D                      21

2018     Feb           NY           A                       6

2018     Feb           NY           B                       10

2018     Feb           NY           C                       8

2018     Feb           NY           D                      18

2018     Feb           DC           A                      16

2018     Feb           DC           B                      20

2018     Feb           DC           C                      18

2018     Feb           DC           D                      20

 

If the office is Dallas and NY and the sales_type is A and B then it belongs to Tim's department

If the office is Dallas and the sales_type is C and D then it belongs to Sam's department

If the office is NY and DC and the sales_type is C and D then it belongs to Henry's department

If the office is DC and the sales_type is A and B then it belongs to Rick's department

 

Now I want the data to be summarized by year, month and department which would look like this:

Year      Month     Dept       Sales

2018      Jan         Tim          37

2018      Jan         Sam        35

2018      Jan         Henry      60

2018      Jan         Rick         34

2018      Feb         Tim          41

2018      Feb         Sam        37

2018      Feb         Henry      64

2018      Feb         Rick         36

2019      Jan         Tim         ----

2019      Jan         Sam        ----

-----       ------         ----         -----

 

Here is what I have tried:

PROC SQL;
   CREATE TABLE WORK.Sales_Summary AS 
   SELECT DISTINCT Month,
		(CASE WHEN Sales_type IN ('A','B') AND Office IN ('Dallas','NY') THEN SUM(Sales) END) as Tim,
		(CASE WHEN Sales_type IN ('C','D') AND Office IN ('Dallas') THEN SUM(Sales) END) as Sam,
                (CASE WHEN Sales_type IN ('C','D') AND Office IN ('NY','DC') THEN SUM(Sales) END) as Henry,	
                (CASE WHEN Sales_type IN ('A','B') AND Office IN ('DC') THEN SUM(Sales) END) as Rick,	
      FROM WORK.Sales_Data t1
             GROUP BY Year, Month;
QUIT;

After I get this I was hoping I could transpose the columns Tim, Sam, Henry and Rick. But the problem is I couldn't even get to summarize the data in those columns correctly.

 

Could you please suggest me the way to get this data summarized in this manner. I use SAS EG 7.15 for making queries. Thank you in Advance! 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @Lekhnath 

 

I have updated the code to take into account this new requirement.

In my opinion, the easiest way to handle "overlapping" departments is to use a specific format .

In the below code, I have defined the format Dept. as a multilabel one (for example, 1 can be formatted either as "Tim" or "Chuck").

Then, you can invoke this format in the proc means and specify the option "MLF" to get summary statistics for each format category.

 

Best, 

 

data have;
	infile datalines dlm=" ";
	input Year Month $ Office $ Sales_type $ Sales;
	datalines;
2018     Jan           Dallas      A                      10
2018     Jan           Dallas      B                      13
2018     Jan           Dallas      C                      15
2018     Jan           Dallas      D                      20
2018     Jan           NY           A                       5
2018     Jan           NY           B                       9
2018     Jan           NY           C                       7
2018     Jan           NY           D                      17
2018     Jan           DC           A                      15
2018     Jan           DC           B                      19
2018     Jan           DC           C                      17
2018     Jan           DC           D                      19
2018     Feb           Dallas      A                      11
2018     Feb           Dallas      B                      14
2018     Feb           Dallas      C                      16
2018     Feb           Dallas      D                      21
2018     Feb           NY           A                       6
2018     Feb           NY           B                       10
2018     Feb           NY           C                       8
2018     Feb           NY           D                      18
2018     Feb           DC           A                      16
2018     Feb           DC           B                      20
2018     Feb           DC           C                      18
2018     Feb           DC           D                      20
;
run;



data have2;
	set have;
	if 		Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept=1;
	else if Office = "Dallas"		  and Sales_type in ("C","D") then Dept=2;
	else if Office in ("NY", "DC") 	  and Sales_type in ("C","D") then Dept=3;
	else if Office="DC"			      and Sales_type in ("A","B") then Dept=4;
run;

proc sort data=have2 (drop= Office Sales_type);
	by Year Month Dept;
run;

proc format fmtlib;
	value Dept (multilabel)
		1 = "Tim"
		2 = "Sam"
		3 = "Henry"
		4 = "Rick"
		1,4 = "Chuck";
run;

proc means data=have2 sum maxdec=0;
	var Sales;
	class Year Month Dept / mlf;
	ways 3; 
	output out=want (drop=_type_ _freq_) sum=Sales;
	format Dept Dept.;
run;

 

View solution in original post

5 REPLIES 5
ed_sas_member
Meteorite | Level 14

Hi @Lekhnath 

 

You can try the following code, using a proc means to get the summary statistics (sum of sales).

Ideally, if you want to display your table in a chronological order (years and months), I recommend that you define a format for months (e.g. 1 = "Jan", etc.)

Best,

data have;
	infile datalines dlm=" ";
	input Year Month $ Office $ Sales_type $ Sales;
	datalines;
2018     Jan           Dallas      A                      10
2018     Jan           Dallas      B                      13
2018     Jan           Dallas      C                      15
2018     Jan           Dallas      D                      20
2018     Jan           NY           A                       5
2018     Jan           NY           B                       9
2018     Jan           NY           C                       7
2018     Jan           NY           D                      17
2018     Jan           DC           A                      15
2018     Jan           DC           B                      19
2018     Jan           DC           C                      17
2018     Jan           DC           D                      19
2018     Feb           Dallas      A                      11
2018     Feb           Dallas      B                      14
2018     Feb           Dallas      C                      16
2018     Feb           Dallas      D                      21
2018     Feb           NY           A                       6
2018     Feb           NY           B                       10
2018     Feb           NY           C                       8
2018     Feb           NY           D                      18
2018     Feb           DC           A                      16
2018     Feb           DC           B                      20
2018     Feb           DC           C                      18
2018     Feb           DC           D                      20
;
run;



data have2;
	set have;
	length dept $10.;
	if 		Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept="Tim";
	else if Office = "Dallas"		  and Sales_type in ("C","D") then Dept="Sam";
	else if Office in ("NY", "DC") 	  and Sales_type in ("C","D") then Dept="Henry";
	else if Office="DC"			      and Sales_type in ("A","B") then Dept="Rick";
run;

proc sort data=have2 (drop= Office Sales_type);
	by Year Month Dept;
run;

proc means data=have2 sum maxdec=0;
	var Sales;
	class Year Month Dept;
	ways 3; 
	output out=want (drop=_type_ _freq_) sum=Sales;
run;

 

ballardw
Super User

An alternate approach to consider with multiple nested If/then/else type coding is the use of the SELECT statement.

Consider:

data have;
	infile datalines dlm=" ";
	input Year Month $ Office $ Sales_type $ Sales;
	datalines;
2018     Jan           Dallas      A                      10
2018     Jan           Dallas      B                      13
2018     Jan           Dallas      C                      15
2018     Jan           Dallas      D                      20
2018     Jan           NY           A                       5
2018     Jan           NY           B                       9
2018     Jan           NY           C                       7
2018     Jan           NY           D                      17
2018     Jan           DC           A                      15
2018     Jan           DC           B                      19
2018     Jan           DC           C                      17
2018     Jan           DC           D                      19
2018     Feb           Dallas      A                      11
2018     Feb           Dallas      B                      14
2018     Feb           Dallas      C                      16
2018     Feb           Dallas      D                      21
2018     Feb           NY           A                       6
2018     Feb           NY           B                       10
2018     Feb           NY           C                       8
2018     Feb           NY           D                      18
2018     Feb           DC           A                      16
2018     Feb           DC           B                      20
2018     Feb           DC           C                      18
2018     Feb           DC           D                      20
;
run;

data work.have2;
	set have;
	length dept $10.;
	if 		Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept="Tim";
	else if Office = "Dallas"		  and Sales_type in ("C","D") then Dept="Sam";
	else if Office in ("NY", "DC") 	  and Sales_type in ("C","D") then Dept="Henry";
	else if Office="DC"			      and Sales_type in ("A","B") then Dept="Rick";
run;

data work.have3;
	set have;
	length dept $10.;
   select (office);
      when ('Dallas') select (sales_type) ;
                        when ('A','B')  Dept='Tim';
                        when ('C','D')  Dept='Sam';
                        otherwise put "WARNING: Unexpected Sales type of " sales_type "for " office=;
                      end;
      when ('NY')     select (sales_type) ;
                        when ('A','B')  Dept='Tim';
                        when ('C','D')  Dept='Henry';
                        otherwise put "WARNING: Unexpected Sales type of " sales_type "for " office=;
                      end;

      when ('DC')     select (sales_type) ;
                        when ('A','B')  Dept='Rick';
                        when ('C','D')  Dept='Henry';
                        otherwise put "WARNING: Unexpected Sales type of " sales_type "for " office=;
                     end;

      otherwise put "WARNING: Unexpected office value of " office;
   end;
run;

 

This may take a few more lines of code but it is a bit easier to determine what is done for each office and each sales_type code within an office. The OTHERWISE statement in the select is what would get executed if nothing matches any of the WHEN values. In this case messages in the log of unexpected values. If there is any likelihood of the assignments changing frequently this may be easier to maintain.

Better might be a data set with the values of Office, Sales_type and  dept and do an sql join to bring in the dept. This will be a much better way if you have LOTS of offices and Sales_types.

 

data work.departments;
   input office $ sales_type $ dept $;
datalines;
Dallas A Tim
Dallas B Tim
Dallas C Sam
Dallas D Sam
NY     A Tim
NY     B Tim
NY     C Henry
NY     D Henry
DC     A Rick
DC     B Rick
DC     C Henry
DC     D Henry
run;

proc sql;
   create table work.have4 as
   select a.*, b.dept
   from work.have as a
        left join
        work.departments as b
        on a.office=b.office
        and a.sales_type=b.sales_type
   ;
quit;
Lekhnath
Calcite | Level 5

Hi @ballardw 

 

Thank you for your reply. I forgot one very important piece in the question, sorry about that. That is: There is another department "Chuck" which satisfies same criteria of Tim's department plus Rick's department. So, basically the result of Chuck's sales should be sum of Tim's and Rick's. But the data should still be summarized for all Tim, Sam, Henry, Rick and Chuck. Thank you.

 

Lekhnath
Calcite | Level 5

Hi @ed_sas_member 

 

Thank you for your reply. I forgot one very important piece in the question, sorry about that. That is: There is another department "Chuck" which satisfies same criteria of Tim's department plus Rick's department. So, basically the result of Chuck's sales should be sum of Tim's and Rick's. But the data should still be summarized for all Tim, Sam, Henry, Rick and Chuck.

 

Also, I would like the data as a data table rather than a report. Thank you.

ed_sas_member
Meteorite | Level 14

Hi @Lekhnath 

 

I have updated the code to take into account this new requirement.

In my opinion, the easiest way to handle "overlapping" departments is to use a specific format .

In the below code, I have defined the format Dept. as a multilabel one (for example, 1 can be formatted either as "Tim" or "Chuck").

Then, you can invoke this format in the proc means and specify the option "MLF" to get summary statistics for each format category.

 

Best, 

 

data have;
	infile datalines dlm=" ";
	input Year Month $ Office $ Sales_type $ Sales;
	datalines;
2018     Jan           Dallas      A                      10
2018     Jan           Dallas      B                      13
2018     Jan           Dallas      C                      15
2018     Jan           Dallas      D                      20
2018     Jan           NY           A                       5
2018     Jan           NY           B                       9
2018     Jan           NY           C                       7
2018     Jan           NY           D                      17
2018     Jan           DC           A                      15
2018     Jan           DC           B                      19
2018     Jan           DC           C                      17
2018     Jan           DC           D                      19
2018     Feb           Dallas      A                      11
2018     Feb           Dallas      B                      14
2018     Feb           Dallas      C                      16
2018     Feb           Dallas      D                      21
2018     Feb           NY           A                       6
2018     Feb           NY           B                       10
2018     Feb           NY           C                       8
2018     Feb           NY           D                      18
2018     Feb           DC           A                      16
2018     Feb           DC           B                      20
2018     Feb           DC           C                      18
2018     Feb           DC           D                      20
;
run;



data have2;
	set have;
	if 		Office in ("Dallas","NY") and Sales_type in ("A","B") then Dept=1;
	else if Office = "Dallas"		  and Sales_type in ("C","D") then Dept=2;
	else if Office in ("NY", "DC") 	  and Sales_type in ("C","D") then Dept=3;
	else if Office="DC"			      and Sales_type in ("A","B") then Dept=4;
run;

proc sort data=have2 (drop= Office Sales_type);
	by Year Month Dept;
run;

proc format fmtlib;
	value Dept (multilabel)
		1 = "Tim"
		2 = "Sam"
		3 = "Henry"
		4 = "Rick"
		1,4 = "Chuck";
run;

proc means data=have2 sum maxdec=0;
	var Sales;
	class Year Month Dept / mlf;
	ways 3; 
	output out=want (drop=_type_ _freq_) sum=Sales;
	format Dept Dept.;
run;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1379 views
  • 2 likes
  • 3 in conversation