DATA Step, Macro, Functions and more

How sum selected rows for given dates in a panel data-set

Accepted Solution Solved
Reply
Contributor
Posts: 50
Accepted Solution

How sum selected rows for given dates in a panel data-set

Hi,

I have a dataset with annual sales of selected companies. I want to account for merges&acquisitions that took place and aggregate histrorical (pre-merge) data for both companies involved in a merge.

 

For example, I have a following dataset with 5 companies A-E. There were two merges:
1. in year 3, company B took over company A (merge No. 1)
2. in year 4, company C took over company D (merge No. 2)

 

data have;
input year company $1. takeover :$1. sales;
datalines;
1 A . 100
1 B . 120
1 C . 130
1 D . 140
1 E . 150
2 A . 105
2 B . 125
2 C . 135
2 D . 145
2 E . 155
3 B A 240
3 C . 140
3 D . 150
3 E . 160
4 B . 245
4 C D 300
4 E . 170
;
run;

proc sgplot data=have;
 series x=year y=sales/group=company;
run;

I want to get rid of the structural breaks in data for companies B and C. To do this, I want to sum sales of both companies invloved in each merge before it took place. In other words, I want to:
1. Sum the sales of A and B in each year 1 and 2
2. Sum the sales of C and D in each year 1,2 and 3

 

Can you advice me how to do it in SAS?

 

I was thinking of creating another variable merge_number that would equal to:
- 1 for companies A and B in year 1 and 2 (to indicate the companies involved in merge No. 1)
- 2 for companies C and D in year 1,2 and 3 (to indicate the companies involved in merge No. 2)
- 0 otherwise


Then I could use:

proc sql;
 create table want as
 select merge_number, sum(sales) as new_sales
 from have
 group by merge_number;
quit;


The problem is that I don't know how to create a variable merge_number. Could you help? Or maybe there is some other (simpler) way of summing the sales?


Accepted Solutions
Solution
‎04-04-2017 11:20 AM
Super User
Super User
Posts: 7,392

Re: How sum selected rows for given dates in a panel data-set

Right, got SAS back, this should work:

data have;
input year company $1. takeover :$1. sales;
datalines;
1 A . 100
1 B . 120
1 C . 130
1 D . 140
1 E . 150
2 A . 105
2 B . 125
2 C . 135
2 D . 145
2 E . 155
3 B A 240
3 C . 140
3 D . 150
3 E . 160
4 B . 245
4 C D 300
4 E . 170
;
run;

proc sql;
  create table WANT as
  select  A.YEAR,
          case when B.CMP is not null then B.CMP else A.COMPANY end as COMPANY,
          A.SALES
  from    HAVE A
  left join ( 
              select  YEAR,
                      CATS(COMPANY,TAKEOVER) as CMP,
                      TAKEOVER,
                      SALES
               from   HAVE
               where  TAKEOVER is not null
            ) B
  on      index(B.CMP,A.COMPANY)
  order by YEAR,
           COMPANY;
quit;

View solution in original post


All Replies
Occasional Contributor
Posts: 7

Re: How sum selected rows for given dates in a panel data-set

If i have undeerstood your scenario,if below code can help you

 

proc sql;
select year,company,sum(sales) as sale_amt
from have
group by company,year
order by company,year;
quit;

 

 

Contributor
Posts: 50

Re: How sum selected rows for given dates in a panel data-set

No, in this way I would get the exactly the same dataset. I want to sum sales of companies A and B, C and D only
Occasional Contributor
Posts: 7

Re: How sum selected rows for given dates in a panel data-set

You can try with case statements  in proc sql for creating a merge_number  variable

Contributor
Posts: 50

Re: How sum selected rows for given dates in a panel data-set

How exactly?
Frequent Contributor
Posts: 103

Re: How sum selected rows for given dates in a panel data-set

[ Edited ]

Hi,

 

try this

data want;
   length sumSales 8;
   set have;
   call missing(sumSales);
   if not missing(takeover) then do;
      call execute(compbl('
      PROC SQL;
         UPDATE want 
         SET sumSales=(SELECT sum(sales) 
                       FROM have 
                       WHERE company in ("'||strip(company)||'","'||strip(takeover)||'")
                       AND year le '||strip(put(year,best.))||'
                       ) 
         WHERE company eq "'||strip(company)||'" AND takeover eq "'||strip(takeover)||'";
      QUIT;
      '));
      
   end;
run;

 

________________________

- That still only counts as one -

Super User
Super User
Posts: 7,392

Re: How sum selected rows for given dates in a panel data-set

My SAS has died at the moment, so this is just a guess, but could you not just replace the company based on the smaller dataset?

E.g.;

proc sql;
  create table WANT as 
  select  A.YEAR,
          case when B.COMPANY is not null then B.COMPANY else A.COMPANY end as COMPANY,
          A.TAKEOVER,
          A.SALES
   from   HAVE A
   left join HAVE B
   on     A.TAKEOVER=B.COMPANY
   and    A.YEAR < B.YEAR;
quit;
Contributor
Posts: 50

Re: How sum selected rows for given dates in a panel data-set

@RW9

@Oligolas

 

Thanks for your answers, but they don't produce what I need.

 

I would like to have a following dataset at the end:

 

data want;
	input Year new_company :$2. new_sales ;
	datalines;
1 AB 220
1 CD 270
1 E 150
2 AB 230
2 CD 280
2 E 155
3 AB 240
3 CD 290
3 E 160
4 AB 245
4 CD 300
4 E 170
;
run;

where AB means sales of synthetic company combined of A and B in each year (e.g. 100+120=220) and CD - joint sales of C and D (e.g. 270=130+140). Of course, starting from year3, sales of AB=sales of B and in year4 sales of CD=sales of C as companies formally merged. 

 

 

Super User
Super User
Posts: 7,392

Re: How sum selected rows for given dates in a panel data-set

Hi,


Well, I still dont have SAS to try this out but my idea should still work.  The idea is to combine data from rows with takeover, with all the other companies in that pair, and creating a code from that:

 

proc sql;
  create table WANT as 
  select  A.YEAR,
          case when B.COMPANY is not null then cats(A.COMPANY,B.COMPANY) else A.COMPANY end as COMPANY,
          A.TAKEOVER,
          A.SALES
   from   HAVE A
   left join HAVE B
   on     A.TAKEOVER=B.COMPANY
   and    A.YEAR < B.YEAR;
quit;

What the idea is to get your original data to look something like:

1 AD . 100
1 B . 120
1 CD . 130
1 CD . 140
1 E . 150
2 AB . 105
2 B . 125
2 CD . 135
2 CD . 145
2 E . 155
3 AB A 240
3 CD . 140
3 CD . 150
3 E . 160
4 B . 245
4 CD D 300
4 E . 170

Which you can then sum up as normal.

Contributor
Posts: 50

Re: How sum selected rows for given dates in a panel data-set

@RW9

 

Ok, I think I get the idea now.The problem is that with the code I don't get the "new" company name (the one that "cats(A.COMPANY,B.COMPANY)" should produce) - see the attached file. Any idea why?


results.jpg
Solution
‎04-04-2017 11:20 AM
Super User
Super User
Posts: 7,392

Re: How sum selected rows for given dates in a panel data-set

Right, got SAS back, this should work:

data have;
input year company $1. takeover :$1. sales;
datalines;
1 A . 100
1 B . 120
1 C . 130
1 D . 140
1 E . 150
2 A . 105
2 B . 125
2 C . 135
2 D . 145
2 E . 155
3 B A 240
3 C . 140
3 D . 150
3 E . 160
4 B . 245
4 C D 300
4 E . 170
;
run;

proc sql;
  create table WANT as
  select  A.YEAR,
          case when B.CMP is not null then B.CMP else A.COMPANY end as COMPANY,
          A.SALES
  from    HAVE A
  left join ( 
              select  YEAR,
                      CATS(COMPANY,TAKEOVER) as CMP,
                      TAKEOVER,
                      SALES
               from   HAVE
               where  TAKEOVER is not null
            ) B
  on      index(B.CMP,A.COMPANY)
  order by YEAR,
           COMPANY;
quit;
Contributor
Posts: 50

Re: How sum selected rows for given dates in a panel data-set

@RW9 Thanks, now it works.

@kaverisonu1 Thanks, I guess that's another way of doing it, though some fine-tuning would be neccesary, as the output dataset is still a bit different from the one I need.

Super User
Posts: 10,483

Re: How sum selected rows for given dates in a panel data-set

Since you already know the problem cases perhaps this alternative:

 

proc format library=work;
value $NewComp
'A','B' = 'BA'
'C','D' = 'CD'
;
run;

data want;
  set have;
  NewCompany=put(company,$NewComp.);
run;

I often find that Formats are good for mapping /combining changed coded or names.

 

 

With the format you may not even need a new data set but apply the format to the variable in the summary or display procedure that does grouping.

 

Such as

proc summary data=have;

   class company;

   format company $newcomp. ;

   var sales;

run;

Occasional Contributor
Posts: 7

Re: How sum selected rows for given dates in a panel data-set

You can use the merge column post the code to sum the sales accordingly.

 


data have;
input year company $1. takeover :$1. sales;
datalines;
1 A . 100
1 B . 120
1 C . 130
1 D . 140
1 E . 150
2 A . 105
2 B . 125
2 C . 135
2 D . 145
2 E . 155
3 B A 240
3 C . 140
3 D . 150
3 E . 160
4 B . 245
4 C D 300
4 E . 170
;
run;


data have1;
set have;
select;
when(year=1 and company in('A','B'))do;
merge="AandB in 1";
end;
when(year=1 and company in ('C','D'))do;
merge="CandD in 2";
end;
when(year=1 and company="E") do;
merge="E in 1";
end;
when(year=2 and company in('A','B'))do;
merge="AandB in 2";
end;
when(year=2 and company in ('C','D'))do;
merge="CandD in 2";
end;
when(year=2 and company='E')do;
merge="E in 2";
end;
otherwise;
end;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 167 views
  • 0 likes
  • 5 in conversation