DATA Step, Macro, Functions and more

Creating rows in a data set based on sums of existing variables

Accepted Solution Solved
Reply
Contributor
Posts: 23
Accepted Solution

Creating rows in a data set based on sums of existing variables

I need to write code create rows inside an existing dataset based on groupings. I have the following existing dataset:

 

YearAreaProfit
2001A1
2002A2
2001B1
2001C3
2002C1
2001E4
2002E2
2001F3
2002F4

 

I need to add in more rows corresponding to two more areas. Area D consists of A,B,C and Area G consists of E and F. I need to sum over the groups and years and add them into the existing dataset so it looks as such:

 

YearAreaProfit
2001A1
2002A2
2001B1
2001C3
2002C1
2001E4
2002E2
2001F3
2002F4
2001D5
2002D3
2001G7
2002G6

 

 

 


Accepted Solutions
Solution
2 weeks ago
Super User
Posts: 23,342

Re: Creating rows in a data set based on sums of existing variables

Try MultiLabel Formats with PROC MEANS. 

 

data have;
input Year    Area $    Profit;
cards;
2001    A   1
2002    A   2
2001    B   1
2001    C   3
2002    C   1
2001    E   4
2002    E   2
2001    F   3
2002    F   4
;;;;
run;

proc format;
value $ area_fmt (multilabel)
'A' = 'A'
'B' = 'B'
'C' = 'C'
'A', 'B', 'C' = 'D'
'E' = 'E'
'F' = 'F'
'E', 'F' = 'G';
run;

proc means data=have noprint nway;
class year area / mlf;
format area $area_fmt.;
var profit;
output out=want sum(profit)=profit;
run;



View solution in original post


All Replies
Super User
Posts: 13,358

Re: Creating rows in a data set based on sums of existing variables

I generally discourage have summaries inside data as you may end up getting things counted twice: sum of profit for the entire data set.

How will you be using that data set with the summaries included?

Contributor
Posts: 23

Re: Creating rows in a data set based on sums of existing variables

It will be used to display graphs on profit. No additional calculations will be made

Solution
2 weeks ago
Super User
Posts: 23,342

Re: Creating rows in a data set based on sums of existing variables

Try MultiLabel Formats with PROC MEANS. 

 

data have;
input Year    Area $    Profit;
cards;
2001    A   1
2002    A   2
2001    B   1
2001    C   3
2002    C   1
2001    E   4
2002    E   2
2001    F   3
2002    F   4
;;;;
run;

proc format;
value $ area_fmt (multilabel)
'A' = 'A'
'B' = 'B'
'C' = 'C'
'A', 'B', 'C' = 'D'
'E' = 'E'
'F' = 'F'
'E', 'F' = 'G';
run;

proc means data=have noprint nway;
class year area / mlf;
format area $area_fmt.;
var profit;
output out=want sum(profit)=profit;
run;



☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 100 views
  • 2 likes
  • 3 in conversation