Hi all,
I have a question on SAS Programming. It is about conditional sum. But it is very specific for me. Therefore, I want to ask as an example. I have the following dataset:
Group A Quantity
1 10 7
1 8 4
1 7 3
1 10 5
2 11 6
2 13 8
2 9 7
2 13 9
I want to add two more columns to this dataset. The new dataset should be:
Group A Quantity B NewColumn
1 10 7 10 12 (7+5)
1 8 4 10 12
1 7 3 10 12
1 10 5 10 12
2 13 6 13 15 (6+9)
2 10 8 13 15
2 9 7 13 15
2 13 9 13 15
So, the column B should be equal tha maximum value of each group and it is the same for all observations of each group. In this example, Group number 1 has 4 values. They are 10, 8, 7, 10. The maximum among these values is 10. Therefore, the values of the observations of the B column for the first group are all equal to 10. Maximum number for group number 2 is 13. Therefore, the values of the observations of the B column for the second group are all equal to 13.
The column C is more complicated. Its value depends on the all columns. Similiar to B column, it will be the same within group. More detailed, it is the sum of the specific observations of QUANTITIES column. These specific observations should belong to the observations that have the maximum value in each group. In our example, it is 12 for the first group. The reason is, the maximum number of first group is 10. and the quantities belong to 10 are 7 and 5. So, the sum of these is 12. For the second group it is 15. because the maximum value of the second group is 13 and the quantities belong to 13 are 6 and 9. So the sum is 15.
I hope. I can explain it. Many thanks in advance.
Great explanation and nice presentation
data have;
input Group A Quantity;
cards4;
1 10 7
1 8 4
1 7 3
1 10 5
2 11 6
2 13 8
2 9 7
2 13 9
;;;;
run;quit;
data want;
retain group a quantity;
keep group a quantity max sum;
retain max beg end .;
do until(last.group);
set have;
by group;
if first.group then beg=quantity;
if a > max then max=a;
if last.group then do;
end=quantity;
sum=beg+end;
end;
end;
do until(last.group);
set have;
by group;
output;
end;
call missing(max, beg, end);
run;quit;
Up to 40 obs WORK.WANT total obs=8
Obs GROUP A QUANTITY MAX SUM
1 1 10 7 10 12
2 1 8 4 10 12
3 1 7 3 10 12
4 1 10 5 10 12
5 2 11 6 13 15
6 2 13 8 13 15
7 2 9 7 13 15
8 2 13 9 13 15
Great explanation and nice presentation
data have;
input Group A Quantity;
cards4;
1 10 7
1 8 4
1 7 3
1 10 5
2 11 6
2 13 8
2 9 7
2 13 9
;;;;
run;quit;
data want;
retain group a quantity;
keep group a quantity max sum;
retain max beg end .;
do until(last.group);
set have;
by group;
if first.group then beg=quantity;
if a > max then max=a;
if last.group then do;
end=quantity;
sum=beg+end;
end;
end;
do until(last.group);
set have;
by group;
output;
end;
call missing(max, beg, end);
run;quit;
Up to 40 obs WORK.WANT total obs=8
Obs GROUP A QUANTITY MAX SUM
1 1 10 7 10 12
2 1 8 4 10 12
3 1 7 3 10 12
4 1 10 5 10 12
5 2 11 6 13 15
6 2 13 8 13 15
7 2 9 7 13 15
8 2 13 9 13 15
Amazing. Many thanks.
Just of note: @rogerjdeangelis's code will produce the wrong answer if there is a 3 or more way tie for the maximum value. e.g.:
data have; input Group A Quantity _B _NewColumn; cards; 1 10 7 10 13 (7+1+5) 1 10 1 10 13 (7+1+5) 1 8 4 10 13 1 7 3 10 13 1 10 5 10 12 2 13 6 13 15 (6+9) 2 10 8 13 15 2 9 7 13 15 2 13 9 13 15 ;
As such, I prefer to use proc sql for such calculations. The following can be written so that everything is nested within one call (i.e., not create a temporary file and then work from it), but I don't think either is more efficient than the other:
proc sql; create table need as select *, max(A) as B, (calculated B=A)*quantity as __NewColumn from have group by Group ; create table want (drop=_:) as select *, sum(__NewColumn) as NewColumn from need group by Group ; quit;
Art, CEO, AnalystFinder.com
Your description of the problem is far clearer than 90% of what gets posted here. Here's a possible solution using SQL:
data have;
input Group A Quantity;
datalines;
1 10 7
1 8 4
1 7 3
1 10 5
2 11 6
2 13 8
2 9 7
2 13 9
;
proc sql;
CREATE TABLE want AS
SELECT a.group, a.a, a.quantity, max(a.a) AS b,
sum(a.quantity * (a.a = b.max_a)) AS new_column
FROM have AS a
LEFT JOIN (SELECT group, max(a) AS max_a
FROM have
GROUP BY group) AS b
ON a.group = b.group
GROUP BY a.group;
quit;
I guess I read the problem wrong, I thought the op wanted the first + last as sum
not the associated with values..
Table and dataset are the same thing, really. Table is SQL nomenclature and dataset is SAS. The code that I've sent you is creating a SAS dataset. If you're querying data in Oracle, for example, there might be some details based on how you're connecting, but the syntax should work.
This is untested (I'm away from sas) and will probably have mistakes, but a triple DoW loop should work. One pass to get the max per group, second pass to get the sum, third pass to output the records. Something like:
data want ;
do until(last.group) ;
by group ;
set have ;
B=max(A,B) ;
end ;
do until(last.group) ;
set have ;
by group ;
if A = B then NewColumn = sum(NewColumn, Quantity) ;
end;
do until(last.group);
set have ;
by group;
output ;
end ;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.