I would like to sum values for a set of observations, for a specific column based on a specific identifier. For example, suppose i have data like the below
A 4 5 6
B 3 3 2
A 3 4 2
C 3 2 0
B 3 7 3
B 2 4 1
Suppose i want to sum all of the values by the identifier in column 1, so i have totals for A, B and C by a specific column of my choice (2, 3 or 4) Then in a separate data set just output either A, B or C that is beyond a certain criteria.
For example, i want the sums of column 4 (based on identifier in column 1) which are above the value of 1, my output data set should return A = 8, B = 6, and obviously nothing for C as its zero.
I am open to Proc SQL, or the data step, in fact would be useful to know both methods.
Thanks in advance.
Both solutions above will drop the Var1=C case. If you want to see it in the result, try one of these:
data have;
input var1 $ var2 var3 var4;
datalines;
A 4 5 6
B 3 3 2
A 3 4 2
C 3 2 0
B 3 7 3
B 2 4 1
;
proc sql;
create table want as
select var1, sum(case when var4>1 then var4 else 0 end) as totalVar4
from have
group by var1;
select * from want;
quit;
proc sort data=have; by var1; run;
data want2;
totalVar4 = 0;
do until(last.var1);
set have; by var1;
if var4>1 then totalVar4 + var4;
end;
keep var1 totalVar4;
run;
proc print data=want2 noobs; run;
PG
Use proc means or summary.
Proc means data= have sum;
where var4>1;
class var1;
var var4;
output out=want sum=;
run;
Using Proc SQL
data have;
input var1 $ var2 var3 var4;
datalines;
A 4 5 6
B 3 3 2
A 3 4 2
C 3 2 0
B 3 7 3
B 2 4 1
;
proc sql;
create table want as
select var1,sum(var4) as total_var4
from have
where var4>1
group by var1;
quit;
Both solutions above will drop the Var1=C case. If you want to see it in the result, try one of these:
data have;
input var1 $ var2 var3 var4;
datalines;
A 4 5 6
B 3 3 2
A 3 4 2
C 3 2 0
B 3 7 3
B 2 4 1
;
proc sql;
create table want as
select var1, sum(case when var4>1 then var4 else 0 end) as totalVar4
from have
group by var1;
select * from want;
quit;
proc sort data=have; by var1; run;
data want2;
totalVar4 = 0;
do until(last.var1);
set have; by var1;
if var4>1 then totalVar4 + var4;
end;
keep var1 totalVar4;
run;
proc print data=want2 noobs; run;
PG
Hi,
The below gives you 1 summary row from the data. Cnts1-3 is the sum for each a,b,c for column1, cnts4-6 is the sum for each column2, and same for column3. Could do the id in several ways, format for instance and maybe use a macro variable for the 3, so to allow any number of columns.
data have;
attrib var1 format=$1. var2-var4 format=best.;
infile datalines;
input var1 $ var2-var4;
datalines;
A 4 5 6
B 3 3 2
A 3 4 2
C 3 2 0
B 3 7 3
B 2 4 1
;
run;
data want (drop=id i);
set have end=last;
select(var1);
when ('A') id=1;
when ('B') id=2;
when ('C') id=3;
end;
retain cnts1-cnts9;
array cols{3} var2-var4;
array cnts{3,3} 8.;
do i=1 to 3;
cnts{id,i}=sum(cnts{id,i},cols{i});
end;
if last then output;
run;
hi all, a note to say thanks for your help and time, appreciated!
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.