Dear All,
I have the following dataset:
data have;
input ID1 ID2 V1 V2 V3;
datalines;
101 10001 2.1 1.7 3.1
101 10001 3.0 1.7 3.1
101 10002 5.9 2.3 8.5
102 10003 1.9 4.4 2.8
102 10003 1.0 4.4 2.8
102 10003 2.1 4.4 2.8
103 10004 3.7 5.0 7.4
;
I would like to sum the values of the variable V1 across rows only when records have the same ID1 and ID2. That is, I would like to obtain the following:
data want;
input ID1 ID2 V1 V2 V3;
datalines;
101 10001 5.1 1.7 3.1
101 10002 5.9 2.3 8.5
102 10003 5.0 4.4 2.8
103 10004 3.7 5.0 7.4
;
Notice that the values of V1 in want are given by 5.1 = 2.1 + 3.0 and 5.0 = 1.9 + 1.0 + 2.1.
Any help would be highly appreciated.
Hi,
Actually you could do:
proc sql;
create table WANT as
select distinct
ID1,
ID2,
sum(V1) as V1,
sum(V2) as V2,
sum(V3) as V3
from HAVE
group by ID1, ID2;
quit;
Alternatively, here is one way with a data step and RETAIN. I wrote this quickly: you could use BY ID1 ID2 directly, but I combined the two IDs into a single key (tot) because I don't have time to think it through right now. You could also use arrays for the sums.
data inter;
set have;
length tot $200;
tot = catx("_", ID1, ID2); /* combined ID1/ID2 key (assumed form of the combination) */
run;
data want; /* Assumed sorted by id1 and id2 */
set inter;
by tot;
retain sum1-sum3;
if first.tot then do;
sum1=v1; sum2=v2; sum3=v3;
end;
else do;
sum1=sum1+v1;
sum2=sum2+v2;
sum3=sum3+v3;
end;
if last.tot then output;
run;
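The array idea mentioned above could look roughly like the sketch below. It is only an illustration, not tested against the real data: the dataset names sorted and want_sums are made up for the example, and it uses BY ID1 ID2 directly instead of a combined key. Note that it sums all three variables, just like the RETAIN step.

```sas
/* Sort first so that BY-group processing works */
proc sort data=have out=sorted;   /* "sorted" is a hypothetical name */
  by ID1 ID2;
run;

data want_sums (drop=i);          /* "want_sums" is a hypothetical name */
  set sorted;
  by ID1 ID2;
  array v{3} V1-V3;               /* input values */
  array total{3} sum1-sum3;       /* running totals */
  retain sum1-sum3;
  do i = 1 to 3;
    /* reset the totals at the start of each ID1/ID2 group */
    if first.ID2 then total{i} = 0;
    total{i} = sum(total{i}, v{i});
  end;
  /* keep one row per ID1/ID2 group */
  if last.ID2 then output;
run;
```

Unlike the + operator, the SUM function ignores missing values, so a missing V1 in one row would not wipe out the group total.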
Thank you @RW9 for your help. I think there are some typos in your first version with PROC SQL. I'm posting a corrected version of your code:
proc sql;
create table WANT as
select distinct ID1, ID2, sum(V1) as V1, V2, V3
from HAVE
group by ID1, ID2;
quit;
Nope, no typos there. You want three variables out: V1 = sum of V1, V2 = sum of V2, V3 = sum of V3.
You do this by:
sum(variable) as new_variable.
So each one needs specifying, per my original SQL.
I want 3 variables out, but I want to sum only with respect to variable V1 across rows. Your code is also summing V2 and V3 across rows.
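One way to make that requirement explicit is sketched below. It assumes that V2 and V3 are constant within each ID1/ID2 pair, as they are in the sample data; if they are not, this would return more than one row per pair.

```sas
proc sql;
  create table want as
  select ID1, ID2,
         sum(V1) as V1,   /* only V1 is summed */
         V2, V3           /* carried through unchanged */
  from have
  group by ID1, ID2, V2, V3;
quit;
```

Because V2 and V3 are listed in GROUP BY, no remerging with the original rows takes place and DISTINCT is not needed.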
Mark,
Don't get upset with RW9, because your question is really not easy.
data have;
input ID1 ID2 V1 V2 V3;
datalines;
101 10001 2.1 1.7 3.1
101 10001 3.0 1.7 3.1
101 10002 5.9 2.3 8.5
102 10003 1.9 4.4 2.8
102 10003 1.0 4.4 2.8
102 10003 2.1 4.4 2.8
103 10004 3.7 5.0 7.4
;
run;
proc sql;
create table want as
select id1, id2,
case when range(v1)=0 then avg(v1) else sum(v1) end as v1,
case when range(v2)=0 then avg(v2) else sum(v2) end as v2,
case when range(v3)=0 then avg(v3) else sum(v3) end as v3
from have
group by id1, id2;
quit;
Xia Keshan