Dear All,
I have the following dataset:
data have;
input ID1 ID2 V1 V2 V3;
datalines;
101 10001 2.1 1.7 3.1
101 10001 3.0 1.7 3.1
101 10002 5.9 2.3 8.5
102 10003 1.9 4.4 2.8
102 10003 1.0 4.4 2.8
102 10003 2.1 4.4 2.8
103 10004 3.7 5.0 7.4
;
I would like to sum the values of V1 across rows only when the records have the same ID1 and ID2. That is, I would like to obtain the following:
data want;
input ID1 ID2 V1 V2 V3;
datalines;
101 10001 5.1 1.7 3.1
101 10002 5.9 2.3 8.5
102 10003 5.0 4.4 2.8
103 10004 3.7 5.0 7.4
;
Note that the values of V1 in WANT are given by 5.1 = 2.1 + 3.0 and 5.0 = 1.9 + 1.0 + 2.1.
Any help would be highly appreciated.
Hi,
Actually you could do:
proc sql;
create table WANT as
select distinct
ID1,
ID2,
sum(V1) as V1,
sum(V2) as V2,
sum(V3) as V3
from HAVE
group by ID1||ID2;
quit;
Well, here is one way using RETAIN. I did this quickly: you could use BY ID1 ID2 directly, but I combined the two IDs into a single key since I don't really have time to think it through right now. You could also use arrays for the sums.
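data inter;
set have;
length tot $200;
tot = catx('_', id1, id2); /* combined key, e.g. "101_10001" */
run;
data want; /* Assumes rows with the same id1 and id2 are adjacent (e.g. sorted by id1 id2) */
set inter;
by tot notsorted; /* NOTSORTED: the character key need not be in sorted order, only grouped */
retain sum1-sum3;
if first.tot then do;
sum1=v1; sum2=v2; sum3=v3;
end;
else do;
sum1=sum1+v1;
sum2=sum2+v2;
sum3=sum3+v3;
end;
if last.tot then output;
run;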
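For comparison, here is a minimal sketch of the BY ID1 ID2 variant mentioned above, written to match the WANT table in the question: only V1 is summed, and V2 and V3 are taken from the last row of each group, which assumes they are constant within an ID1/ID2 group. The SUM_V1 name is just an illustrative choice, and the HAVE step is repeated so the sketch runs on its own.

```sas
/* Recreate HAVE from the question so this sketch runs on its own */
data have;
  input ID1 ID2 V1 V2 V3;
  datalines;
101 10001 2.1 1.7 3.1
101 10001 3.0 1.7 3.1
101 10002 5.9 2.3 8.5
102 10003 1.9 4.4 2.8
102 10003 1.0 4.4 2.8
102 10003 2.1 4.4 2.8
103 10004 3.7 5.0 7.4
;
run;

/* BY-group processing needs the data sorted by the BY variables */
proc sort data=have;
  by id1 id2;
run;

data want(drop=sum_v1);
  set have;
  by id1 id2;
  if first.id2 then sum_v1 = 0;  /* new ID1/ID2 group: reset the running total */
  sum_v1 + v1;                   /* sum statement: SUM_V1 is retained automatically */
  if last.id2 then do;
    v1 = sum_v1;                 /* overwrite V1 with the group total */
    output;                      /* V2 and V3 come from the last row of the group */
  end;
run;
```

FIRST.ID2 is also set whenever ID1 changes, so it marks the start of every ID1/ID2 combination. The sum statement treats a missing V1 as 0, whereas `sum1=sum1+v1` would turn the whole total missing.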
Thank you @RW9 for your help. I think there are some typos in your first version with PROC SQL. I'm posting a corrected version of your code:
proc sql;
create table WANT as
select distinct ID1, ID2, sum(V1) as V1, V2, V3
from HAVE
group by ID1, ID2;
quit;
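A note on why the query above works: SAS PROC SQL allows non-grouped columns (V2, V3) next to a summary function by remerging the group totals back onto every input row (the log shows a NOTE about remerging summary statistics), and DISTINCT then collapses the repeated rows. That gives one row per ID1/ID2 only because V2 and V3 are constant within each group. A minimal sketch that states this assumption directly, by grouping on V2 and V3 as well so that no remerge or DISTINCT is needed:

```sas
/* Recreate HAVE from the question so this sketch runs on its own */
data have;
  input ID1 ID2 V1 V2 V3;
  datalines;
101 10001 2.1 1.7 3.1
101 10001 3.0 1.7 3.1
101 10002 5.9 2.3 8.5
102 10003 1.9 4.4 2.8
102 10003 1.0 4.4 2.8
102 10003 2.1 4.4 2.8
103 10004 3.7 5.0 7.4
;
run;

proc sql;
  create table want as
  select id1, id2,
         sum(v1) as v1,  /* only V1 is summed */
         v2, v3          /* grouped on below, so no remerge and no DISTINCT */
  from have
  group by id1, id2, v2, v3
  order by id1, id2;
quit;
```

If V2 or V3 ever differ within an ID1/ID2 group, this returns one row per distinct (V2, V3) combination, just as the remerge-plus-DISTINCT version would; the difference is that the assumption is spelled out in the GROUP BY.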
Nope, no typos there. You want three variables out: V1 = sum of V1, V2 = sum of V2, V3 = sum of V3.
You do this with:
sum(variable) as new_variable
So each one needs to be specified, as in my original SQL.
I want three variables out, but I only want to sum V1 across rows. Your code also sums V2 and V3 across rows.
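Since only V1 should be totalled while V2 and V3 are carried along, PROC SUMMARY (or PROC MEANS with NOPRINT) is another option: list V2 and V3 as CLASS variables next to the IDs so they pass through unchanged. A minimal sketch, again assuming V2 and V3 are constant within each ID1/ID2 group:

```sas
/* Recreate HAVE from the question so this sketch runs on its own */
data have;
  input ID1 ID2 V1 V2 V3;
  datalines;
101 10001 2.1 1.7 3.1
101 10001 3.0 1.7 3.1
101 10002 5.9 2.3 8.5
102 10003 1.9 4.4 2.8
102 10003 1.0 4.4 2.8
102 10003 2.1 4.4 2.8
103 10004 3.7 5.0 7.4
;
run;

proc summary data=have nway;  /* NWAY: keep only the full ID1*ID2*V2*V3 combinations */
  class id1 id2 v2 v3;        /* V2 and V3 are passed through, not summed */
  var v1;
  output out=want(drop=_type_ _freq_) sum=;  /* SUM= with no names keeps the name V1 */
run;
```

The output columns come out in the order ID1 ID2 V2 V3 V1, and rows with a missing CLASS value are dropped unless the MISSING option is added to the PROC SUMMARY statement.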
Mark,
Don't be hard on RW9; your question is really not easy.
data have;
input ID1 ID2 V1 V2 V3;
datalines;
101 10001 2.1 1.7 3.1
101 10001 3.0 1.7 3.1
101 10002 5.9 2.3 8.5
102 10003 1.9 4.4 2.8
102 10003 1.0 4.4 2.8
102 10003 2.1 4.4 2.8
103 10004 3.7 5.0 7.4
;
run;
proc sql;
create table want as
select id1,id2,
case when range(v1)=0 then avg(v1) else sum(v1) end as v1,
case when range(v2)=0 then avg(v2) else sum(v2) end as v2,
case when range(v3)=0 then avg(v3) else sum(v3) end as v3
from have
group by id1,id2;
quit;
Xia Keshan
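One thing to be aware of with the RANGE trick: a group whose V1 values happen to be identical (range 0) is averaged rather than summed, so it would not give the total asked for in the question. The small check below uses a hypothetical group (IDs 201/20001 and data set HAVE2, made up for illustration) with two equal V1 values:

```sas
/* Hypothetical group: two rows whose V1 values happen to be equal */
data have2;
  input ID1 ID2 V1 V2 V3;
  datalines;
201 20001 2.0 1.0 1.0
201 20001 2.0 1.0 1.0
;
run;

proc sql;
  create table check as
  select id1, id2,
         case when range(v1)=0 then avg(v1) else sum(v1) end as v1_case, /* RANGE trick */
         sum(v1) as v1_sum                                             /* plain total */
  from have2
  group by id1, id2;
quit;
```

Here V1_CASE comes out as 2 while V1_SUM is 4. Using plain SUM() for V1 and keeping the RANGE/AVG form only for V2 and V3 (or grouping on V2 and V3) avoids this.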