mark_ph
Calcite | Level 5

Dear All,

I have the following dataset:

data have;
     input ID1 ID2 V1 V2 V3;
datalines;
101     10001     2.1     1.7     3.1
101     10001     3.0     1.7     3.1
101     10002     5.9     2.3     8.5
102     10003     1.9     4.4     2.8
102     10003     1.0     4.4     2.8
102     10003     2.1     4.4     2.8
103     10004     3.7     5.0     7.4
;

I would like to sum the values of the variable V1 across rows, but only for records that share the same ID1 and ID2. That is, I would like to obtain the following:


data want;
     input ID1 ID2 V1 V2 V3;
datalines;
101     10001     5.1     1.7     3.1
101     10002     5.9     2.3     8.5
102     10003     5.0     4.4     2.8
103     10004     3.7     5.0     7.4
;

Notice that the values of V1 in want are given by 5.1 = 2.1 + 3.0 and 5.0 = 1.9 + 1.0 + 2.1.


Any help would be highly appreciated.

1 ACCEPTED SOLUTION

RW9
Diamond | Level 26

Hi,

Actually you could do:

proc sql;
     create table WANT as
     select distinct
            ID1,
            ID2,
            sum(V1) as V1,
            sum(V2) as V2,
            sum(V3) as V3
     from HAVE
     group by ID1, ID2;   /* ID1 and ID2 are numeric, so group on both columns rather than concatenating them with || */
quit;

Alternatively, here is one way with RETAIN. I wrote this quickly: I combined the two IDs into a single key (tot), but you could just as well use BY id1 id2 directly. You could also use arrays for the sums.

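data inter;
     set have;
     length tot $200;
     tot = catx('_', id1, id2);   /* combined key built from both IDs */
run;

data want; /* Assumed sorted by id1 and id2 */
     set inter;
     by tot notsorted;   /* groups are contiguous, but tot need not be in character sort order */
     retain sum1-sum3;
     if first.tot then do;   /* start of a group: initialise the running totals */
          sum1=v1; sum2=v2; sum3=v3;
     end;
     else do;
          sum1=sum1+v1;
          sum2=sum2+v2;
          sum3=sum3+v3;
     end;
     if last.tot then output;   /* one row per group */
run;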
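The "by id1 id2" variant mentioned above can be sketched as follows. This is only an illustration, not part of the original answer: it assumes HAVE is already sorted by ID1 and ID2, it sums V1 only (as the question asks), and the names WANT_BY and V1_SUM are made up for the example.

```sas
/* Sketch only: assumes HAVE is already sorted by ID1 and ID2. */
/* WANT_BY and V1_SUM are illustrative names, not from the thread. */
data want_by (drop=v1_sum);
     set have;
     by id1 id2;
     if first.id2 then v1_sum = 0;   /* reset at the start of each ID1/ID2 group */
     v1_sum + v1;                    /* sum statement: v1_sum is retained automatically */
     if last.id2 then do;
          v1 = v1_sum;               /* replace V1 with the group total */
          output;                    /* V2 and V3 are taken from the last row of the group */
     end;
run;
```

With the sample data this should give one row per ID1/ID2 pair, with V2 and V3 carried over unchanged. If HAVE is not sorted, a PROC SORT by ID1 and ID2 would be needed first.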


6 REPLIES

mark_ph
Calcite | Level 5

Thank you @RW9 for your help. I think there are some typos in your first version with PROC SQL. I'm posting a corrected version of your code:

proc sql;
     create table WANT as
     select distinct ID1, ID2, sum(V1) as V1, V2, V3
     from HAVE
     group by ID1, ID2;
quit;
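A note for future readers: because V2 and V3 are selected without being grouped or aggregated, PROC SQL remerges the summary statistic back onto every original row (the log shows a note about remerging), and DISTINCT then collapses the repeated rows. A possible alternative, sketched here under the assumption that V2 and V3 are constant within each ID1/ID2 pair, is to group on all four non-summed columns. The table name WANT_SQL is only illustrative.

```sas
/* Sketch only: assumes V2 and V3 do not vary within an ID1/ID2 pair. */
/* WANT_SQL is an illustrative name, not from the thread. */
proc sql;
     create table want_sql as
     select ID1, ID2, sum(V1) as V1, V2, V3
     from have
     group by ID1, ID2, V2, V3;   /* no remerge and no DISTINCT needed */
quit;
```

If V2 or V3 did differ within an ID1/ID2 pair, this version would return one row per distinct combination rather than one row per pair.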

RW9
Diamond | Level 26

Nope, no typos there. You want three variables out: V1 = sum of V1, V2 = sum of V2, V3 = sum of V3.

You do this by:

sum(variable) as new_variable.

So each one needs specifying, per my original SQL.

mark_ph
Calcite | Level 5

I want 3 variables out, but I want to sum only with respect to variable V1 across rows. Your code is also summing V2 and V3 across rows.

Ksharp
Super User

Mark,

Don't get pissed at RW9 :), because your question is really not easy.

data have;
     input ID1 ID2 V1 V2 V3;
datalines;
101     10001     2.1     1.7     3.1
101     10001     3.0     1.7     3.1
101     10002     5.9     2.3     8.5
102     10003     1.9     4.4     2.8
102     10003     1.0     4.4     2.8
102     10003     2.1     4.4     2.8
103     10004     3.7     5.0     7.4
;
run;

proc sql;
     create table want as
     select id1, id2,
            case when range(v1)=0 then avg(v1) else sum(v1) end as v1,
            case when range(v2)=0 then avg(v2) else sum(v2) end as v2,
            case when range(v3)=0 then avg(v3) else sum(v3) end as v3
     from have
     group by id1, id2;
quit;

Xia Keshan

mark_ph
Calcite | Level 5

Dear @Ksharp, I'm not pissed at @RW9. I immediately thanked him for his help and selected his answer as the correct one. I just posted a different version of the code for future users. :D
