summing by group using SQL

Accepted Solution Solved
Reply
Contributor
Posts: 30
Accepted Solution

summing by group using SQL

Hi -

My current data looks like this:

id  name   cell1  cell2 cell3 .... cell10

1   a          1        1                      1

1   a          1               1

2   b                   1      1

2   b                           1                1

I want to sum cell1-cell10 by id and by name, the ideal output is like this, when performing calculation, for the missing data,

missing + missing should be left as missing, and missing + 1 should be equal to 1:

id  name   cell1  cell2  cell3 ... cell10

1   a          2        1        1              1

2   b                    1        2               1

I tried this approach first but it is really slow and make sas stops running:

proc means data =data;
   var cell1-cell10;
   output out = want (drop = _type_ _freq_)
   sum(cell1-cell10) = cell1-cell10;
   by id name;
run;

I think sql might be better but my current code does not work:

proc sql;
create table want as
select id, name, cell1-cell10, sum(cell1-cell10) as sum_cell1-sum_cell10
from data
group by id and name;
quit;
run;

Any idea?

 

Thanks!!!


Accepted Solutions
Solution
‎07-10-2017 12:45 PM
Super User
Posts: 19,133

Re: summing by group using SQL

SQL doesn't support variable lists the way you've used them. 

Because of that you'll have to list each calculation independently such as:

 

sum(var1) as var1_sum,
sum(var2) as var2_sum,
...


sum(var99) as var99_sum

Proc Means will allow you to calculate the sum without this issue. Try adding the NOPRINT option to help speed it up. How big is your data? if it's really large other options are available but PROC MEANS is the best place to start IMO.

 

proc means data =data NOPRINT;
   var cell1-cell10;
   output out = want (drop = _type_ _freq_)
   sum(cell1-cell10) = cell1-cell10;
   by id name;
run;

View solution in original post


All Replies
Super User
Posts: 11,128

Re: summing by group using SQL

Did you try a variation on the Means syntax:

 

proc means data =data nway;
   class id name;
   var cell1 - cell10;
   output out = want (drop = _type_ _freq_)
   sum =
   ;
run;

Class will group the variables, the nway option works with the class variables to create only the combinations of the class variables with all the variables. If there is a single statistic you want for each VAR then just use the name of the statistic and = to maintain the original name.

 

Contributor
Posts: 30

Re: summing by group using SQL

I tried that approach but is still really slow, in my example the variables are from cell1 to cell10 but actually I have cell1-cell1440. 

Solution
‎07-10-2017 12:45 PM
Super User
Posts: 19,133

Re: summing by group using SQL

SQL doesn't support variable lists the way you've used them. 

Because of that you'll have to list each calculation independently such as:

 

sum(var1) as var1_sum,
sum(var2) as var2_sum,
...


sum(var99) as var99_sum

Proc Means will allow you to calculate the sum without this issue. Try adding the NOPRINT option to help speed it up. How big is your data? if it's really large other options are available but PROC MEANS is the best place to start IMO.

 

proc means data =data NOPRINT;
   var cell1-cell10;
   output out = want (drop = _type_ _freq_)
   sum(cell1-cell10) = cell1-cell10;
   by id name;
run;
Contributor
Posts: 30

Re: summing by group using SQL

Thanks so much! I have more than 3,000 observations and 1441 variables.
After I add NOPRINT, it runs really quickly.
Super User
Posts: 19,133

Re: summing by group using SQL

3000 observations is nothing, but 1440 variables is a lot.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 143 views
  • 2 likes
  • 3 in conversation