Hi -
My current data looks like this:
id name cell1 cell2 cell3 .... cell10
1 a 1 1 1
1 a 1 1
2 b 1 1
2 b 1 1
I want to sum cell1-cell10 by id and by name, the ideal output is like this, when performing calculation, for the missing data,
missing + missing should be left as missing, and missing + 1 should be equal to 1:
id name cell1 cell2 cell3 ... cell10
1 a 2 1 1 1
2 b 1 2 1
I tried this approach first but it is really slow and make sas stops running:
proc means data =data;
var cell1-cell10;
output out = want (drop = _type_ _freq_)
sum(cell1-cell10) = cell1-cell10;
by id name;
run;
I think sql might be better but my current code does not work:
proc sql;
create table want as
select id, name, cell1-cell10, sum(cell1-cell10) as sum_cell1-sum_cell10
from data
group by id and name;
quit;
run;
Any idea?
Thanks!!!
SQL doesn't support variable lists the way you've used them.
Because of that you'll have to list each calculation independently such as:
sum(var1) as var1_sum,
sum(var2) as var2_sum,
...
sum(var99) as var99_sum
Proc Means will allow you to calculate the sum without this issue. Try adding the NOPRINT option to help speed it up. How big is your data? if it's really large other options are available but PROC MEANS is the best place to start IMO.
proc means data =data NOPRINT;
var cell1-cell10;
output out = want (drop = _type_ _freq_)
sum(cell1-cell10) = cell1-cell10;
by id name;
run;
Did you try a variation on the Means syntax:
proc means data =data nway; class id name; var cell1 - cell10; output out = want (drop = _type_ _freq_) sum = ; run;
Class will group the variables, the nway option works with the class variables to create only the combinations of the class variables with all the variables. If there is a single statistic you want for each VAR then just use the name of the statistic and = to maintain the original name.
I tried that approach but is still really slow, in my example the variables are from cell1 to cell10 but actually I have cell1-cell1440.
SQL doesn't support variable lists the way you've used them.
Because of that you'll have to list each calculation independently such as:
sum(var1) as var1_sum,
sum(var2) as var2_sum,
...
sum(var99) as var99_sum
Proc Means will allow you to calculate the sum without this issue. Try adding the NOPRINT option to help speed it up. How big is your data? if it's really large other options are available but PROC MEANS is the best place to start IMO.
proc means data =data NOPRINT;
var cell1-cell10;
output out = want (drop = _type_ _freq_)
sum(cell1-cell10) = cell1-cell10;
by id name;
run;
3000 observations is nothing, but 1440 variables is a lot.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.