09-24-2015 10:46 AM
As easy as it sounds. Seems like a job for PROC SQL, but I would also be interested in seeing a data step option.
09-24-2015 10:56 AM
This is my way too long code so far...
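PROC MEANS NOPRINT DATA=meta;
  OUTPUT OUT=summarydata SUM(A)=sigma_A;  /* one-row dataset holding the total */
RUN;
DATA meta_summary;
  SET meta;
  IF _N_=1 THEN SET summarydata;  /* read the total once; SET variables are retained */
  DROP _type_ _freq_;
RUN;
PROC PRINT DATA=meta_summary;
RUN;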
09-24-2015 11:00 AM
On the post you want to edit, click on the "..." in the upper right side and select "edit post/reply."
09-24-2015 10:59 AM
09-24-2015 11:09 AM
The approach works astoundingly well, but I don't know if it will work for me, since I will have many more columns to sum and other data steps within this one.
09-24-2015 11:12 AM
For some reason I thought you could insert the sum back into the dataset within the PROC MEANS step I used. It would use an "in" I believe.
Anybody familiar with that approach?
09-24-2015 12:13 PM
PROC SQL solution:
proc sql; create table B as select *, sum(a) from have; quit;
I don't think you can add the obs back in with PROC MEANS. There may be a way with PROC SUMMARY, but I'm unfamiliar with that procedure.
09-24-2015 02:23 PM - edited 09-24-2015 04:36 PM
How do you name the new variable in proc sql? It currently gets named "_TEMG001".
Say I want to call it Sigma_A.
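A column alias with AS should take care of the naming. A minimal sketch, reusing the table names B and have from the PROC SQL reply above (substitute your own dataset names):

```sas
proc sql;
  /* "as Sigma_A" names the summary column instead of letting SAS generate a name */
  create table B as
  select *, sum(A) as Sigma_A
  from have;
quit;
```

The log should show a note that the query requires remerging summary statistics back with the original data. That is expected here, since the total is attached to every row.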
10-05-2015 11:56 AM
I am using the following to sum two different variables (i.e., A and C):
create table B as
select *, sum(A) as Sigma_A,
But now I would like to sum them based on a group variable in the dataset called replicate. There are 100 replicate groups (i.e., 1-100), each with 300 observations. I would like to execute the above code but have the sums be computed per replicate and inserted into the new dataset as before.
Any help would be appreciated - I am currently having difficulties getting the "group by" to work with the multiple sums in the step.
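One way this could look, as a sketch only: the input dataset name have is a placeholder (the post does not show the FROM clause), and Sigma_C is assumed to follow the same naming pattern as Sigma_A.

```sas
proc sql;
  create table B as
  select *,
         sum(A) as Sigma_A,   /* sum of A within each replicate */
         sum(C) as Sigma_C    /* sum of C within each replicate */
  from have                   /* placeholder name for the input dataset */
  group by replicate;         /* sums are remerged onto every row of the group */
quit;
```

Because the select list keeps all of the detail columns, PROC SQL remerges the group sums back onto the original rows, so each of the 300 observations in a replicate carries that replicate's totals.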
10-05-2015 12:18 PM - edited 10-05-2015 12:20 PM
I am now using the following, which works out better for my needs (you can see I am looking at 4 variable sums):
proc means data=summies;
var A B C D;
output out=LR_test sum(A)= A_sum
However, the generated dataset has an extra row for the totals. So it has one row with replicate = "." holding the overall totals, plus the 100 other rows with the per-replicate sums. Is there a way to get rid of this extra row within the above step?
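The NWAY option on the PROC MEANS statement is probably what is needed here. The sketch below assumes replicate was given in a CLASS statement (the posted code is cut off, but a row with replicate = "." is what the overall _TYPE_=0 row looks like), and the names B_sum, C_sum and D_sum are assumed to follow the A_sum pattern.

```sas
proc means data=summies nway noprint;  /* nway keeps only the per-replicate rows */
  class replicate;                     /* assumed grouping statement */
  var A B C D;
  output out=LR_test
         sum(A)=A_sum sum(B)=B_sum sum(C)=C_sum sum(D)=D_sum;
run;
```

An alternative that leaves the PROC MEANS statement alone is to filter the output dataset, for example out=LR_test(where=(_type_=1)), since with a single CLASS variable the per-group rows have _TYPE_=1.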
10-05-2015 07:15 PM
Almost all of what you ask for is relatively easy to program. But you will need to set a fixed target, not a moving target. Many replicates? No problem. But a different problem. Name the new fields with "Sigma_"? No problem. But a different problem.
One thing you will have to think through is the length of the new variable names, since SAS variable names are limited to 32 characters. With an original name like "A", there's no problem creating "Sigma_A". But what if the original variable name were 30 characters long? Now there's no room to put "Sigma_" in front.
Anyway, spell out a final form to the problem, and the solution won't be that difficult.