SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Insert a new variable into dataset, which is the sum total of another variable already in dataset

Reply
Frequent Contributor
Frequent Contributor
Posts: 109

Insert a new variable into dataset, which is the sum total of another variable already in dataset

As easy as sounds.  Seems like a proc sql, but I would also be interested in see data step option.

 

data meta;

input A;

 

datalines;

12

8

14

25

8

16

;

 

 

Results

A   B

12    83

8      83

14    83

25    83

8      83

16    83

 

Frequent Contributor
Frequent Contributor
Posts: 109

Re: Insert a new variable into dataset, which is the sum total of another variable already in datase

This is my way too long code so far...

 

PROC MEANS NOPRINT DATA=meta;

VAR A;

OUTPUT OUT=summarydata SUM(A) = sigma_A;

DATA meta_summary;

IF _N_=1 THEN SET summarydata;

SET meta;

drop _type_ _freq_;

run;

proc print data=meta_summary;

run;

Frequent Contributor
Frequent Contributor
Posts: 109

Re: Insert a new variable into dataset, which is the sum total of another variable already in datase

PS, I no longer see where to click to edit my posts?
Community Manager
Posts: 486

Re: Insert a new variable into dataset, which is the sum total of another variable already in datase

Hi H,

 

On the post you want to edit, click on the "..." in the upper right side and select "edit post/reply."

 

Anna

Super User
Posts: 5,091

Re: Insert a new variable into dataset, which is the sum total of another variable already in datase

Here's a DATA step approach: data results; do until (done1); set meta end=done1; B + A; end; do until (done2); set meta end=done2; output; end; run; Good luck.
Frequent Contributor
Frequent Contributor
Posts: 109

Re: Insert a new variable into dataset, which is the sum total of another variable already in datase

the approach works astounding, but I don't know if it will work for me. In that I will have many more columns to sum and other data steps within this one.

Frequent Contributor
Frequent Contributor
Posts: 109

Re: Insert a new variable into dataset, which is the sum total of another variable already in datase

some reason I thought you could insert the sum back into the dataset within the proc means statement I used.  It would use an "in" I believe.

 

Any body familiar with that approach?

Super User
Posts: 17,899

Re: Insert a new variable into dataset, which is the sum total of another variable already in datase

PROC SQL solution:

 

proc sql;
create table B as
select *, sum(a)
from have;
quit;

I don't think you can add the obs back in with proc means, there may be a way with proc summary though, but I'm unfamiliar with that procedure.

Frequent Contributor
Frequent Contributor
Posts: 109

Re: Insert a new variable into dataset, which is the sum total of another variable already in datase

[ Edited ]

Reeza,

 

How do you name the new variable in proc sql?  It currently gets named "_TEMG001".

 

Say I want to call it Sigma_A.

 

Thanks.

Super User
Posts: 17,899

Re: Insert a new variable into dataset, which is the sum total of another variable already in datase

sum(A) as Sigma_A
Frequent Contributor
Frequent Contributor
Posts: 109

Re: Insert a new variable into dataset, which is the sum total of another variable already in datase

I am using the following to sum two different variables (i.e., A and C):

 

proc sql;

create table B as

select *, sum(A)as Sigma_A,

sum(C)as Sigma_C

from meta;

drop D;

quit;

 

But now I would like to sum them based on a group variable in the dataset called replicate. There are 100 replicate groups (i.e., 1-100) all with 300 observations. I would like to execute the about code but have the sums be for the replicates, and inserted into the new dataset as before.

 

Any help would be appreciated - I am currently having difficulties getting the "group by" to work with the multiple sums in the step.

Frequent Contributor
Frequent Contributor
Posts: 109

Re: Insert a new variable into dataset, which is the sum total of another variable already in datase

[ Edited ]

I am now using the following, which works out better for my needs (you can see I am looking at 4 variable sums):

 

proc means data=summies;

class replicate;

var A B C D;

output out=LR_test sum(A)= A_sum

sum(B)= B_sum

sum(C)= C_sum

sum(D)= D_sum;

run;

 

However, the generated dataset has an extra row for the totals. So it has a replicate  = "." with the totals. Plus the 100 other rows with the sums. Is there a way to get rid of this extra within the above data step?

Respected Advisor
Posts: 3,777

Re: Insert a new variable into dataset, which is the sum total of another variable already in datase

Use the PROC statement option NWAY.
Super User
Posts: 5,091

Re: Insert a new variable into dataset, which is the sum total of another variable already in datase

H,

 

Almost all of what you ask for is relatively easy to program.  But you will need to set a fixed target, not a moving target.  Many replicates?  No problem.  But a different problem.  Name the new fields with "Sigma_"?  No problem.  But a different problem.

 

One thing you will have to think through is the length of the new variable names.  With an original name like "A", there's no problem creating "Sigma_A".  But what if the original variable name were 30 characters long?  Now there's no room to put "Sigma_" in front.

 

Anyway, spell out a final form to the problem, and the solution won't be that difficult.

 

 

Ask a Question
Discussion stats
  • 13 replies
  • 882 views
  • 2 likes
  • 5 in conversation