BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Tom
Super User Tom
Super User

@dvtarasov wrote:
Thanks! I've experimented with this way of doing things and it seems to work for me. I am wondering, however: is it possible, for the OUTPUT statement of PROC MEANS, to produce, not a new dataset, but just a new variable that stores the sum?

Not sure what that means.  A variable is an attribute or question that has a value recorded for every observation. The sum of a column is not another variable, at least not at the same unit of measure as the original records in the source table.

Now if you want and easy method to generate a sum over a table (or a by group within a table) and then assign that sum as essentially a constant it is easier to do in PROC SQL.  In fact it is easier in SAS's PROC SQL than in most vendors SQL implementation.

 

proc sql;
   create new as
   select id
        , cost
        , sum(cost) as total_cost
        , cost/ calculated total_cost as percent_cost
  from have
  group by id
  ;
quit;
dvtarasov
Obsidian | Level 7

Yes, obtaining the sum over a column (not grouped by anything within the column) and then using that sum as, essentially, a constant in the next step is what I'm trying to do. 

 

Before trying this with my actual data, I experimented on a small "test" dataset which had two columnts, Var1 and Var2. I tried to obtain the sum ("total_cost") of Var1 and then write a new dataset, which would be like the old one but have an additinal column, Var3, that would hold the values of Var2 multiplied by "total_cost". The summing was done as follows:

proc sql;
select sum(var1) as total_cost
from sasdata.olddata;

 

This part worked; the expected sum was computed. By the next step, however, "total_cost" seems to have gone missing and could not be used in the calculation: 

data sasdata.newdata;
set sasdata.olddata;
Var3 = total_cost * Var2;
run;

 

NEWDATA was indeed created, but the Var3 column is blank "as a result of performing an operation on missing values." For some reason the new dataset also had a "total_cost" column after that, even though I did not ask for such a column. It, too, was blank. 

It seems, then, that "total_cost" (what's the term for it, if not "variable?") is no longer available by the time the next step starts. How do I pass it on to the next step? 

Tom
Super User Tom
Super User

@dvtarasov wrote:

Yes, obtaining the sum over a column (not grouped by anything within the column) and then using that sum as, essentially, a constant in the next step is what I'm trying to do. 

 

Before trying this with my actual data, I experimented on a small "test" dataset which had two columnts, Var1 and Var2. I tried to obtain the sum ("total_cost") of Var1 and then write a new dataset, which would be like the old one but have an additinal column, Var3, that would hold the values of Var2 multiplied by "total_cost". The summing was done as follows:

proc sql;
select sum(var1) as total_cost
from sasdata.olddata;

 

This part worked; the expected sum was computed. By the next step, however, "total_cost" seems to have gone missing and could not be used in the calculation: 

data sasdata.newdata;
set sasdata.olddata;
Var3 = total_cost * Var2;
run;

 

NEWDATA was indeed created, but the Var3 column is blank "as a result of performing an operation on missing values." For some reason the new dataset also had a "total_cost" column after that, even though I did not ask for such a column. It, too, was blank. 

It seems, then, that "total_cost" (what's the term for it, if not "variable?") is no longer available by the time the next step starts. How do I pass it on to the next step? 


So the first PROC SQL step just produced a report.  It did not generate anything that could be saved for later use because you did not ask it to do that.  It did not report any of the other variables in the old data either since you did not include them in the select statement.  You can make a new dataset with the total as new column and copy forward all of the data by using a statement like this.

proc sql noprint;
create table newdata as 
   select *,sum(var1) as total_cost
   from olddata
;
quit;

The reason your data step made an empty variable is because the only reference to the variable was on the right hand side fo the assignment statement. So SAS saw that you were referencing an previously undefined variable and created one for you. 

Here is a method that you can use to calculate and use a total in a single data step. But doing this means that you have to take control of when SAS reads and writes observations.

data newdata ;
   if _n_=1 then do while  (not eof);
      set olddata end=eof;
      total_cost = sum(total_cost,var1);
   end;
   retain total_cost;
   set olddata ;
   var3 = total_cost * var2;
run;
      

Basically the IF statement/ DO loop will make it read in the whole dataset when the data step first starts and sum var1 into the new varaible TOTAL_COST. The rest of the data step will process normally.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 17 replies
  • 8734 views
  • 0 likes
  • 4 in conversation