Hi all,
I am trying to sum COST to get the variable TOTAL (the original table has only ID and COST). Please see the desired output below. Thank you in advance!
ID COST TOTAL
1 10 100
2 40 100
3 50 100
Accepted Solutions
Hello @di_niu0
A very simple approach to solve your issue is to use PROC SQL. The function sum(cost) sums the cost and makes the total available in every row. Behind the scenes it makes two passes through the data, as described by @mkeintz, but the user need not worry about that.
data have;
input id cost;
datalines;
1 10
2 40
3 50
;
run;
proc sql;
create table want as
select *, sum(cost) as total
from have;
quit;
There will be a message in the log like this, indicating that the sum was calculated and merged back:
NOTE: The query requires remerging summary statistics back with the original data
The output will be what you wanted.
If you want to produce this with a data step, you need it to pass through the data set twice: The first time to establish the total, and the second time to reread and output each obs with the total established in the first pass.
So here's the first two statements to do that:
data want;
set have (in=firstpass) have (in=secondpass);
...
...
run;
The SET statement above reads ALL the observations with FIRSTPASS=1 (and SECONDPASS=0), and then rereads them with SECONDPASS=1 and FIRSTPASS=0. So you have dummy variables available to determine whether the observation in hand is from the first pass or the second pass.
So write code after the SET that creates a total during the first pass, and a filter that allows output only during the second pass.
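One possible way to fill in that outline is sketched below. It is not necessarily what the poster had in mind: the sum statement used for accumulation and the explicit OUTPUT are my own choices, and the HAVE step just recreates the sample data from the question.

```sas
/* Sample data from the question */
data have;
  input id cost;
  datalines;
1 10
2 40
3 50
;
run;

data want;
  /* Read HAVE twice; the IN= flags tell us which pass we are in */
  set have (in=firstpass) have (in=secondpass);

  /* First pass: accumulate the total. The sum statement retains
     TOTAL across observations, starts it at 0, and skips missing COST. */
  if firstpass then total + cost;

  /* Second pass: TOTAL is complete, so write each observation out */
  if secondpass then output;
run;
```

The IN= variables are temporary and are not written to WANT. With the three rows above, every output row should carry TOTAL=100.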
Do you really have to sum the costs of different ids? This seems strange.
Please use data step if possible.
For anyone else who doesn't have this unfortunate restriction, use the SQL solution from @Sajid01
For @di_niu0 who seems to require a data step, why??? In SAS, there are many ways to accomplish something, and sometimes you can have better solutions by removing this restriction.
Paige Miller