Solved: Re: How to assign the value of last obs to other obs in the same group...

Ada77 · Posted 07-12-2019 03:34 PM

Suppose I have a dataset:

--Table 1--     
ID  Amount    
A   0.1            
A   0.2        
A   0.3         
B   0.4        
B   0.5

I used retain to create a new variable of total amount of each ID

:

data sample;
   set sample;
   by ID amout;
   if first.ID then Total_amount = 0;
   Total_amount = Total_amount +amount;
run;

And this will be the output dataset:

--Table 2--     
ID  Amount Total_amount     
A   0.1        0.1    
A   0.2        0.3
A   0.3        0.6
B   0.4        0.4
B   0.5        0.9

But now I want to assign the true total_amout to each obs. The output I want is:

--Table 2--     
ID  Amount Total_amount     
A   0.1        0.6    
A   0.2        0.6
A   0.3        0.6
B   0.4        0.9
B   0.5        0.9

Is there anyway to achieve it? Now I try to sort the data in descending order and than retain the total_amount.

data sample;
   set sample;
   by ID descending amout;
   if first.ID then Total_amount_1 = Total_amount;
run;

--Table 2--     
ID  Amount Total_amount Total_amount_1    
A   0.3        0.6         0.6
A   0.2        0.3         0.6
A   0.1        0.1         0.6
B   0.5        0.9         0.9
B   0.4        0.4         0.9

Although I can get what I want by this way, I think this method is kind of stupid. I hope there are some better methods.

ballardw · Posted 07-12-2019 03:42 PM

If you don't really want/ need the cumulative total here is one way.

data have;
input ID $  Amount    ;
datalines;
A   0.1            
A   0.2        
A   0.3         
B   0.4        
B   0.5 
;

proc sql;
   create table want as
   select a.*, b.total 
   from have as a
        left join
        (select id, sum(amount) as total
         from have
         group by id) as b
     on a.id = b.id
 ;
quit;

basic approach get the sum then merge back to the original data.

Lots of posts on the forum about this topic.

BTW, be careful of using the

Data setname;

set setname;

construct. This replaces your existing data set and a logic error could change values that you don't want and will require going back to recover the original values.

View solution in original post

ballardw · Posted 07-12-2019 03:42 PM

If you don't really want/ need the cumulative total here is one way.

data have;
input ID $  Amount    ;
datalines;
A   0.1            
A   0.2        
A   0.3         
B   0.4        
B   0.5 
;

proc sql;
   create table want as
   select a.*, b.total 
   from have as a
        left join
        (select id, sum(amount) as total
         from have
         group by id) as b
     on a.id = b.id
 ;
quit;

basic approach get the sum then merge back to the original data.

Lots of posts on the forum about this topic.

BTW, be careful of using the

Data setname;

set setname;

construct. This replaces your existing data set and a logic error could change values that you don't want and will require going back to recover the original values.

Ada77 · Posted 07-12-2019 04:36 PM

Thank you so much!

Kurt_Bremser · Posted 07-12-2019 03:42 PM

This "automatic remerging" can easily be achieved in SQL:

proc sql;
create table want as
select
  id,
  amount,
  sum(amount) as total_amount
from have
group by id;
quit;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ada77 · Posted 07-12-2019 04:37 PM

Thank you so much, it really helps

novinosrin · Posted 07-12-2019 04:45 PM

Hi @Ada77 Try this





data have;
input ID $  Amount   ;
cards;
A   0.1 
A   0.2        
A   0.3         
B   0.4        
B   0.5 
;
 
data want;
if _n_=1 then do;
 dcl hash H (ordered: "d") ;
   h.definekey  ("id","Total_amount") ;
   h.definedata ("amount","Total_amount") ;
   h.definedone () ;
   dcl hiter hi('h');
  end;
do until(last.id);
set have;
by id;
Total_amount =sum(Total_amount ,amount);
rc=h.add();
end;
Total_amount1=Total_amount;
do while(hi.next()=0);
output;
end;
h.clear();
drop rc;
run;

@hashman Guru, I plagiarized your solution that you taught me yesterday. I admit it before you notice it. Rather, I consider that as I am a good student who pays attention in the affirmative. I am not apologizing unless you demand one. 🙂

hashman · Posted 07-12-2019 09:42 PM

@novinosrin : "Imitation is the sincerest form of flattery" ;).

Tom · Posted 07-13-2019 11:31 AM

The simplest is to use the automatic re-merge feature of PROC SQL.

create table want as select *,sum(amount) as total from sample group by id;

But if you really need to use a data step for some other reason then consider using a double-DOW loop. The first to calculate the total for the group and the second to output the detail rows.

data want;
do until (last.id);
  set sample;
  by id;
  total=sum(total,amount);
end;
do until (last.id);
  set sample;
  by id;
  output;
end;
run;

hashman · Posted 07-13-2019 05:14 PM

@Tom: Replying to your post, as you've included both the double DoW and SQL, but also to @Kurt_Bremser, @ballardw, and @novinosrin (and of course to the OP and/or anyone else watching the thread).

1. If the input data set is not sorted, it's more efficient to use the hash object to get the group totals and attach them back to the original records in the same step - since this approach avoids sorting altogether (keeping the records in the original order):

data want ;                             
  if _n_ = 1 then do ;                  
    dcl hash h () ;           
    h.definekey ("id") ;                
    h.definedata ("sum") ;              
    h.definedone () ;                   
    do until (z) ;                      
      set have (keep = id amt) end = z ;
      if h.find() ne 0 then sum = amt ; 
      else                  sum + amt ; 
      h.replace() ;                     
    end ;                               
  end ;                                 
  set have ;                            
  h.find() ;                            
run ;

2. If the input data set is sorted by ID, the double DoW is most efficient - provided that certain performance-promoting considerations (in bold italics below) are observed;

data want (drop = amt) ;           
  do _n_ = 1 by 1 until (last.id) ;
    set have (keep = id amt) ;     
    by id ;                        
    sum = sum (sum, amt) ;         
  end ;                            
  do _n_ = 1 to _n_ ;              
    set have ;    
   /* note: no BY statement here */                 
    output ;                       
  end ;                            
run ;

The KEEP option on the first SET is obvious - there's no need to read any variables but ID and AMT on the first pass. The counting instead of the second BY is less obvious since most folks tend to think that BY doesn't exact any performance penalty. Yet it does since it makes the DATA step do extra work comparing the values of the BY variables in the PDV with those in the buffer to set FIRST.x and LAST.x as many times as there are BY groups in the input times the number of BY variables.

Because the program above explicitly relies on the order by ID, it works the same way irrespective of whether the input data set contains the SORTEDBY=ID data set flag set by proc SORT or is merely ordered intrinsically and doesn't have the flag set. But not so with SQL! This is because if the optimizer doesn't see SORTEDBY=ID in the header of the input data set, it will choose the path requiring sorting (indicated by the SQXSRT access method if the system option MSGLEVEL=I and SQL statement option _METHOD are in effect). Which is why, as a precaution, it's always a good idea to tell SQL that input is sorted (if it is indeed ordered, of course): If the flag is already set by SORT, it won't hurt; but if it is not, the unnecessary sorting will be avoided:

proc sql _method stimer ;                                                           
  create table want as select *, sum (amt) as sum from have (sortedby=id) group id ;
quit ;

In this case, the underlying algorithm is essentially the same as with the double DoW above, though for whatever reason the latter still runs ~40% faster.

Kind regards

Paul D.

GraphGuy · Posted 07-15-2019 09:22 AM

Here's a slightly different version, how I would do it:

data sample;
input ID $1 Amount;
datalines;
A 0.1
A 0.2
A 0.3
B 0.4
B 0.5
;
run;

proc sql noprint;
create table sample as
select unique *, sum(amount) as total_amount
from sample
group by id;
quit; run;

How to assign the value of last obs to other obs in the same group?

Re: How to assign the value of last obs to other obs in the same group?

Re: How to assign the value of last obs to other obs in the same group?

Re: How to assign the value of last obs to other obs in the same group?

Re: How to assign the value of last obs to other obs in the same group?

Re: How to assign the value of last obs to other obs in the same group?

Re: How to assign the value of last obs to other obs in the same group?

Re: How to assign the value of last obs to other obs in the same group?

Re: How to assign the value of last obs to other obs in the same group?

Re: How to assign the value of last obs to other obs in the same group?

Re: How to assign the value of last obs to other obs in the same group?

Registration is open

SAS Training: Just a Click Away