Hi All ,
I have following dummy dataset.
Number | Amount | Sequence |
1111 | 100 | A |
1111 | 200 | A |
1111 | 300 | C |
1111 | 100 | B |
1112 | 200 | A |
1112 | 100 | C |
1112 | 200 | A |
1113 | 300 | C |
1113 | 300 | A |
1114 | 100 | A |
1114 | 100 | B |
1114 | 200 | B |
1114 | 200 | C |
1115 | 300 | C |
1116 | 100 | A |
1116 | 200 | B |
1116 | 300 | C |
I want to delete all the rows with sequence=C but I want to keep its amount, so I am adding that amount to next observation of same group. In the cases where the row with sequence=C is last in the group then I want to add its amount to the preceding row. Something like below:
Number | Amount | Sequence |
1111 | 100 | A |
1111 | 200 | A |
1111 | 400 | B |
1112 | 200 | A |
1112 | 300 | A |
1113 | 600 | A |
1114 | 100 | A |
1114 | 100 | B |
1114 | 400 | B |
1116 | 100 | A |
1116 | 500 | B |
The following method could be used to accomplish what you're looking for:
data want (drop=_:); do until (last.Number); set have; by Number; if first.Number then _Ctotal=0; if Sequence eq 'C' then _Ctotal+Amount; end; do until (last.Number); set have (where=(Sequence ne 'C')); by Number; if last.Number then Amount+_Ctotal; output; end; run;
HTH,
Art, CEO, AnalystFinder.com
data have;
infile cards expandtabs truncover;
input Number Amount Sequence $;
n+1;
cards;
1111 100 A
1111 200 A
1111 300 C
1111 100 B
1112 200 A
1112 100 C
1112 200 A
1113 300 C
1113 300 A
1114 100 A
1114 100 B
1114 200 B
1114 200 C
1115 300 C
1116 100 A
1116 200 B
1116 300 C
;
run;
proc sort data=have;
by descending n;
run;
data temp;
set have;
by number notsorted;
if first.number or sequence ne 'C' then group+1;
run;
proc sort data=temp;
by n;
run;
data temp1;
sum=0;
do until(last.group);
set temp;
by group notsorted;
sum+amount;
end;
drop amount n group;
run;
data temp2;
set temp1;
by number notsorted;
if first.number or sequence ne 'C' then group+1;
run;
data want;
set temp2;
by group notsorted;
length seq $ 40;
retain seq;
if first.group then do;seq=sequence;amount=0;end;
amount+sum;
if last.group and seq ne 'C' then output;
drop sequence sum group;
run;
The following program can handle cases where sequence="C" can appear anywhere and be multiple :
data want;
length number amount 8 sequence $1;
carry = 0;
do until(last.number);
set have(rename=(amount=a sequence=s)); by number;
if s = "C" then carry = carry + a;
else do;
if not missing(sequence) then output;
amount = a + carry;
sequence = s;
carry = 0;
end;
end;
if not missing(sequence) then do;
amount = amount + carry;
output;
end;
keep number amount sequence;
run;
I think this program addresses number groups which have multiple C's scattered throughout.
Like Art's program it does a double "do until (last.number)". In my case, the first do group examines how the number group ends. If it ends with a non-C, then ending_ctotal=0. Otherwise ending_ctotal gets the total amount from the run of C records that finish the number group. And final_nonC is the relative position of the last non-C record. This will be used in the second do group, to identify the record which will absorb the ending_ctotal amount.
data want (keep=number amount sequence);
do N_Seq=1 by 1 until (last.number);
set have;
by number;
if sequence^='C' then do;
ending_ctotal=0;
final_nonC=N_Seq;
end;
else ending_ctotal=sum(ending_ctotal,amount);
end;
do N=1 to N_Seq;
set have;
if sequence='C' then prior_ctotal=sum(prior_ctotal,amount);
else do;
amount=sum(amount,prior_ctotal);
prior_ctotal=0;
if N=final_nonC then amount=sum(amount,ending_ctotal);
output;
end;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.