Hi all,
I have a dataset and SQL below, this is the best i can do, I know it can be done in one step of Datastep or SQL. Need advice how to do it in one step(Datastep or SQL) instead of mine by 2 step.
data have;
infile cards dlm='|';
input ACC_NO SUB_ACCNO PRODUCT$ ACC_PLUS_PRODUCT :$20. CATEGORY $ AMT;
cards;
11111 | 12345 | GOLD | 11111GOLD | B | 50
11111 | 24895 | BLACK | 11111BLACK | B | 25
11111 | 24445 | BLACK | 11111BLACK | B | 75
11111 | 24445 | SILVER | 11111SILVER | B | 140
11111 | 23456 | SILVER | 11111SILVER | S | 70
11111 | 86475 | GOLD | 11111GOLD | S | 130
22222 | 34567 | SILVER | 22222SILVER | B | 100
22222 | 12458 | GOLD | 22222GOLD | B | 50
22222 | 12557 | GOLD | 22222GOLD | S | 100
33333 | 23785 | GOLD | 33333GOLD | B | 75
33333 | 12968 | SILVER | 33333SILVER | B | 110
33333 | 12968 | SILVER | 33333SILVER | S | 110
33333 | 54368 | SILVER | 33333SILVER | S | 50
33333 | 94256 | SILVER | 33333SILVER | S | 175
;
proc sql;
create table flow_list_remove_duplicate(drop=AMT) as
select *,
SUM(AMT) as count_AMT
from have
group by ACC_NO ;
quit;
proc sort data=work.flow_list_remove_duplicate;
by ACC_NO;
run;
DATA FINAL;
SET WORK.flow_list_remove_duplicate;
by ACC_NO;
if first.ACC_NO then output;
run;
Thank in advance
In the context of your query the other variables like subaccno don't make much sense, if you take a distinct acc_no, and sum amount then that is the data row, the other variables could be any of the records as they don't get summed. If you just want the last row in the group and a sum, then:
data want (drop=amt); set have; retain count_amt; by acc_no; if first.acc_no then count_amt=amt;
else count_amt=sum(count_amt,amt);
if last.acc_no;
run;
Expected result as below:-
Obs | ACC_NO | SUB_ACCNO | PRODUCT | ACC_PLUS_PRODUCT | CATEGORY | count_AMT |
1 | 11111 | 86475 | GOLD | 11111GOLD | S | 490 |
2 | 22222 | 34567 | SILVER | 22222SILVER | B | 250 |
3 | 33333 | 94256 | SILVER | 33333SILVER | S | 520 |
In the context of your query the other variables like subaccno don't make much sense, if you take a distinct acc_no, and sum amount then that is the data row, the other variables could be any of the records as they don't get summed. If you just want the last row in the group and a sum, then:
data want (drop=amt); set have; retain count_amt; by acc_no; if first.acc_no then count_amt=amt;
else count_amt=sum(count_amt,amt);
if last.acc_no;
run;
Hi RW9,
Thank for the code. Would appreciated if you could give guidance on the Proc SQL as well(if you're convenience).
In what sense sorry? For the summation, SQL looks at the group by columns, so only takes ACC_NO and AMT, as these are the only items which relate to the summary statistic. As for trying to get the last record out within SQL, this tends to be trickier. SQL does not have the concept of a series of observations, it only looks at logical blocks of observations. So you would need to have a logical way of finding the last record, maybe looking at your test data: where SUB_ACCNO=max(SUB_ACCNO). Anyways, for this it is far simpler to do in datastep.
Hi RW9,
You're absolute right, this is actually a test data(apologize for the poor test data). At the moment, there is no any criteria need to select which data after the summation done. I believe next will be asking criteria like last record, latest date(which i didn't put in), max/min or etc.
Thank again for your guidance, with sample code of this "where SUB_ACCNO=max(SUB_ACCNO)", at least I have idea how to do it in Datastep and SQL as well.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.