Appericiate if someone of you guide me by creating the three target tables in one step.
proc sql; create table work.Paid_Exps as select distinct SEG, S_SEG, T_Busi, BRANCH, (sum(Qua_Prm)) as Paid_Expenses_Aggregate length = 8 format = 21.4 from Mapped_Portfl where Contract_alive='yes' group by SEG ; quit; proc sql; create table work.TMP_Agg_RR as select distinct SEG, S_SEG, T_Busi, BRANCH, (sum(Y_Prm)) as RC_Aggregate length = 8 format = 21.4 from Mapped_Portfl where Contract_alive='yes' and Cov_Start_RP='no' and Cov_End_RP='no' and Full_year_since_cov_start='yes' group by SEG ; quit; proc sql; create table work.TMP_Agg_RR_CP as select SEG, S_SEG, T_Busi, BRANCH, (sum(Y_Prm)) as AA_CO_Aggregate length = 8 format = 21.4 from Mapped_Portfl where Cov_Start_RP='yes' group by SEG ; quit;
How to do it really depends on the structure of your data. You calculate the sum of a couple of different variables, group by SEG. To which you add another couple of variables (S_SEG,T_Busi,BRANCH).
Two of your queries are done with a SELECT DISTINCT, one is just a SELECT. The pertinent questions are:
If the additional variables are all the same, and you just want the sum for each combination, I would go with a program something like what @Shmuel suggested:
proc sort data=Mapped_Portfl; by seg; run;
data Paid_Exps (keep= seg s_seg t_busi branch Paid_Expenses_Aggregate)
TMP_Agg_RR (keep= seg s_seg t_busi branch RC_Aggregate)
TMP_Agg_RR_CP(keep= seg s_seg t_busi branch AA_CO_Aggregate);
set Mapped_Portfl;
by seg;
if first.seg then
call missing(Paid_Expenses_Aggregate,RC_Aggregate,AA_CO_Aggregate );
if Contract_alive='yes' then do;
Paid_Expenses_Aggregate+Qua_Prm;
if Cov_Start_RP='no' and Cov_End_RP='no' and full_year_since_cov_start='yes' then
RC_Aggregate+Y_prm;
end;
if Cov_Start_RP='yes' then
AA_CO_Aggregate+Y_prm;
if last.seg;
if not missing(Paid_Expenses_Aggregate) then
output Paid_Exps;
if not missing(RC_Aggregate) then
output TMP_Agg_RR;
if not missing(AA_CO_Aggregate) then
output TMP_Agg_RR_CP;
run;
I changed the program a bit (apart from the obvious, such as inserting a SET statement): instead of looking at the criteria at the end of each group (which will probably go wrong, as the whole idea probably is that the criteria variables differ) and instead I looked at the generated sums to see if they had been added to (by setting them to missing initially instead of zero).
This program will generate one record for each SEG.
If you want something that copies the behavior of your own program, as written, I would go by a modification of the solution suggested by @mkeintz:
proc sort data=mapped_portfl out=mysort;
by seg S_SEG T_Busi BRANCH;
run;
data work.paid_exps (keep=seg s_seg t_busi branch paid_expenses_aggregate)
work.TMP_Agg_RR (keep=seg s_seg t_busi branch rc_aggregate )
work.TMP_Agg_RR_CP (keep=SEG S_SEG T_Busi BRANCH AA_CO_Aggregate) ;
do until (last.seg);
set mysort;
by seg;
if contract_alive='yes' then paid_expenses_aggregate=sum(paid_expenses_aggregate,qua_prm);
if contract_alive='yes' and Cov_Start_RP='no' and Cov_End_RP='no' and Full_year_since_cov_start='yes' then
rc_aggregate=sum(rc_aggregate,y_prm);
if cov_start_rp='yes' then aa_co_aggregate=sum(aa_co_aggregate,Y_Prm);
end;
do until (last.seg);
set mysort;
by seg S_SEG T_Busi BRANCH;
if last.branch then do; /* the selects with DISTINCT */
if not missing(paid_expenses_aggregate) then
output paid_exps;
if not missing(rc_aggregate) then
output TMP_Agg_RR;
end;
/* the select without DISTINCT */
if cov_start_rp='yes' then
output tmp_agg_rr_cp;
end;
run;
If you sort the data by SEG, you could do this with a single DATA step:
proc sort data=mapped_portfl out=mysort;
by seg;
run;
data work.paid_exps (keep=seg s_seg t_busi branch paid_expenses_aggregate)
work.TMP_Agg_RR (keep=seg s_seg t_busi branch rc_aggregate )
work.TMP_Agg_RR_CP (keep=SEG S_SEG T_Busi BRANCH AA_CO_Aggregate) ;
do until (last.seg);
set mysort;
by seg;
if contract_alive='yes' then paid_expenses_aggregate=sum(paid_expenses_aggregate,qua_prm);
if contract_alive='yes' and Cov_Start_RP='no' and Cov_End_RP='no' and Full_year_since_cov_start='yes' then
rc_aggregate=sum(rc_aggregate,y_prm);
if cov_start_rp='yes' then aa_co_aggregate=sum(aa_co_aggregate,Y_Prm);
end;
do until (last.seg);
set mysort;
by seg;
if contract_alive='yes' then output paid_exps;
if contract_alive='yes' and Cov_Start_RP='no' and Cov_End_RP='no' and Full_year_since_cov_start='yes' then
output TMP_Agg_RR;
if cov_start_rp='yes' then output tmp_agg_rr_cp;
end;
run;
Try next code:
proc sort data=Mapped_Portfl; by seg; run;
data Paid_Exps (keep= seg s_seg t_busi brabch Paid_Expenses_Aggregate)
TMP_Agg_RR (keep= seg s_seg t_busi brabch RC_Aggregate)
TMP_Agg_RR_CP(keep= seg s_seg t_busi brabch RC_Aggregate
rename=(RC_Aggregate=AA_CO_Aggregate));
by seg;
retain RC_Aggregate Paid_Expenses_Aggregate;
format RC_Aggregate Paid_Expenses_Aggregate 21.4;
if first.seg then do;
RC_Aggregate=0;
Paid_Expenses_Aggregate =0;
end;
RC_Aggregate = sum(of V, Y_Prm);
Paid_Expenses_Aggregate = (sum(Qua_Prm));
if last.seg then do;
if contract_alive='yes' then output paid_exps;
if contract_alive='yes' and Cov_Start_RP='no' and
Cov_End_RP='no' and Full_year_since_cov_start='yes'
then output TMP_Agg_RR;
if cov_start_rp='yes' then output tmp_agg_rr_cp;
run;
run;
How to do it really depends on the structure of your data. You calculate the sum of a couple of different variables, group by SEG. To which you add another couple of variables (S_SEG,T_Busi,BRANCH).
Two of your queries are done with a SELECT DISTINCT, one is just a SELECT. The pertinent questions are:
If the additional variables are all the same, and you just want the sum for each combination, I would go with a program something like what @Shmuel suggested:
proc sort data=Mapped_Portfl; by seg; run;
data Paid_Exps (keep= seg s_seg t_busi branch Paid_Expenses_Aggregate)
TMP_Agg_RR (keep= seg s_seg t_busi branch RC_Aggregate)
TMP_Agg_RR_CP(keep= seg s_seg t_busi branch AA_CO_Aggregate);
set Mapped_Portfl;
by seg;
if first.seg then
call missing(Paid_Expenses_Aggregate,RC_Aggregate,AA_CO_Aggregate );
if Contract_alive='yes' then do;
Paid_Expenses_Aggregate+Qua_Prm;
if Cov_Start_RP='no' and Cov_End_RP='no' and full_year_since_cov_start='yes' then
RC_Aggregate+Y_prm;
end;
if Cov_Start_RP='yes' then
AA_CO_Aggregate+Y_prm;
if last.seg;
if not missing(Paid_Expenses_Aggregate) then
output Paid_Exps;
if not missing(RC_Aggregate) then
output TMP_Agg_RR;
if not missing(AA_CO_Aggregate) then
output TMP_Agg_RR_CP;
run;
I changed the program a bit (apart from the obvious, such as inserting a SET statement): instead of looking at the criteria at the end of each group (which will probably go wrong, as the whole idea probably is that the criteria variables differ) and instead I looked at the generated sums to see if they had been added to (by setting them to missing initially instead of zero).
This program will generate one record for each SEG.
If you want something that copies the behavior of your own program, as written, I would go by a modification of the solution suggested by @mkeintz:
proc sort data=mapped_portfl out=mysort;
by seg S_SEG T_Busi BRANCH;
run;
data work.paid_exps (keep=seg s_seg t_busi branch paid_expenses_aggregate)
work.TMP_Agg_RR (keep=seg s_seg t_busi branch rc_aggregate )
work.TMP_Agg_RR_CP (keep=SEG S_SEG T_Busi BRANCH AA_CO_Aggregate) ;
do until (last.seg);
set mysort;
by seg;
if contract_alive='yes' then paid_expenses_aggregate=sum(paid_expenses_aggregate,qua_prm);
if contract_alive='yes' and Cov_Start_RP='no' and Cov_End_RP='no' and Full_year_since_cov_start='yes' then
rc_aggregate=sum(rc_aggregate,y_prm);
if cov_start_rp='yes' then aa_co_aggregate=sum(aa_co_aggregate,Y_Prm);
end;
do until (last.seg);
set mysort;
by seg S_SEG T_Busi BRANCH;
if last.branch then do; /* the selects with DISTINCT */
if not missing(paid_expenses_aggregate) then
output paid_exps;
if not missing(rc_aggregate) then
output TMP_Agg_RR;
end;
/* the select without DISTINCT */
if cov_start_rp='yes' then
output tmp_agg_rr_cp;
end;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.