BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
David_Billa
Rhodochrosite | Level 12

Appericiate if someone of you guide me by creating the three target tables in one step.

 

proc sql;
    create table work.Paid_Exps as
        select distinct
            SEG,
            S_SEG,
            T_Busi,
            BRANCH,
            (sum(Qua_Prm)) as Paid_Expenses_Aggregate length = 8
            format = 21.4
        from Mapped_Portfl
         where Contract_alive='yes'
         group by SEG                    
    ;
quit;

proc sql;
    create table work.TMP_Agg_RR as
        select distinct
            SEG,
            S_SEG,
            T_Busi,
            BRANCH,
            (sum(Y_Prm)) as RC_Aggregate length = 8
            format = 21.4
        from Mapped_Portfl
        where Contract_alive='yes' and Cov_Start_RP='no' and Cov_End_RP='no' and Full_year_since_cov_start='yes'
        group by SEG    
    ;
quit;

proc sql;
    create table work.TMP_Agg_RR_CP as
        select
            SEG,
            S_SEG,
            T_Busi,
            BRANCH,
            (sum(Y_Prm)) as AA_CO_Aggregate length = 8
            format = 21.4
        from Mapped_Portfl
        where Cov_Start_RP='yes'
        group by SEG                    
    ;
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

How to do it really depends on the structure of your data. You calculate the sum of a couple of different variables, group by SEG. To which you add another couple of variables (S_SEG,T_Busi,BRANCH). 

 

Two of your queries are done with a SELECT DISTINCT, one is just a SELECT. The pertinent questions are:

  • Are your additional variables (S_SEG,T_Busi,BRANCH) unique by SEG, or do they differ?
  • If they differ, are there duplicates of each combination? And if there are, do you really want all the records out, or just one of each kind (in which case you should use SELECT DISTINCT for the last query as well.

If the additional variables are all the same, and you just want the sum for each combination, I would go with a program something like what @Shmuel suggested:

 

proc sort data=Mapped_Portfl; by seg; run;

data Paid_Exps  (keep= seg s_seg t_busi branch Paid_Expenses_Aggregate)
     TMP_Agg_RR (keep= seg s_seg t_busi branch RC_Aggregate)
	 TMP_Agg_RR_CP(keep= seg s_seg t_busi branch AA_CO_Aggregate);
  set Mapped_Portfl;
  by seg;
  if first.seg then
    call missing(Paid_Expenses_Aggregate,RC_Aggregate,AA_CO_Aggregate );
  if Contract_alive='yes' then do;
    Paid_Expenses_Aggregate+Qua_Prm;
    if Cov_Start_RP='no' and Cov_End_RP='no' and full_year_since_cov_start='yes' then
      RC_Aggregate+Y_prm;
    end;
  if Cov_Start_RP='yes' then
    AA_CO_Aggregate+Y_prm;
  if last.seg; 
  if not missing(Paid_Expenses_Aggregate) then 
    output Paid_Exps;
  if not missing(RC_Aggregate) then 
    output TMP_Agg_RR;
  if not missing(AA_CO_Aggregate) then
    output TMP_Agg_RR_CP;
 run;

I changed the program a bit (apart from the obvious, such as inserting a SET statement): instead of looking at the criteria at the end of each group (which will probably go wrong, as the whole idea probably is that the criteria variables differ) and instead I looked at the generated sums to see if they had been added to (by setting them to missing initially instead of zero).

This program will generate one record for each SEG.

 

If you want something that copies the behavior of your own program, as written, I would go by a modification of the solution suggested by @mkeintz:

proc sort data=mapped_portfl out=mysort;
  by seg S_SEG T_Busi BRANCH;
run;

data work.paid_exps     (keep=seg s_seg t_busi branch paid_expenses_aggregate)
     work.TMP_Agg_RR    (keep=seg s_seg t_busi branch rc_aggregate )
     work.TMP_Agg_RR_CP (keep=SEG S_SEG T_Busi BRANCH AA_CO_Aggregate)   ;

  do until (last.seg);
    set mysort;
    by seg;
    if contract_alive='yes' then paid_expenses_aggregate=sum(paid_expenses_aggregate,qua_prm);
    if contract_alive='yes' and Cov_Start_RP='no' and Cov_End_RP='no' and Full_year_since_cov_start='yes' then
      rc_aggregate=sum(rc_aggregate,y_prm);
    if cov_start_rp='yes' then aa_co_aggregate=sum(aa_co_aggregate,Y_Prm);
  end;

  do until (last.seg);
    set mysort;
    by seg S_SEG T_Busi BRANCH;
    if last.branch then do; /* the selects with DISTINCT */
      if not missing(paid_expenses_aggregate) then 
        output paid_exps;
      if not missing(rc_aggregate) then
        output TMP_Agg_RR;
      end;
    /* the select without DISTINCT */
    if cov_start_rp='yes' then 
      output tmp_agg_rr_cp;
   end;
run;

 

View solution in original post

4 REPLIES 4
mkeintz
PROC Star

If you sort the data by SEG, you could do this with a single DATA step:

 


proc sort data=mapped_portfl out=mysort;
  by seg;
run;

data work.paid_exps     (keep=seg s_seg t_busi branch paid_expenses_aggregate)
     work.TMP_Agg_RR    (keep=seg s_seg t_busi branch rc_aggregate )
     work.TMP_Agg_RR_CP (keep=SEG S_SEG T_Busi BRANCH AA_CO_Aggregate)   ;

  do until (last.seg);
    set mysort;
    by seg;
    if contract_alive='yes' then paid_expenses_aggregate=sum(paid_expenses_aggregate,qua_prm);
    if contract_alive='yes' and Cov_Start_RP='no' and Cov_End_RP='no' and Full_year_since_cov_start='yes' then
      rc_aggregate=sum(rc_aggregate,y_prm);
    if cov_start_rp='yes' then aa_co_aggregate=sum(aa_co_aggregate,Y_Prm);
  end;

  do until (last.seg);
    set mysort;
    by seg;
    if contract_alive='yes' then output paid_exps;
    if contract_alive='yes' and Cov_Start_RP='no' and Cov_End_RP='no' and Full_year_since_cov_start='yes' then
      output TMP_Agg_RR;
    if cov_start_rp='yes' then output tmp_agg_rr_cp;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
David_Billa
Rhodochrosite | Level 12
Thank you. Any other ways?
Shmuel
Garnet | Level 18

Try next code:

proc sort data=Mapped_Portfl; by seg; run;
data Paid_Exps  (keep= seg s_seg t_busi brabch Paid_Expenses_Aggregate)
     TMP_Agg_RR (keep= seg s_seg t_busi brabch RC_Aggregate)
	 TMP_Agg_RR_CP(keep= seg s_seg t_busi brabch RC_Aggregate
	               rename=(RC_Aggregate=AA_CO_Aggregate));
  by seg;
     retain RC_Aggregate Paid_Expenses_Aggregate;
	 format RC_Aggregate Paid_Expenses_Aggregate  21.4;
	 
	 if first.seg then do;
        RC_Aggregate=0;
		Paid_Expenses_Aggregate =0;
	 end;
	 RC_Aggregate = sum(of V, Y_Prm);
	 Paid_Expenses_Aggregate = (sum(Qua_Prm));
	 
	 if last.seg then do;
	    if contract_alive='yes' then output paid_exps;
        if contract_alive='yes' and Cov_Start_RP='no' and 
		   Cov_End_RP='no' and Full_year_since_cov_start='yes' 
		   then output TMP_Agg_RR;
        if cov_start_rp='yes' then output tmp_agg_rr_cp;	    
	 run;
	 
run;

   

s_lassen
Meteorite | Level 14

How to do it really depends on the structure of your data. You calculate the sum of a couple of different variables, group by SEG. To which you add another couple of variables (S_SEG,T_Busi,BRANCH). 

 

Two of your queries are done with a SELECT DISTINCT, one is just a SELECT. The pertinent questions are:

  • Are your additional variables (S_SEG,T_Busi,BRANCH) unique by SEG, or do they differ?
  • If they differ, are there duplicates of each combination? And if there are, do you really want all the records out, or just one of each kind (in which case you should use SELECT DISTINCT for the last query as well.

If the additional variables are all the same, and you just want the sum for each combination, I would go with a program something like what @Shmuel suggested:

 

proc sort data=Mapped_Portfl; by seg; run;

data Paid_Exps  (keep= seg s_seg t_busi branch Paid_Expenses_Aggregate)
     TMP_Agg_RR (keep= seg s_seg t_busi branch RC_Aggregate)
	 TMP_Agg_RR_CP(keep= seg s_seg t_busi branch AA_CO_Aggregate);
  set Mapped_Portfl;
  by seg;
  if first.seg then
    call missing(Paid_Expenses_Aggregate,RC_Aggregate,AA_CO_Aggregate );
  if Contract_alive='yes' then do;
    Paid_Expenses_Aggregate+Qua_Prm;
    if Cov_Start_RP='no' and Cov_End_RP='no' and full_year_since_cov_start='yes' then
      RC_Aggregate+Y_prm;
    end;
  if Cov_Start_RP='yes' then
    AA_CO_Aggregate+Y_prm;
  if last.seg; 
  if not missing(Paid_Expenses_Aggregate) then 
    output Paid_Exps;
  if not missing(RC_Aggregate) then 
    output TMP_Agg_RR;
  if not missing(AA_CO_Aggregate) then
    output TMP_Agg_RR_CP;
 run;

I changed the program a bit (apart from the obvious, such as inserting a SET statement): instead of looking at the criteria at the end of each group (which will probably go wrong, as the whole idea probably is that the criteria variables differ) and instead I looked at the generated sums to see if they had been added to (by setting them to missing initially instead of zero).

This program will generate one record for each SEG.

 

If you want something that copies the behavior of your own program, as written, I would go by a modification of the solution suggested by @mkeintz:

proc sort data=mapped_portfl out=mysort;
  by seg S_SEG T_Busi BRANCH;
run;

data work.paid_exps     (keep=seg s_seg t_busi branch paid_expenses_aggregate)
     work.TMP_Agg_RR    (keep=seg s_seg t_busi branch rc_aggregate )
     work.TMP_Agg_RR_CP (keep=SEG S_SEG T_Busi BRANCH AA_CO_Aggregate)   ;

  do until (last.seg);
    set mysort;
    by seg;
    if contract_alive='yes' then paid_expenses_aggregate=sum(paid_expenses_aggregate,qua_prm);
    if contract_alive='yes' and Cov_Start_RP='no' and Cov_End_RP='no' and Full_year_since_cov_start='yes' then
      rc_aggregate=sum(rc_aggregate,y_prm);
    if cov_start_rp='yes' then aa_co_aggregate=sum(aa_co_aggregate,Y_Prm);
  end;

  do until (last.seg);
    set mysort;
    by seg S_SEG T_Busi BRANCH;
    if last.branch then do; /* the selects with DISTINCT */
      if not missing(paid_expenses_aggregate) then 
        output paid_exps;
      if not missing(rc_aggregate) then
        output TMP_Agg_RR;
      end;
    /* the select without DISTINCT */
    if cov_start_rp='yes' then 
      output tmp_agg_rr_cp;
   end;
run;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1457 views
  • 2 likes
  • 4 in conversation