Hello Team,
Greetings
I have one existing program which has statements like below
proc sql;
create table ABC.masterward as
select A, B, C, D, E, G
from db.Master
order by A, B;
quit;
data ABC.masterward;
set ABC.masterward(keep = A B C D E G);
end;
run;
data ABC.lastjoin;
merge ABC.ptp ABC.masterward(in = a where = (open eq 1)) ABC.smscalls ABC.agrpayment;
by A B;
if a;
run;
proc sql;
create table ABC.summary as
select A, B, sum(C) as OpenMatters, sum(D) as TotalBookValue, sum(E) as RPC
from ABC.lastjoin
group by A, G;
quit;;
proc sql;
create table ABC.summary_final as
A, B, OpenMatters, TotalBookValue, RPC, G
order by A, G;
quit;
Output of program gives us 45 rows as per expectation. Now client requested to add 4 more columns, but as soon we trying to add columns in MasterWard dataset, output changes and gives us 1000+ rows.
Yes, I agree it is because we have used SUM operation in "ABC.summary". But is there any way by which I can add required columns without affecting original output ?
Thanks In Advance
Your group by incorrect. It should be A, B instead of A,G
proc sql;
create table ABC.summary as
select A, B, sum(C) as OpenMatters, sum(D) as TotalBookValue, sum(E) as RPC
from ABC.lastjoin
group by A, G;
quit;;
Hi There,
Greetings
That is just program template. Please have a look into actual below program
proc sql;
create table ABC.masterward as
select b.szAccountNumber , b.wlevel , b.curOriginalCapital, b.dtfileclosed, b.dtFileFrozen, b.curSettlement, b.szMagicColumn6 as Ward_Description,
b.wclientcode, b.sztelhome, b.sztelwork, b.sztelother
from db.Master b
where wClientCode = -10023
order by Ward_Description,szAccountNumber;
quit;
data ABC.masterward;
set Emfuleni.masterward(keep = szAccountNumber wlevel curOriginalCapital curSettlement Ward_Description rpc ValueofRPC open Balance_Split has_valid_phone has_cell);
if has_valid_phone=1 or has_cell = 1 then do;
Telephone = 1;
ValueofTele = curOriginalCapital;
end;
else Telephone = 0;
run;
data ABC.lastjoin;
merge ABC.ptp Emfuleni.masterward(in = a where = (open eq 1)) ABC.smscalls ABC.agrpayment;
by Ward_Description szAccountNumber;
if a;
run;
proc sql;
create table ABC.summary as
select Ward_Description, Balance_Split, sum(open) as OpenMatters, sum(curOriginalCapital) as TotalBookValue, sum(RPC) as RPC, sum(ValueofRPC) as ValueofRPC,
sum(NumberofPTP) as NumberofPTPs, sum(NumberofPTP_Distinct) as NumberofPTP_Distinct, sum(curPTPAmount) as ValueofPTPs,
sum(Telephone) as HasValidTelephone, sum(ValueofTele) as ValueofTele, sum(outcalls) as outcalls,
sum(incalls) as incalls, sum(sms) as sms, sum(NumberofPayers) as NumberofPayers,
sum(NumberofPay) as NumberofPayments, sum(curPayment) as ValueofPayments,
mean(curPayment) as AveragePaymentDebtor
from ABC.lastjoin
group by Ward_Description, Balance_Split;
quit;;
proc sql;
create table ABC.summary_final as
select Ward_Description, Balance_Split, OpenMatters, TotalBookValue, RPC, ValueofRPC, NumberofPTPs, NumberofPTP_Distinct, ValueofPTPs, ValueofPTPs/NumberofPTPs as AveragePTP,
HasValidTelephone, ValueofTele, HasValidTelephone/OpenMatters as Pct_Valid_Numbers format = percent8.2, outcalls, incalls, sms, NumberofPayers, NumberofPayers/OpenMatters as Pct_Payers format = percent8.2,
NumberofPayments, ValueofPayments, NumberofPayers/NumberofPTP_Distinct as Pct_NrofPayers_NrofPTP_Distinct format = percent8.2,
NumberofPayments/NumberofPTPs as Pct_NrofPayments_NrofPTPs format = percent8.2 ,AveragePaymentDebtor
from Emfuleni.summary
order by Ward_Description, Balance_Split;
quit;
In this program I need to add 4 more columns, but as i add new columns into MasterWard output gets changed. Can you please provide your vaulable thought on this ?
Thanks In Advance
ABC.Masterward is created but never used. Recheck your program for this and possible other typos.
At which step do you actually experience the increase in observations?
If you add columns in a SQL SELECT that contains summary functions and a GROUP BY, all new columns must either be summarized or be in the GROUP BY.
Otherwise you will get all records of the FROM.
HI There,
Greetings
Thanks for feedback. I have changed my program now as below
proc sql;
create table ABC.summary as
select Ward_Description, Balance_Split, sum(open) as OpenMatters, sum(curOriginalCapital) as TotalBookValue, sum(RPC) as RPC, sum(ValueofRPC) as ValueofRPC,
sum(NumberofPTP) as NumberofPTPs, sum(NumberofPTP_Distinct) as NumberofPTP_Distinct, sum(curPTPAmount) as ValueofPTPs,
sum(Telephone) as HasValidTelephone, sum(ValueofTele) as ValueofTele, sum(outcalls) as outcalls,
sum(incalls) as incalls, sum(sms) as sms, sum(NumberofPayers) as NumberofPayers,
sum(NumberofPay) as NumberofPayments, sum(curPayment) as ValueofPayments,
mean(curPayment) as AveragePaymentDebtor
from ABC.lastjoin
group by Ward_Description, Balance_Split;
quit;;
proc sql;
create table ABC.summary_Modified as
select Ward_Description, Balance_Split, sum(open) as OpenMatters, sum(curOriginalCapital) as TotalBookValue, sum(RPC) as RPC, sum(ValueofRPC) as ValueofRPC,
sum(NumberofPTP) as NumberofPTPs, sum(NumberofPTP_Distinct) as NumberofPTP_Distinct, sum(curPTPAmount) as ValueofPTPs,
sum(Telephone) as HasValidTelephone, sum(ValueofTele) as ValueofTele, sum(outcalls) as outcalls,
sum(incalls) as incalls, sum(sms) as sms, sum(NumberofPayers) as NumberofPayers,
sum(NumberofPay) as NumberofPayments, sum(curPayment) as ValueofPayments,
mean(curPayment) as AveragePaymentDebtor, Account_Type, Exclusion_Reason
from ABC.lastjoin
group by Ward_Description, Balance_Split, Account_Type, Exclusion_Reason;
quit;;
proc sql;
create table ABC.summary_final as
select Ward_Description, Balance_Split, OpenMatters, TotalBookValue, RPC, ValueofRPC, NumberofPTPs, NumberofPTP_Distinct, ValueofPTPs, ValueofPTPs/NumberofPTPs as AveragePTP,
HasValidTelephone, ValueofTele, HasValidTelephone/OpenMatters as Pct_Valid_Numbers format = percent8.2, outcalls, incalls, sms, NumberofPayers, NumberofPayers/OpenMatters as Pct_Payers format = percent8.2,
NumberofPayments, ValueofPayments, NumberofPayers/NumberofPTP_Distinct as Pct_NrofPayers_NrofPTP_Distinct format = percent8.2,
NumberofPayments/NumberofPTPs as Pct_NrofPayments_NrofPTPs format = percent8.2 ,AveragePaymentDebtor
from ABC.summary
order by Ward_Description, Balance_Split;
quit;
proc sql;
create table ABC.summary_Final_Modified as
select Ward_Description, Balance_Split, OpenMatters, TotalBookValue, RPC, ValueofRPC, NumberofPTPs, NumberofPTP_Distinct, ValueofPTPs, ValueofPTPs/NumberofPTPs as AveragePTP,
HasValidTelephone, ValueofTele, HasValidTelephone/OpenMatters as Pct_Valid_Numbers format = percent8.2, outcalls, incalls, sms, NumberofPayers, NumberofPayers/OpenMatters as Pct_Payers format = percent8.2,
NumberofPayments, ValueofPayments, NumberofPayers/NumberofPTP_Distinct as Pct_NrofPayers_NrofPTP_Distinct format = percent8.2,
NumberofPayments/NumberofPTPs as Pct_NrofPayments_NrofPTPs format = percent8.2 ,AveragePaymentDebtor, Account_Type, Exclusion_Reason
from ABC.summary_Modified
order by Ward_Description, Balance_Split;
quit;
In above statements Summary_modfied has 2 extra columns Account_Type, Exclusion_Reason. By adding this output reduced but it is still not matching with original one.
Please assist us. Thanks In Advance
Well, this is quite natural. Adding more group variables results in finer granularity, causing more records to be written.
See this as a short example:
data have;
input var1 var2 var3 var4 var5;
cards;
1 1 1 1 2
1 1 1 2 2
1 1 2 1 3
1 1 2 2 3
1 2 1 1 1
2 1 1 1 2
2 2 1 1 2
;
run;
proc sql;
create table want1 as
select var1, var2, sum(var5) as var5
from have
group by var1, var2
;
create table want2 as
select var1, var2, var3, var4, sum(var5) as var5
from have
group by var1, var2, var3, var4
;
quit;
title "2 group vars";
proc print data=want1 noobs;
run;
title "4 group vars";
proc print data=want2 noobs;
run;
This results in the following outputs:
2 group vars var1 var2 var5 1 1 10 1 2 1 2 1 2 2 2 2 4 group vars var1 var2 var3 var4 var5 1 1 1 1 2 1 1 1 2 2 1 1 2 1 3 1 1 2 2 3 1 2 1 1 1 2 1 1 1 2 2 2 1 1 2
Since multiple values of var3 and var4 are present in the first two groups built with var1 and var2, those are split into extra lines.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.