I have three separate proc sqls. In the first one I am selecting (sum(Rx)) as Sum_of_Rx. In the second proc sql, I select (sum(Sum_of_Rx)) as Sum_of_Sum_of_Rx. And in the third proc sql, I select (sum(Sum_of_Sum_of_Rx)) as Sum_of_sum_of_sum_of_Rx. If the value of Rx is 1, what will each of these sum variables be? Obviously the Sum_of_Rx would be 1, but what about the rest? Note: I didn't write this code. I'm trying to understand why the person who wrote it would do this.
Love the naming conventions.
The first sum in the data step doesn't do anything, it only renames the variable and is pretty useless as shown.
The second sum creates an aggregate across the group by variables
the third creates another aggregate across a different set of group by variables, after joining with a different table.
for example, total by county for first SQL and then by state.
If if its set to 1 initially it could be used as a record counter to determine how many records are in each group by component.
It depends on the data, and possible the group by statement.
Please post some some sample data and the full code.
Sum(Rx) is first defined in this datastep (earlier on Rx is set to value of 1)
data EGTASK.Data_Put_In_Access;
set work.Calculated_Data2;
format AffiliationID $6. ApprovedDispensingFeeAmount DOLLAR16.2 ApprovedPriceType $10. CarrierID $9. ClaimOriginationFlag $1.
ClaimStatus $1. CostTypeCode $10. DISPENSEASWRITTENDAWPRODUCTSELE $1.
ExtendedSupplyNetworkIndicator $1. FilledMonth $7. FilledQuarter $6.
GPINumber $14. MailOrderIndicator $1. MultiSourceCode $1. PlanDrugStatus $1.
PriceScheduleDefinition $10. RxNetworkID $6. RxNetworkID_CHAIN $6.
SERVICEPROVIDERID $15. SuperPharmacyNetworkID $6. PHARMACYZIPCODE $10.
SUM_of_ApprovedIngredientCost DOLLAR16.2 SUM_of_ApprDispFeeAmount DOLLAR16.2 SUM_of_QUANTITYDISPENSED 13.3 ZIP $10.;
RxNetworkID_CHAIN = RxNetworkID;
SUM_of_ApprovedIngredientCost = sum(ApprovedIngredientCost);
SUM_of_ApprDispFeeAmount = sum(ApprovedDispensingFeeAmount);
SUM_of_AWP = sum(AWP);
SUM_of_RX = sum(RX);
SUM_of_QUANTITYDISPENSED = sum(QUANTITYDISPENSED);
ZIP = PHARMACYZIPCODE;
by AffiliationID ApprovedPriceType BRAND_GENERIC CarrierID ClaimOriginationFlag
ClaimStatus CLIENT CostTypeCode DISPENSEASWRITTENDAWPRODUCTSELE
ExtendedSupplyNetworkIndicator FilledMonth FilledQuarter FinalPlanCode GPINumber
MailOrderIndicator MultiSourceCode PlanDrugStatus PriceScheduleDefinition RxNetworkID
SERVICEPROVIDERID SuperPharmacyNetworkID PHARMACYZIPCODE Specialty_Flag EXCLUSIONS
IVLIndicator;
run;
And here is the first proc sql where sum of the sum takes place:
PROC SQL;
CREATE TABLE EGTASK.Data_Put_In_Access AS
SELECT t1.AccountID,
t1.AffiliationID,
t1.ApprovedIngredientCost,
t1.ApprovedPriceType,
t1.ApprovedDispensingFeeAmount,
t1.AverageWholesalePriceUnitCost,
t1.AWP,
t1.BRAND_GENERIC,
t1.CarrierID,
t1.ClaimOriginationFlag,
t1.ClaimStatus,
t1.CLIENT,
t1.COMPOUNDCODE,
t1.CostTypeCode,
t1.DISPENSEASWRITTENDAWPRODUCTSELE,
t1.EXCLUSIONS,
t1.ExtendedSupplyNetworkIndicator,
t1.FilledMonth,
t1.FilledQuarter,
t1.FinalPlanCode,
t1.GPINumber,
t1.GroupPlanCode,
t1.MailOrderIndicator,
t1.MemberID,
t1.MultiSourceCode,
t1.NETWORK,
t1.PlanDrugStatus,
t1.PriceScheduleDefinition,
t1.RxNetworkID,
t1.RXNetworkID_CHAIN,
t1.QUANTITYDISPENSED,
t1.SERVICEPROVIDERID,
t1.Specialty_Flag,
t1.SuperPharmacyNetworkID,
t1.ZIP,
t1.SUM_of_ApprovedIngredientCost,
t1.SUM_of_ApprDispFeeAmount,
/* SUM_of_SUM_of_ApprovedIngredientCost */
(SUM(t1.SUM_of_ApprovedIngredientCost)) AS SUM_of_SUM_of_AIC,
/* SUM_of_SUM_of_AWP */
(SUM(t1.SUM_of_AWP)) AS SUM_of_SUM_of_AWP,
/* SUM_of_SUM_of_RX */
(SUM(t1.SUM_of_RX)) AS SUM_of_SUM_of_RX,
/* SUM_of_SUM_of_QUANTITYDISPENSED */
(SUM(t1.SUM_of_QUANTITYDISPENSED)) FORMAT=13.3 AS SUM_of_SUM_of_QUANTITYDISPENSED,
/* SUM_of_SUM_of_ApprDispFeeAmount */
(SUM(t1.SUM_of_ApprDispFeeAmount)) AS SUM_of_SUM_of_ApprDispFeeAmount,
t1.AFF_NAME,
t1.AFF_TYPE,
t1.NETWORK,
t1.'Business Line'n,
t1.'Analytics Grouping 1'n,
FROM EGTASK.Data_Put_In_Access t1
GROUP BY t1.AffiliationID, t1.ApprovedPriceType, t1.BRAND_GENERIC, t1.CarrierID, t1.ClaimOriginationFlag,
t1.ClaimStatus, t1.CLIENT, t1.CostTypeCode, t1.DISPENSEASWRITTENDAWPRODUCTSELE,
t1.ExtendedSupplyNetworkIndicator, t1.FilledMonth, t1.FilledQuarter, t1.FinalPlanCode, t1.GPINumber,
t1.MailOrderIndicator, t1.MultiSourceCode, t1.PlanDrugStatus, t1.PriceScheduleDefinition, t1.RxNetworkID,
t1.RXNetworkID_CHAIN, t1.SERVICEPROVIDERID, t1.SuperPharmacyNetworkID, t1.ZIP, t1.Specialty_Flag,
t1.SUM_of_ApprovedIngredientCost, t1.AFF_NAME, t1.AFF_TYPE, t1.NETWORK, t1.'Business Line'n, t1.
'Analytics Grouping 1'n, (CALCULATED 'MAC Savings Grouping'n), t1.IVLIndicator;
QUIT;
And here is the second proc sql where the sum of sum of sum takes place:
PROC SQL;
CREATE TABLE EGTASK.All_Combined_Table AS
SELECT t1.ApprovedDispensingFeeAmount,
t1.ApprovedIngredientCost,
t1.ApprovedPriceType,
t1.AverageWholesalePriceUnitCost,
t1.AWP,
t1.BRAND_GENERIC,
t1.CarrierID,
t1.CLIENT,
t1.ExtendedSupplyNetworkIndicator,
t1.FilledMonth,
t1.FilledQuarter AS 'Filled Quarter'n,
t1.GPINumber,
t1.MailOrderIndicator,
t1.MemberID,
t1.MultiSourceCode,
t1.PriceScheduleDefinition,
t1.QUANTITYDISPENSED,
t1.RxNetworkID AS RXNetwork,
t1.RXNetworkID_CHAIN,
t1.SuperPharmacyNetworkID,
/* SUM_OF_ApprovedIngredientCost */
(SUM(t1.SUM_of_SUM_of_AIC)) FORMAT=DOLLAR16.2 AS SUM_OF_ApprovedIngredientCost,
/* SUM_OF_AWP */
(SUM(t1.SUM_of_SUM_of_AWP)) AS SUM_OF_SUM_OF_SUM_AWP,
/* SUM_OF_RX */
(SUM(t1.SUM_of_SUM_of_RX)) AS SUM_OF_SUM_OF_SUM_RX,
/* SUM_OF_QUANTITYDISPENSED */
(SUM(t1.SUM_of_SUM_of_QUANTITYDISPENSED)) FORMAT=13.3 AS SUM_OF_SUM_OF_SUM_QUANTITYDISPENSED,
/* SUM_OF_ADFA */
(SUM(t1.SUM_of_SUM_of_ApprDispFeeAmount)) FORMAT=DOLLAR16.2 AS SUM_OF_SUM_OF_SUM_ADFA,
t1.AFF_NAME AS 'AFF Name'n,
t1.AFF_TYPE AS 'AFF Type'n,
t1.NETWORK,
t1.Specialty_Flag,
t1.'Business Line'n AS Business_Line,
t1.'Analytics Grouping 1'n AS Analytics_Grouping1,
t1.'MAC Savings Grouping'n AS MACSavingsGrouping,
t1.IVLIndicator
FROM EGTASK.DATA_WITH_MAC_GROUPING t1 LEFT JOIN EGTASK.Tables_for_SAS4 t2 ON (t1.PriceScheduleDefinition = t2.
price_schedule)
WHERE t1.NETWORK NOT = 'PAPERMATCH'
GROUP BY t1.ApprovedPriceType, t1.CarrierID, t1.CLIENT, t1.FilledMonth, t1.FilledQuarter, t1.GPINumber,
t1.MailOrderIndicator, t1.MultiSourceCode, t1.PriceScheduleDefinition, t1.RxNetworkID,
t1.SuperPharmacyNetworkID, t1.AFF_NAME, t1.AFF_TYPE, t1.NETWORK, t1.'Business Line'n, t1.
'Analytics Grouping 1'n, t1.'MAC Savings Grouping'n, (CALCULATED 'MAC List'n), t1.IVLIndicator;
QUIT;
Love the naming conventions.
The first sum in the data step doesn't do anything, it only renames the variable and is pretty useless as shown.
The second sum creates an aggregate across the group by variables
the third creates another aggregate across a different set of group by variables, after joining with a different table.
for example, total by county for first SQL and then by state.
If if its set to 1 initially it could be used as a record counter to determine how many records are in each group by component.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.