I can't post sample data because these datasets are extremely large. This is the problem I'm running into. When I run this query:
PROC SQL;
CREATE TABLE EGTASK.DATA_WITH_MAC_GROUPING2 AS
SELECT t1.AffiliationID,
t1.ApprovedPriceType,
t1.BRAND_GENERIC,
t1.CarrierID,
t1.ClaimOriginationFlag,
t1.ClaimStatus,
t1.CLIENT,
t1.CostTypeCode,
t1.DISPENSEASWRITTENDAWPRODUCTSELE,
t1.ExtendedSupplyNetworkIndicator,
t1.FilledMonth,
t1.FilledQuarter,
t1.FinalPlanCode,
t1.GPINumber,
t1.MailOrderIndicator,
t1.MemberID,
t1.MultiSourceCode,
t1.PlanDrugStatus,
t1.PriceScheduleDefinition,
t1.RxNetworkID,
t1.RXNetworkID_CHAIN,
t1.SERVICEPROVIDERID,
t1.Specialty_Flag,
t1.SuperPharmacyNetworkID,
t1.ZIP,
t1.SUM_of_ApprovedIngredientCost,
t1.SUM_of_SUM_of_AIC,
t1.SUM_of_SUM_of_AWP,
t1.SUM_of_SUM_of_RX,
t1.SUM_of_SUM_of_QUANTITYDISPENSED,
t1.AFF_NAME,
t1.AFF_TYPE,
t1.NETWORK,
t1.'Business Line'n,
t1.'Analytics Grouping 1'n,
t1.'MAC Savings Grouping'n,
t1.IVLIndicator,
t1.Fund_Type,
/* GER Pharm */
(CASE
WHEN t1.RXNetworkID IN('00008','00177','00608','00782','00039','00123','00207') THEN 'CVS'
WHEN t1.RXNetworkID IN('00A13','00226') THEN 'Walgreens'
WHEN t1.RXNetworkID IN('00599','00886','00519') THEN 'Omnicare'
WHEN t1.RXNetworkID IN('00631','00752','00754','00686') THEN 'Pharmerica'
WHEN t1.RXNetworkID = '00905' AND t1.SuperPharmacyNetworkID IN('MDLTC','MDLTCS') THEN 'MHA'
ELSE 'Other'
END) AS 'GER Pharm'n,
t2.ApprovedPriceType AS Instate,
t3.ApprovedPriceType AS Code_MAC
FROM EGTASK.DATA_WITH_MAC_GROUPING t1
LEFT JOIN EGTASK.'TABLES FOR SAS'n t2 ON (t1.SuperPharmacyNetworkID = t2.SuperPharmacyNetworkID)
LEFT JOIN EGTASK.'TABLES FOR SAS_0000'n t3 ON (t1.FinalPlanCode = t3.FinalPlanCode);
QUIT;
I'm getting about 2 million observations. If I add DISTINCT to the SELECT clause I still get 2 million observations. However when I run this one (which directly follows the previous one):
PROC SQL;
CREATE TABLE EGTASK.All_Combined_Table AS
SELECT DISTINCT t1.ApprovedPriceType,
t1.CarrierID,
t1.CLIENT,
t1.FilledMonth,
t1.FilledQuarter AS 'Filled Quarter'n,
t1.GPINumber,
t1.MailOrderIndicator,
count(distinct(t1.MemberID)) as Lives,
t1.MultiSourceCode,
t1.PriceScheduleDefinition,
t1.RxNetworkID AS RXNetwork,
t1.SuperPharmacyNetworkID,
/* SUM_OF_ApprovedIngredientCost */
(SUM(t1.SUM_of_SUM_of_AIC)) FORMAT=DOLLAR16.2 AS SUM_OF_ApprovedIngredientCost,
/* SUM_OF_AWP */
(SUM(t1.SUM_of_SUM_of_AWP)) AS SUM_OF_AWP,
/* SUM_OF_RX */
(SUM(t1.SUM_of_SUM_of_RX)) AS SUM_OF_RX,
/* SUM_OF_QUANTITYDISPENSED */
(SUM(t1.SUM_of_SUM_of_QUANTITYDISPENSED)) FORMAT=13.3 AS SUM_OF_QUANTITYDISPENSED,
t1.AFF_NAME AS 'AFF Name'n,
t1.AFF_TYPE AS 'AFF Type'n,
t1.NETWORK,
t1.'Business Line'n AS Business_Line,
t1.'Analytics Grouping 1'n AS Analytics_Grouping1,
t1.'MAC Savings Grouping'n AS MACSavingsGrouping,
t1.IVLIndicator,
t1.Fund_Type,
t1.'GER Pharm'n,
t1.Instate,
t1.Code_MAC,
t1.MAC_List
FROM EGTASK.DATA_WITH_MAC_GROUPING2 t1
WHERE t1.NETWORK NOT = 'PAPERMATCH'
GROUP BY t1.ApprovedPriceType, t1.CarrierID, t1.CLIENT, t1.FilledMonth, t1.FilledQuarter, t1.GPINumber,
t1.MailOrderIndicator, t1.MultiSourceCode, t1.PriceScheduleDefinition, t1.RxNetworkID,
t1.SuperPharmacyNetworkID, t1.AFF_NAME, t1.AFF_TYPE, t1.NETWORK, t1.'Business Line'n, t1.
'Analytics Grouping 1'n, t1.'MAC Savings Grouping'n, t1.MAC_List, t1.IVLIndicator, t1.Fund_Type;
QUIT;
I also get 2 million records. However if I add the DISCINT to the SELECT clause on this one (The ALL_COMBINED_TABLE), I get about 800,000 observations. I don't understand why this happens.
On second SQL program you have a GROUP statement.
With DISTINCT you got one obervation per group only.
Does it explain and give an answer to your question ?
On second SQL program you have a GROUP statement.
With DISTINCT you got one obervation per group only.
Does it explain and give an answer to your question ?
Then with the DISTINCT on the second one, would I be getting rid of records I'd actually want to keep do you think?
Sorry, I don't understand what you mean by "getting rid of records I'd actually want to keep" ?
Can you give a schematic example of one group input (detailed) and output expected ?
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.