I have data showing multiple payments from different accounts from 50 states. I want to group all the payments by account number in a given state, find out the percentiles (p25, p75, IQR) and calculate the outliers (p75 + 1.5 X Interquartile Range [IQR]), calculate the total payments for each state, and calculate the percentage of payments above the outlier limit (sum of payments above limit / sum of total payments). I have used a combination of DATA steps, PROC UNIVARIATE, PROC SQL, PROC COMPUTAB to get my desired results without success.
I have this:
State Payment Account#
NC $400 A123
NC $320 A123
OH $50 B234
OH $30 A145
OH $100 B234
OH $225 A145
.... ....... .........
And I want this:
State Total Payments p25 p75 Outlier_Limit % of Payments above Outlier_Limit
NC $620 $20 $300 $720.00 2%
OH $405 $15 $180 $427.50 4%
(Note: I didn't actually calculate percentiles, outliers, or payment % for the above example)
Can someone please help and lead me on the correct path?