This is the fourth and last article of a series about advanced aggregations and calculations. In the previous one we covered aggregated data sources and data source joins. The goal of this one is to utilize some of the concepts and capabilities explored before to calculate PMPM.
PMPM stands for "per member per month" and it is probably one of the most basic metrics used by actuaries in health insurance companies to assess the average cost of health care, as they consider how to price and set premiums. Even though its definition is simple, calculating PMPM when other dimensions are in the mix becomes a little bit challenging. Many of the solutions deal with the problem by pre-aggregating the data up to a certain level before the data is available for reporting. This certainly works, but assumptions made in the data prep phase turns out to be a limitation when reports are built. Here you will see a solution that allows PMPM to be provided on any dimensions available in the source data. The stepwise process used here goes beyond the PMPM example and can be applied to other industries.
At its most basic definition, PMPM can be calculated as:
(Claim Charges for Month M) / (Number of Active Members for Month M)
Clearly, we have two main tables to consider: the claims table and the membership table. The common key between them is the member ID.
Like said before, the main issue with the PMPM concept is when new dimensions are added, especially those dimensions that come from the claims table, such as provider, diagnosis, procedure, setting, etc. Simply joining those two tables would result in replicated member information if the person saw multiple providers in the same month, or had more than one procedure, etc., and those are not uncommon. Also, not every member has a claim in a certain month, and this could cause missing values to show up. For those that are not familiar with PMPM, the number of active members for a certain month is basically a distinct count of active members, but when two or more months are grouped together, this number is the sum of distinct counts for each month. In other words, it’s perfectly fine to aggregate by month. Having said that, the PMPM calculation main issue resides in the fact that the number of active members can be sliced, grouped, filtered, and aggregated by any permutation of dimensions that exist in the membership table, but it should not be affected by dimensions that are exclusive to the claims table. The claim charges, in another hand, can be sliced, grouped, filtered, and aggregated in any way you want, using any combination of dimensions, so there is nothing to worry about this portion of the expression.
To keep us focused on the main problem, we are going to simplify our input tables. First, instead of dealing with individual members and the extra work needed to identify whether they are active in a certain month, we will be using a membership table where this has already been done for us and we have the total number of active members already aggregated by month and a few other categories, such as gender, and age group. Also, no missing values exist in this table. Again, this is only to keep us focused on the core of the problem, but if those conditions above are not met, the solution still works. This is a snapshot of a few records of the membership table:
Similarly, we will be using a claims table that has been aggregated by month, gender, and age group (same dimensions found in the membership table), as well as dimensions that are exclusive to claims such as setting and provider. Again, no missing values exist to avoid noise as the solution is presented:
We all agree that in the attempt to solve this problem, we need to take the membership table and left join it with the claims table because most of the members don’t have claims every month, and even if using an aggregated membership table like ours, which raises the chance that there will be at least one claim for each group (given by the combination of all the dimensions we have), there still might be the case where a group didn’t have any claims. But just left joining the two tables by YearMonth, Gender, and AgeBreak is not enough, as you are going to see.
The next screenshot shows the window for defining a data source join in SAS Visual Analytics 8.3 (more on data source joins can be found in the previous article😞
If you visualize the joined table in a crosstab with a subset of the existing dimensions, where dimensions that are shared across the two input tables were added in the rows and the dimensions that are exclusive to the claims table were added in the columns, this is what you will get:
In the previous crosstab, you can clearly identify the following:
Digging deeper on issue #2 pointed out above, if you query the membership table for Gender=Female and AgeBreak in (110-119, 120+), you get the records below and if you aggregate by Gender and AgeBreak, which is what you have in the crosstab, you get the expected values for column Members for those crossings (small table on the right). Clearly, the Members column in the previous crosstab is all wrong.
So why didn’t you get the expected values? Again, focusing on Gender=Female and AgeBreak in (110-119, 120+) to help understand the issue, the detailed information extracted from the joined table used as the source for the crosstab looks like this:
Besides the missing values that were introduced due to the claims-specific dimensions, you get a few records that were replicated for months 201805 and 201806, because of different variations of settings and providers. As expected, the aggregation by Gender and AgeBreak (small table on the right) gives you exactly what you see in the in the Total column in the crosstab. Again, this is wrong.
The first thing you need to do, to avoid having missing values for dimensions that are exclusive to claims, is to add those dimensions into the membership table. This will be replicating records in the membership table, but don’t worry, a calculated item to be called MbrMonth (member month) will take care of it. At a high level, these are the steps:
Step #2 above is a cartesian join. To be able to perform a cartesian join, you will need to define a column in both input tables with all values the same to use it as the join key and then create aggregated data sources before joining the tables. This is explained in the Example 4 in the previous article.
For now, you can create a calculated column called For Cartesian Join and assign the constant value “anything” to it. You must define this column in both the claims and membership tables.
Then, from the claims table, you can create an aggregated data source called Claims-only Categories defined like this:
This is how the output table looks like:
Similarly, from the membership table, you can create an aggregated data source called Membership for cartersian join like this:
Finally, those two aggregated data source tables can be joined to create the table Membership with Claims-only Categories:
With the membership table modified into Membership with Claims-only Categories, you can left join this table with the claims table using all the categories as join keys to create the Joined Table - final:
Every single record in this final joined table belongs to a named category (no missing values). If you put the Members column in a new crosstab and compare it with the crosstab you had initially, you will be able to validate that:
Everything performed up to this point could be done in a data prep phase and the process flow from SAS Enterprise Guide found below does exactly that. SAS Visual Analytics was used just to show that you can still solve this problem even if the source tables haven’t been prepared yet.
But the solution doesn’t end here. Like said previously when the claims-only categories were joined with the membership table, records from the membership table were replicated. The table below is a partial view of the output of the join performed earlier, the table called Joined Table - final. All records highlighted in this table came from one single record in the original membership table (with YearMonth=201801, Gender=Female, and AgeBreak=000-009). They were replicated when the dimensions from the claims table (Setting and Provider) were added by the join:
If you have a visualization where all the categories exclusive from the claims table (Setting and Provider) have been added, all values for the Members column at the crossing of those dimensions will be correct, but like in the previous crosstab, where the category Provider is not present in the visualization, the values are inflated (aggregated), and we know that the number of members cannot be aggregated by categories from the claims table. This gets clearer if you look at the Total column, where the values are aggregating the number of members for each one of the settings, resulting in a number that is four times bigger than what it should.
The solution to prevent the inflated results is to create an aggregated measure called MbrMonth, defined like below:
The MbrMonth calculation uses the AggregateTable operator and it takes care of the replicated values. As you know, rows of the joined table that have the same Gender, AgeBreak, and YearMonth also have the same value in the Members column because they have been replicated. So, if you aggregate by those dimensions where the aggregation is average, or minimum, or maximum, the aggregated value will not be inflated. That’s exactly what happens: the arguments for the group by crossing are fixed and composed by all dimensions coming from the membership table (Gender, AgeBreak, and YearMonth), and the aggregation applied is average. The outer aggregation is sum, meaning that if any of the provided fixed dimensions are missing from the visualization, the averaged values should be summed. You can review the AggregateTable operator in a previous article of this series.
The table below shows how the previous value of 98,597 for Members at YearMonth=201801, Gender=Female, AgeBreak=000-009 is preserved in the MbrMonth column, while it’s inflated in the column Members, when Providers is removed from the visualization (the value 591,582 is 98,597 x 6 provider values that have been removed from the group by):
With MbrMonth calculation you can finally define PMPM as:
Finally, you can create a report to explore PMPM with hierarchies, filters, and any existing combination of dimensions. Notice that no missing values were introduced, and the number of active members is being correctly calculated:
That ends the series of articles about complex calculations in SAS Visual Analytics 8.3. The PMPM calculation utilized three out of four capabilities covered by the series: aggregated data sources, data source join, and AggregateTable operator.
Additional resources on advanced calculations and aggregations:
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.