SAS Visual Analytics Advanced Calculations (part 4 of 4): PMPM Example

3 Likes

This is the fourth and last article of a series about advanced aggregations and calculations. In the previous one we covered aggregated data sources and data source joins. The goal of this one is to utilize some of the concepts and capabilities explored before to calculate PMPM.

PMPM stands for "per member per month" and it is probably one of the most basic metrics used by actuaries in health insurance companies to assess the average cost of health care, as they consider how to price and set premiums. Even though its definition is simple, calculating PMPM when other dimensions are in the mix becomes a little bit challenging. Many of the solutions deal with the problem by pre-aggregating the data up to a certain level before the data is available for reporting. This certainly works, but assumptions made in the data prep phase turns out to be a limitation when reports are built. Here you will see a solution that allows PMPM to be provided on any dimensions available in the source data. The stepwise process used here goes beyond the PMPM example and can be applied to other industries.

PMPM calculation issues

At its most basic definition, PMPM can be calculated as:

(Claim Charges for Month M) / (Number of Active Members for Month M)

Clearly, we have two main tables to consider: the claims table and the membership table. The common key between them is the member ID.

Like said before, the main issue with the PMPM concept is when new dimensions are added, especially those dimensions that come from the claims table, such as provider, diagnosis, procedure, setting, etc. Simply joining those two tables would result in replicated member information if the person saw multiple providers in the same month, or had more than one procedure, etc., and those are not uncommon. Also, not every member has a claim in a certain month, and this could cause missing values to show up. For those that are not familiar with PMPM, the number of active members for a certain month is basically a distinct count of active members, but when two or more months are grouped together, this number is the sum of distinct counts for each month. In other words, it’s perfectly fine to aggregate by month. Having said that, the PMPM calculation main issue resides in the fact that the number of active members can be sliced, grouped, filtered, and aggregated by any permutation of dimensions that exist in the membership table, but it should not be affected by dimensions that are exclusive to the claims table. The claim charges, in another hand, can be sliced, grouped, filtered, and aggregated in any way you want, using any combination of dimensions, so there is nothing to worry about this portion of the expression.

To keep us focused on the main problem, we are going to simplify our input tables. First, instead of dealing with individual members and the extra work needed to identify whether they are active in a certain month, we will be using a membership table where this has already been done for us and we have the total number of active members already aggregated by month and a few other categories, such as gender, and age group. Also, no missing values exist in this table. Again, this is only to keep us focused on the core of the problem, but if those conditions above are not met, the solution still works. This is a snapshot of a few records of the membership table:

01- Membership table 01- Membership table

Similarly, we will be using a claims table that has been aggregated by month, gender, and age group (same dimensions found in the membership table), as well as dimensions that are exclusive to claims such as setting and provider. Again, no missing values exist to avoid noise as the solution is presented:

02- Claims table 02- Claims table

We all agree that in the attempt to solve this problem, we need to take the membership table and left join it with the claims table because most of the members don’t have claims every month, and even if using an aggregated membership table like ours, which raises the chance that there will be at least one claim for each group (given by the combination of all the dimensions we have), there still might be the case where a group didn’t have any claims. But just left joining the two tables by YearMonth, Gender, and AgeBreak is not enough, as you are going to see.

The next screenshot shows the window for defining a data source join in SAS Visual Analytics 8.3 (more on data source joins can be found in the previous article😞

03- First attempt - joining the two tables 03- First attempt - joining the two tables

If you visualize the joined table in a crosstab with a subset of the existing dimensions, where dimensions that are shared across the two input tables were added in the rows and the dimensions that are exclusive to the claims table were added in the columns, this is what you will get:

04- First attempt - crosstab with wrong results 04- First attempt - crosstab with wrong results

In the previous crosstab, you can clearly identify the following:

There is a column which Setting value is equal to missing. There was no missing value in any of the input tables, like explained before, but because not all the members had claims, and Setting is a dimension exclusive from claims, this is expected to happen, although it’s not desired.
The number of active members (column Members) should not be affected by the dimensions coming from the claims table. That means that if you follow a row, which contains dimensions and crossings of dimensions coming from the membership table, you should see the same value for Members. For example, if you compare the values for Members in the row identified by Gender=Female and AgeBreak=000-009 across all Members columns, all values should be the same, and they are in this case, as well as in many other rows, but if you look at the last two, AgeBreak rows for Females (110-119 and 120+), you see a problem there. The expected values for this crossing will be explained in a moment.
The row totals for members, displayed in the very last column, contains the sum of all Members values for each Setting value (in the columns), and this is wrong. The number of active members should not vary with any dimension that is exclusive to the claims table, like Settings and Provider. Under the perspective of the dimensions exclusive to the claims table, the number of members is not additive.
Without knowing the data very well you won’t be able to tell, but even the Members values that seem to be correct according to #2 above, are in fact wrong. This is because the dimension Provider that is exclusive to the claims table was not added to the crosstab, therefore the values we see in the crosstab were all aggregated (summed) by Setting, like what happened with the column that holds the row totals (#3 above).

Digging deeper on issue #2 pointed out above, if you query the membership table for Gender=Female and AgeBreak in (110-119, 120+), you get the records below and if you aggregate by Gender and AgeBreak, which is what you have in the crosstab, you get the expected values for column Members for those crossings (small table on the right). Clearly, the Members column in the previous crosstab is all wrong.

05- Membership table filtered and aggregated 05- Membership table filtered and aggregated

So why didn’t you get the expected values? Again, focusing on Gender=Female and AgeBreak in (110-119, 120+) to help understand the issue, the detailed information extracted from the joined table used as the source for the crosstab looks like this:

06- Joined table filtered and aggregated 06- Joined table filtered and aggregated

Besides the missing values that were introduced due to the claims-specific dimensions, you get a few records that were replicated for months 201805 and 201806, because of different variations of settings and providers. As expected, the aggregation by Gender and AgeBreak (small table on the right) gives you exactly what you see in the in the Total column in the crosstab. Again, this is wrong.

The solution

Getting the data ready

The first thing you need to do, to avoid having missing values for dimensions that are exclusive to claims, is to add those dimensions into the membership table. This will be replicating records in the membership table, but don’t worry, a calculated item to be called MbrMonth (member month) will take care of it. At a high level, these are the steps:

Create a table containing all the combined values of all claims-only dimensions (Setting and Provider in this example)
Merge this table with the membership table

Step #2 above is a cartesian join. To be able to perform a cartesian join, you will need to define a column in both input tables with all values the same to use it as the join key and then create aggregated data sources before joining the tables. This is explained in the Example 4 in the previous article.

For now, you can create a calculated column called For Cartesian Join and assign the constant value “anything” to it. You must define this column in both the claims and membership tables.

Then, from the claims table, you can create an aggregated data source called Claims-only Categories defined like this:

07- Claims-only aggregated data source 07- Claims-only aggregated data source

This is how the output table looks like:

08- Claims-only aggregated data source 08- Claims-only aggregated data source

Similarly, from the membership table, you can create an aggregated data source called Membership for cartersian join like this:

09- Memberships for cartesian join aggregated data source 09- Memberships for cartesian join aggregated data source

Finally, those two aggregated data source tables can be joined to create the table Membership with Claims-only Categories:

10- Membership with Claims-only categories data source join 10- Membership with Claims-only categories data source join

11- Membership with Claims-only categories data source join 11- Membership with Claims-only categories data source join

With the membership table modified into Membership with Claims-only Categories, you can left join this table with the claims table using all the categories as join keys to create the Joined Table - final:

12- Final joined table 12- Final joined table

13- Final joined table 13- Final joined table

Every single record in this final joined table belongs to a named category (no missing values). If you put the Members column in a new crosstab and compare it with the crosstab you had initially, you will be able to validate that:

14- Crosstab without missing but wrong member count 14- Crosstab without missing but wrong member count

Everything performed up to this point could be done in a data prep phase and the process flow from SAS Enterprise Guide found below does exactly that. SAS Visual Analytics was used just to show that you can still solve this problem even if the source tables haven’t been prepared yet.

15- Equivalent EG process flow 15- Equivalent EG process flow

But the solution doesn’t end here. Like said previously when the claims-only categories were joined with the membership table, records from the membership table were replicated. The table below is a partial view of the output of the join performed earlier, the table called Joined Table - final. All records highlighted in this table came from one single record in the original membership table (with YearMonth=201801, Gender=Female, and AgeBreak=000-009). They were replicated when the dimensions from the claims table (Setting and Provider) were added by the join:

16- Replicated records 16- Replicated records

If you have a visualization where all the categories exclusive from the claims table (Setting and Provider) have been added, all values for the Members column at the crossing of those dimensions will be correct, but like in the previous crosstab, where the category Provider is not present in the visualization, the values are inflated (aggregated), and we know that the number of members cannot be aggregated by categories from the claims table. This gets clearer if you look at the Total column, where the values are aggregating the number of members for each one of the settings, resulting in a number that is four times bigger than what it should.

Getting the calculation right

The solution to prevent the inflated results is to create an aggregated measure called MbrMonth, defined like below:

17- Member month calculation 17- Member month calculation

The MbrMonth calculation uses the AggregateTable operator and it takes care of the replicated values. As you know, rows of the joined table that have the same Gender, AgeBreak, and YearMonth also have the same value in the Members column because they have been replicated. So, if you aggregate by those dimensions where the aggregation is average, or minimum, or maximum, the aggregated value will not be inflated. That’s exactly what happens: the arguments for the group by crossing are fixed and composed by all dimensions coming from the membership table (Gender, AgeBreak, and YearMonth), and the aggregation applied is average. The outer aggregation is sum, meaning that if any of the provided fixed dimensions are missing from the visualization, the averaged values should be summed. You can review the AggregateTable operator in a previous article of this series.

The table below shows how the previous value of 98,597 for Members at YearMonth=201801, Gender=Female, AgeBreak=000-009 is preserved in the MbrMonth column, while it’s inflated in the column Members, when Providers is removed from the visualization (the value 591,582 is 98,597 x 6 provider values that have been removed from the group by):

18- Member month fixes Members inflation 18- Member month fixes Members inflation

With MbrMonth calculation you can finally define PMPM as:

19- PMPM 19- PMPM

Finally, you can create a report to explore PMPM with hierarchies, filters, and any existing combination of dimensions. Notice that no missing values were introduced, and the number of active members is being correctly calculated:

20- Final results 20- Final results

21- Final results 21- Final results

That ends the series of articles about complex calculations in SAS Visual Analytics 8.3. The PMPM calculation utilized three out of four capabilities covered by the series: aggregated data sources, data source join, and AggregateTable operator.

Additional resources on advanced calculations and aggregations:

Documentation: Aggregated tabular operators
Documentation: Working with Aggregated Data Sources
Documentation: Working with Data Source Joins in Reports
Community article: SAS Visual Analytics Advanced Calculations (part 1 of 4): AggregateCells by SAS' Renato Luppi
Community article: SAS Visual Analytics Advanced Calculations (part 2 of 4): AggregateTable by SAS' Renato Luppi
Community article: SAS Visual Analytics Advanced Calculations (part 3 of 4): Data Source Operations by SAS' Renato Luppi

moonlms_ · ‎07-02-2019

Very good article Renato, at least there is some way to calculate PMPM. However this is still in a sense of 'hard code' so really limit the flexibility, e.g. adding another dimension in the future need to reconstruct the source table again.

To be honest I think the good practice here is to treat claim and membership as two different table and then blend two SUM calculation on YEARMONTH, AGE, GENDER. It is a pity that data blending is still not available even on SAS Viya.

Renato_sas · ‎07-02-2019

Hi @moonlms_.

The clams and membership tables are kept separately. They are combined dynamically within VA through a series of transformations that use aggregated data source and data source join. It's like a multi-step calculation. We don't have any parameterized calculations in this example, but if we had, we would be able to verify that changes on the parameter value would be cascaded through the steps, showing that this is truly dynamic.

I agree with you that the member month calculation requires selecting upfront all categories from the membership table, but how often do we foresee new categories/dimensions being added to this table? it's not a rhetoric question 🙂

SAS Communities Library