Calcite | Level 5

Is it possible to use a Random Forest on repeated measures? I have Medicare Part D claims data for the years 2013-2017, and I am interested in finding predictors for a specific type of prescription. The dataset contains every physician for those 5 years, with the variables measured each year. I want to cluster at the physician level (using NPI codes) and perhaps also by state.


If this isn't possible, I'll likely just choose my variables based on the literature. The goal is to use RF to choose my independent variables and then use a zero-inflated negative binomial (ZINB) regression or a Poisson regression for the analysis/interpretation (likely ZINB, since the dependent variable has quite a few zeroes).


Dependent variable: claim counts for a specific drug (count data).

Jade | Level 19

One possibility is to run RF on each year separately (assuming the responses are rolled up into one yearly count). That should give you 5 sets of candidate predictors. From there, it depends on subject matter knowledge and the question you need to answer. You could pick the intersecting set of predictors (those selected in every year) or the union set (those selected at least once). However, it really depends on what the objective is: best association (which is what RF will give you) or possible relationships with individual predictors. Not to be ignored are interaction effects, which are likely not accommodated in the RF dataset.
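To make the intersection/union idea concrete, here is a minimal Python sketch. It assumes you have already fit one RF per year and exported the variable importances. The predictor names, the importance values, and the choice of keeping the top 2 per year are all made-up placeholders for illustration, not results from the Part D data.

```python
def top_k(importances, k):
    """Return the names of the k predictors with the largest importance."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:k])


def combine_yearly_selections(importances_by_year, k):
    """Select the top-k predictors per year, then intersect and union them."""
    per_year = {year: top_k(imp, k) for year, imp in importances_by_year.items()}
    sets = list(per_year.values())
    intersecting = set.intersection(*sets)  # selected in every year
    union = set.union(*sets)                # selected in at least one year
    return per_year, intersecting, union


# Placeholder importances (hypothetical names and values, one dict per year).
importances_by_year = {
    2013: {"specialty": 0.40, "total_claims": 0.30, "state": 0.20, "bene_count": 0.10},
    2014: {"specialty": 0.35, "state": 0.30, "bene_count": 0.20, "total_claims": 0.15},
    2015: {"specialty": 0.45, "total_claims": 0.25, "state": 0.20, "bene_count": 0.10},
    2016: {"specialty": 0.30, "bene_count": 0.28, "total_claims": 0.22, "state": 0.20},
    2017: {"specialty": 0.38, "total_claims": 0.32, "state": 0.18, "bene_count": 0.12},
}

per_year, intersecting, union = combine_yearly_selections(importances_by_year, k=2)

for year in sorted(per_year):
    print(year, sorted(per_year[year]))
print("Selected every year:", sorted(intersecting))
print("Selected at least once:", sorted(union))
```

The intersection is the conservative choice and can shrink quickly (possibly to nothing) as years are added, while the union is more inclusive and may carry predictors that mattered in only one year. Which one you feed into the count model is the subject matter judgment call described above.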


I know that isn't much of an answer, but this is more art than tech, and it is going to depend a lot on what you do know and what you want to know.



