Where I work, the Decision Science (DS) team is primarily a SAS shop, but the models the DS team creates are implemented by the Dev/Implementation teams, mostly in Java or Groovy. My problem is this: the DS team (which I'm on, btw) works almost exclusively in SAS, and the Tech/Implementation team works almost exclusively in Java/Groovy.

Up until now, when we've implemented models (particularly the intermediate transformations we want applied to the standard data before it feeds into the model), we've just sent the Dev team a sort of SAS-like pseudo-code showing how we want the data transformed, and they implement it in Groovy. This has generally worked, but occasionally it has really blown up in our faces, enough that we need to do something to improve the situation.

My initial thought was that the DS team should just learn enough Groovy to write the transforms ourselves, but this wasn't well received. My next thought was to develop a more formalized version of the SAS pseudo-code we've been using, one that standardizes the format for our most common transforms to reduce ambiguity on both sides. I may still go this route, since internally it seems to have the most support. (Any suggestions here would be welcome!)

While thinking about this and doing some research, I came across PMML (Predictive Model Markup Language) and PFA (Portable Format for Analytics). Both teams already use PMML in some parts of their workflow, but not for the transforms, and PFA seems like it may be an even better fit. Either would give us a standardized, unambiguous way of specifying how we want the data transformed. The downsides are that it introduces a third language/format into the mix and, in the case of PFA, that it hasn't been picked up very widely from what I can tell.

Does anyone have any experience/learnings/suggestions they'd like to share?