05-22-2017 06:09 PM - edited 05-22-2017 06:57 PM
Where I work, our Decision Science team is primarily a SAS shop, but the models the DS team creates are implemented by the Dev/Implementation teams, primarily in Java or Groovy. My problem is this: the DS team (which I'm on, btw) works almost exclusively in SAS, and the Tech/Implementation team works almost exclusively in Java/Groovy. Up until now, when we've been implementing models (particularly the intermediate transformations we want done on the standard data to feed into the model), we've just sent the Dev team a sort of SAS-like pseudo-code showing how we want the data transformed, and they implement it in Groovy. This has generally worked, but occasionally it has really blown up in our faces, enough that we need to do something to improve the situation.
My initial thought was that the DS team should just learn enough Groovy to write the transforms ourselves, but this wasn't well received.
My next thought was to develop a more formalized version of the SAS pseudo-code we've been using, one that standardizes the format for the most common transforms we use, to reduce ambiguity on both sides. I may still go this route since internally it seems to have the most support. (Any suggestions here would be welcome!)
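To make the idea of a formalized spec a bit more concrete, here is a minimal Java sketch of what a declarative transform list could look like on the implementation side. Everything in it is hypothetical and only for illustration: the class name `TransformSpec`, the three transform kinds (`IMPUTE`, `CAP`, `LOG1P`), the parameter names `a` and `b`, and the convention that a `null` value stands in for a SAS missing value. It is not what either team uses today.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a tiny declarative transform spec and its evaluator.
class TransformSpec {

    // Assumed set of "most common transforms"; a real list would come from the DS team.
    enum Kind { IMPUTE, CAP, LOG1P }

    // One step: read `input`, apply `kind` with parameters (a, b), write `output`.
    static final class Step {
        final String output;
        final String input;
        final Kind kind;
        final double a; // IMPUTE: default value; CAP: lower bound; LOG1P: unused
        final double b; // CAP: upper bound; otherwise unused

        Step(String output, String input, Kind kind, double a, double b) {
            this.output = output;
            this.input = input;
            this.kind = kind;
            this.a = a;
            this.b = b;
        }
    }

    // Applies the steps in order to a copy of the row.
    // A null value represents a missing value (assumption, mirroring SAS '.').
    static Map<String, Double> apply(List<Step> steps, Map<String, Double> row) {
        Map<String, Double> out = new HashMap<>(row);
        for (Step s : steps) {
            Double x = out.get(s.input);
            Double y = null; // missing in, missing out, unless the step is IMPUTE
            switch (s.kind) {
                case IMPUTE:
                    y = (x == null) ? Double.valueOf(s.a) : x;
                    break;
                case CAP:
                    if (x != null) {
                        y = Math.min(Math.max(x, s.a), s.b);
                    }
                    break;
                case LOG1P:
                    if (x != null) {
                        y = Math.log1p(x);
                    }
                    break;
                default:
                    throw new IllegalArgumentException("Unknown transform: " + s.kind);
            }
            out.put(s.output, y);
        }
        return out;
    }
}
```

The point of a format like this is that the DS team would only fill in rows of (output, input, kind, parameters), and questions such as "what happens to a missing value in a CAP step?" get answered once, in the evaluator, instead of being re-interpreted for every model.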
While I was thinking about this and doing some research, I came across PMML (Predictive Model Markup Language) and PFA (Portable Format for Analytics). Both teams already use PMML in some parts of their workflow, but not for the transforms, and PFA seems like it may be an even better fit. Either would allow for a standardized, unambiguous way of specifying how we want the data transformed. The downsides are that it introduces a third language/format into the mix, and that PFA hasn't been picked up very widely from what I can tell.
Does anyone have any experience/learnings/suggestions they'd like to share?
a week ago
It sounds like a difficult scenario, to be sure. As an analyst, I have less of an appreciation for why there might be a requirement for using Java or Groovy. The challenge with Java or PMML is that these languages are far more limited than Base SAS. In addition, using transformations that create better models leads to even more work as you try to translate them into another language. It seems like a lot of manual effort is being put in to avoid using SAS for scoring. Would it be possible to score in SAS and just upload the scores to the database, rather than having to re-architect the process in another, less capable language?
a week ago
Thanks for the input, Doug! Unfortunately, scoring in SAS like you suggested isn't feasible in our case (at the very least, not politically feasible). Our main system is written in Java (with some Groovy, Scala and other stuff mixed in), and because it operates under very strict time constraints on real-time data, the Dev team has been very reluctant to expand the current setup to allow sticking something like SAS in the loop. I've kind of gotten side-tracked from this project for a while, but I'm still thinking the most feasible solution is to just have the DS team (or at least some subset of it) bite the bullet and learn enough Groovy to get the job done. Just reviewing a fair amount of the existing code was enough to make me fairly fluent at reading it, so it shouldn't be too much more difficult to start writing it.