PROC ASTORE in SAS Viya


Top o’ the marnin’ to ya. If you are using Viya or HP procedures, you have undoubtedly come across the ASTORE procedure. PROC ASTORE describes, manages, and scores new data with an analytic store.

Wait…What is an Analytic Store?

An analytic store is a binary file that contains the state from a predictive analytic procedure. This state (for example, from a random forest) is created from the results of the training phase of model development. A key feature of an analytic store is that it can be easily transported from one host to another, in part because it is a compact and universal file format. The store names the one and only SAS component that can restore the post-training state of the computation in memory and go on from there. This is also called a warm restart for scoring.

So for factmac, we have a shared library called “sofactmac” that, along with the astore infrastructure, can restore its factorization information from the store and go into scoring mode (one record at a time). For more details see Radhikha Myneni’s SAS High-Performance Analytics Tip: Scoring with Analytic Store.

Wait...What is a State?

State is a copy of all the memory items that are relevant to the next task, for example, scoring. So SAVESTATE essentially takes a snapshot of:

The "public" information (i.e., information common to all analytic engines). Examples of public information include the list of input variables, the list of output variables, the formats, and so on.
The "private" information (i.e., information specific to that particular analysis, e.g., random forest). Examples of private information include the number of trees, the trees themselves, the scores, etc.

Once you have created an analytic store, the beauty is that you can move it around to where you need it and use it at a later time for scoring new data. In fact, it can even be used to score new data in a distributed environment or in-database with SAS Scoring Accelerator for Hadoop, Teradata, or SAP HANA.

Wait…What is Scoring?

Remember how supervised learning works? Here’s a brief recap. Let’s say you’re a bartender at an Irish Pub in Annapolis, and you’re trying to determine whether you will run out of Guinness beer, Bailey’s Irish Cream, and Jameson whiskey on St. Patrick’s Day. You are well aware that if you do run out, you will have a riot on your hands with lots of green tchotchkes being thrown at you, so you will want to have an Uber getaway car waiting.

You have inputs like weather, the day of the week that St. Patrick’s Day falls on, population totals, relative female:male demographics of Annapolis, how much liquor you have in stock, how many competitor Irish pubs are holding special events, etc. AND you have information on the outcome. That is, over the course of your business, you will have information on how much Irish liquor was sold each day. This is historic data where the outcome (in this case, amount of alcohol consumed) is known.

You train your model with this historic data, and then you score the new data, i.e., put in the input variables for the upcoming St. Patrick’s Day (sunny, falls on a Friday, etc.), to determine if you will run out of Bailey’s. So you see, scoring new data is simply predicting the outcome with new data, using the model that we built using the historic data.

Well, what do you know, look what I found growing in my garden on St. Patrick’s Day in the snow among the daffodils:

A St. Patrick's Day miracle!

Okay, back to analytics.

To accomplish scoring, we need to accumulate score code from the model training process. The analytic training process may include setting model hyperparameters, building the models, etc. Some models are simple to score, for example regression. On the other hand, some models are very complex to score, such as random forests, support vector machines, and factorization machines. This is when the analytic store comes to your rescue.

How I Imagine an Analytic Store

I like to think of an analytic store as a complicated recipe. I picture my input variables as the ingredients I have in my pantry: eggs, flour, cocoa, salt, butter, etc. I may spend years creating the perfect brownie recipe. My recipe tells me the exact amount of each item to include (1 cup of flour, ½ cup of butter, 0 cups of milk); these amounts remind me of my parameters, or βs in the simple linear regression equation below:
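A standard way to write such an equation, assuming one β per ingredient (input variable) plus an intercept and an error term, is sketched here. With a single input it reduces to the simple linear regression case:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```

In the recipe analogy, each x is an ingredient and each β says how much of it goes in. A β of 0, like the 0 cups of milk, means that input contributes nothing to the result.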

Some recipes are more difficult and require exact times and temperatures, and a series of sequential steps. If I am making fudge instead of brownies, I may need to use a double-boiler to get a low enough heat. I have to stir the ingredients over heat to dissolve the sugar but once the mixture reaches the soft ball stage, I dare not stir it or even shake the pan. If I do, that will make the sugar form large crystals, and will make my fudge grainy. All of this information needs to be in my recipe.

Note that my recipe might not include details on preparing the ingredients (sifting the flour, chilling the cream), which reminds me of preparing my data (imputing missing values, transforming the data). This is true for your analytic store as well, which does not necessarily have the information on the data preparation steps that happened outside of your analytic procedure, such as imputations or transformations.
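As a minimal sketch of that point, suppose missing MORTDUE values were filled in by hand before training. The same step then has to be repeated on new data before it is scored, because the analytic store knows nothing about it. The table names, the mycas libref, and the fill value of 65000 are all hypothetical:

```sas
/* Assumes an open CAS session and a CAS libref named mycas (hypothetical). */
/* Repeat the training-time imputation on the new data before scoring.      */
data mycas.hmeq_new_prepped;
   set mycas.hmeq_new;
   /* 65000 is a placeholder for whatever value was used during training */
   if missing(mortdue) then mortdue = 65000;
run;
```

The prepared table, not the raw one, is then what you would hand to PROC ASTORE for scoring.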

Once I have created my perfect recipe, I can give the recipe to my neighbor, Mr. Rogers. Similarly, once I have created my analytic store, I can move it around, for example from the client to the server. Mr. Rogers has his own pantry with similar ingredients, so if he follows the recipe exactly (and if he also prepared his ingredients the same way), he should get similar results.

I can give my neighbor Mr. Rogers the recipe in just any old format.

Or I can give it to him in a compact, standard format, so that it fits nicely into his recipe box.

In my standard format, I use measures like “tablespoon of butter” and “1/8 teaspoon of salt” rather than “lump of butter” or “pinch of salt” so that it is more universally understood.

An analytic store is also a compact, universal format.

Now That I Know I Want an Analytic Store, How Do I Get One?

To create an analytic store for a support vector machine in Viya, you will use a SAVESTATE statement inside the relevant procedure, as shown in the support vector machine procedure PROC SVMACHINE below.

Note that to create an analytic store in Viya, you first need to be familiar with CAS and have a CAS session open. If you are not familiar with CAS, look into a SAS course on Viya.
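A minimal sketch of what this can look like, patterned on the SAS sample HMEQ data. The session name, the libref, the choice of input variables, and the store name mycas.svm_store are assumptions for illustration, not settings taken from this post:

```sas
/* Start a CAS session and assign a libref to a caslib (names are assumptions) */
cas mysess;
libname mycas cas caslib=casuser;

/* Load the SAS sample HMEQ data into CAS */
data mycas.hmeq;
   set sampsio.hmeq;
run;

/* Train a support vector machine and save its state as an analytic store */
proc svmachine data=mycas.hmeq;
   input loan mortdue value / level=interval;
   input reason job / level=nominal;
   target bad;
   savestate rstore=mycas.svm_store;
run;
```

The SAVESTATE statement is the only line that differs from an ordinary training run. RSTORE= names the CAS table that will hold the analytic store.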

For an example that you can actually run with the SAS sample HMEQ data, see the VDMML Documentation Procedures Guide.

What Viya Procedures Let You Create an Analytic Store

SAS Viya 3.2 analytic procedures that let you create an analytic store via the SAVESTATE statement are:

PROC FACTMAC (factorization machine procedure)
PROC FOREST (random forest procedure)
PROC GRADBOOST (gradient boosting procedure)
PROC SVMACHINE (support vector machine procedure)
PROC TEXTMINE (text mining procedure)
PROC SVDD (support vector data description)
PROC STFT (short-time Fourier transform)
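The SAVESTATE pattern carries over from one procedure to the next. As a hedged sketch, here it is with PROC FOREST, assuming a CAS session, a libref mycas, and a table mycas.hmeq already exist. The variable lists, the NTREES= value, and the store name are illustrative choices:

```sas
/* Assumes the CAS session, mycas libref, and mycas.hmeq table already exist */
proc forest data=mycas.hmeq ntrees=100;
   input loan mortdue value / level=interval;
   input reason job / level=nominal;
   target bad / level=nominal;
   /* Same SAVESTATE statement, different procedure */
   savestate rstore=mycas.forest_store;
run;
```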

What PROC ASTORE (Analytic Store) Does

PROC ASTORE scores an input data set and produces an output data set using the analytic store that you specify. It is an interactive procedure in which each statement runs immediately. PROC ASTORE produces:

Different types of scoring code that can run locally
Scoring code that can run in SAS Viya

PROC ASTORE can also move analytic stores between the client and the server and can provide descriptive information about the analytic store. The syntax is shown below:

SCORE: Scores the model.
DESCRIBE: Specifies the name of the analytic store and produces basic scoring code.
DOWNLOAD: Retrieves the specified analytic store from the CAS session and stores it in the local file system.
UPLOAD: Moves the specified analytic store from the local file system into a data table in CAS.
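The four statements can be sketched together as follows. The mycas libref, the table and store names, and the file paths are hypothetical, and the sketch assumes a store mycas.svm_store was already created with SAVESTATE:

```sas
proc astore;
   /* SCORE: apply the stored model to new data */
   score data=mycas.hmeq_new out=mycas.hmeq_scored rstore=mycas.svm_store;

   /* DESCRIBE: report what the store contains and write basic scoring code */
   describe rstore=mycas.svm_store epcode="/tmp/svm_epcode.sas";

   /* DOWNLOAD: copy the store from CAS to a local file */
   download rstore=mycas.svm_store store="/tmp/svm_store.sasast";

   /* UPLOAD: copy a local store file back into a CAS table */
   upload rstore=mycas.svm_store_copy store="/tmp/svm_store.sasast";
quit;
```

Because the procedure is interactive, each statement runs as soon as it is submitted, and QUIT ends the procedure. In practice you would usually submit only the one or two statements you need.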

In Viya (as well as in high-performance analytics in SAS 9.4) the process of model building as well as the process of scoring new data can be multithreaded.

Tidbits

Note that Viya procs DO NOT produce any Java or Python score code – it will be either in DS1 or ASTORE (binary format for complex models). This is similar to the SAS 9 HP (high performance) procedures, which also did not support Java or Python score code.

The analytic store is not new to Viya. It was developed with the high-performance procs and is available in SAS 9 Enterprise Miner 14.1 and later and Factory Miner 14.1 and later.

How Proc Astore is Pronounced

Is it ay-store, rhyming with vapor? Or is it uh-store, rhyming with bluster? Or is it as-tore rhyming with pastor? These are the mysteries of the universe that we may only discover answers to when we find the end of the rainbow.

Happy St. Patrick’s Day!

FOR MORE INFO

Radhikha Myneni’s SAS HPA Tip on Scoring with SAS Enterprise Miner
Radhikha Myneni’s SAS HPA Tip on Scoring with Analytic Store files
VDMML 8.1 documentation procedures guide
