About PatrickHall

PatrickHall · ‎10-31-2016

I think what you propose with random forest is a good start, but it assumes you have labeled data for past promotions or customer behavior. If you do, then you can use the predicted probabilities for each target level to rank the offers for each customer, exactly as you propose.
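As a rough illustration of the ranking step, here is a minimal Python sketch. The customer IDs, offer names, and probability values are all made up for the example; in practice the probabilities would come from whatever multinomial classifier you trained.

```python
# Hypothetical predicted probabilities per customer, one value per target level (offer).
predicted = {
    "cust_1": {"offer_a": 0.10, "offer_b": 0.65, "offer_c": 0.25},
    "cust_2": {"offer_a": 0.50, "offer_b": 0.20, "offer_c": 0.30},
}


def rank_offers(probabilities):
    """Return offer names sorted from highest to lowest predicted probability."""
    return sorted(probabilities, key=probabilities.get, reverse=True)


rankings = {customer: rank_offers(probs) for customer, probs in predicted.items()}

for customer, offers in rankings.items():
    print(customer, offers)
```

The first entry in each list would be the offer to present first to that customer.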

PatrickHall · ‎10-28-2016

The Link Analysis node is a good option. Other options include:
- Using the Association node and/or Market Basket node to generate frequent item sets and next best offers. (Similar to the Link Analysis approach.)
- If you have Text Miner, you can use PROC SPSVD or PROC HPTMINE to generate SVD features directly from transactional/COO data, and find clusters of similar users or items using the Cluster node. You can also use procedures like DISCRIM and DISTANCE to perform other common collaborative filtering operations using these SVD features.
- Using the Random Forest node, Neural Net node, or other multinomial classifiers to predict the next item a user will purchase based on sequences of past purchases or the attributes of past purchases.
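To give a feel for the frequent-item-set idea, here is a small Python sketch that counts how often pairs of items are bought together and suggests the most common companion item. It is a deliberately simplified stand-in, not what the Association or Market Basket nodes actually compute, and the transactions and the helper name `next_best_offer` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set holds the items one customer bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter"},
]

# Count how often each unordered pair of items appears in the same basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1


def next_best_offer(item):
    """Return the item that most often co-occurs with `item`, or None if unseen."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return scores.most_common(1)[0][0] if scores else None


print(next_best_offer("bread"))
```

Real association rules also weigh support, confidence, and lift, which this sketch ignores.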

PatrickHall · ‎06-21-2016

No - not a hard number at all, but a bigger problem will take longer, and at some point you may run out of resources during training if the training set is too big. To give you some idea: I was able to roughly replicate the paper you referenced using a 300-100-2-100-300 autoencoder built with PROC NEURAL in about 6 hours, using 12 cores on a server with 128 GB of RAM. More/fewer cores + more/less memory = less/more time. You may find this example code helpful: https://github.com/sassoftware/enlighten-deep And the exact code I used is in this paper: https://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf I suggest using tech=CONGRA for the optimization. Hope that helps ...

PatrickHall · ‎05-19-2016

Making machine learning more interpretable

Machine learning capabilities have been available for years (even decades), and they are becoming much more mainstream now. However, one nagging problem with applying machine learning algorithms in regulated industries is the difficulty of interpreting how machine learning models make their decisions. I believe this is a fundamental problem that won't be solved outright anytime soon, but I've gathered some tips on how to make machine learning more interpretable from working with SAS customers all over the world. Take a look: https://www.oreilly.com/ideas/predictive-modeling-striking-a-balance-between-accuracy-and-interpretability.

Why does interpretability even matter? My colleague @andrew_pease123 answers that question here: https://www.oreilly.com/ideas/why-interpretability-matters-in-data-analytics

Want to know more about machine learning? Check out this GitHub repo with technical best practices resources, including quick reference tables and a thorough best practices guide for applied machine learning: https://github.com/sassoftware/enlighten-apply/tree/master/ML_tables.

To learn more about machine learning from a business perspective, see this SAS and O'Reilly co-sponsored report: http://www.sas.com/en_us/whitepapers/evolution-of-analytics-108240.html

PatrickHall · ‎03-21-2016

Hi,

In reference to the comparison with R: one is not better than the other, though people attempt to compare them all the time. They are simply very different technologies. SAS is a full-stack system of proprietary software products meant to help organizations access, manage, and analyze data and to deploy the results of the analysis into operational, enterprise computer systems. R is a very popular and useful open source language, geared primarily toward manipulating and analyzing data and presenting results.

1.) SAS offers numerous data management packages across data integration, data quality, database and Hadoop integration, data governance and more: http://www.sas.com/en_us/software/data-management.html

2.) I am not a data management expert, but I think you are asking for straightforward functionality that would be available in Base SAS (a SAS language-based data manipulation and analysis package) and SAS Access to Oracle.

Base SAS: http://www.sas.com/en_us/software/base-sas.html (You can try out Base SAS in the free SAS University Edition: http://www.sas.com/en_us/software/university-edition.html)
SAS Access to Oracle: https://support.sas.com/documentation/cdl/en/acreldb/68028/HTML/default/viewer.htm#titlepage.htm

If you are thinking of using SAS on a single laptop or workstation (as opposed to an enterprise install that could entail multiple servers, clients, databases and grids or clusters of machines), the traditional advantages of SAS are:
- Highly optimized data access to and from Oracle
- Ability to execute SQL code in the Oracle database from your SAS session (SAS PROC SQL: http://support.sas.com/documentation/cdl/en/sqlproc/69049/PDF/default/sqlproc.pdf)
- Disk-enabled memory management: SAS holds data on disk until it is needed in-memory, allowing you to work with much larger data sets than RAM alone would allow
- Combining database-like data management tools with analysis tools in the same client.

I think the main drawback is that if you find you need another package, you can't just download it. You, your company, or your university typically has to purchase the additional package. HTH.

PatrickHall · ‎02-19-2016

I liked your suggestion, so I tried changing the input coding under Model Options -> Input Coding -> GLM.

With deviation coding the values are not the same:

Analysis of Maximum Likelihood Estimates

Parameter     DF   Estimate   Error    Chi-Square   Pr > ChiSq   Estimate   Exp(Est)
M_DemAge 0    1    0.0741     0.0344   4.65         0.0311                  1.077

Odds Ratio Estimates

Effect             Point Estimate
M_DemAge 0 vs 1    1.160

With GLM coding, the values are the same. Thanks!!

Analysis of Maximum Likelihood Estimates

Parameter     DF   Estimate   Error    Chi-Square   Pr > ChiSq   Estimate   Exp(Est)
M_DemAge 0    1    0.1389     0.0794   3.06         0.0802                  1.149

Odds Ratio Estimates

Effect             Point Estimate
M_DemAge 0 vs 1    1.149
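The numbers in the two outputs can be checked by hand. The Python sketch below assumes the standard behavior of the two codings for a two-level class variable: under deviation (effect) coding the second level is coded -1, so the 0-vs-1 odds ratio is exp(2 * estimate), while under GLM (reference) coding it is simply exp(estimate). The estimates are the rounded values from the output, so results are only approximate.

```python
import math

# Estimates copied from the two outputs above (rounded as printed).
beta_deviation = 0.0741  # M_DemAge level 0, deviation (effect) coding
beta_glm = 0.1389        # M_DemAge level 0, GLM (reference) coding

# Deviation coding, two levels: level 1 has an implied coefficient of -beta,
# so the log odds of level 0 vs level 1 differ by 2 * beta.
exp_est_deviation = math.exp(beta_deviation)
odds_ratio_deviation = math.exp(2 * beta_deviation)

# GLM coding: level 1 is the reference (coefficient 0), so the odds ratio
# equals the exponentiated estimate.
odds_ratio_glm = math.exp(beta_glm)

print(f"{exp_est_deviation:.3f}")     # 1.077
print(f"{odds_ratio_deviation:.3f}")  # 1.160
print(f"{odds_ratio_glm:.3f}")        # 1.149
```

Under this reading, Exp(Est) with deviation coding compares level 0 to the average effect over both levels rather than to a reference level, which is why it differs from the odds ratio table.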

PatrickHall · ‎02-19-2016

What is the interpretation of the highlighted value in the image below? I understand that in the odds ratio table (not pictured), the displayed value for this level of the categorical variable will be different because it will be compared to a reference level. But what does it mean here exactly - when not compared to a reference level?

PatrickHall · ‎01-19-2016

If you are an instructor you should have free access to many in-depth educational materials provided by SAS' education practice: http://support.sas.com/learn/ap/prof/index.html (This includes text mining materials.)

PatrickHall · ‎10-29-2015

14700 is too many inputs for PROC NEURAL. Either use fewer features (say, < 500) with PROC NEURAL, or use HPNEURAL with 1 or 2 layers. HTH, p

PatrickHall · ‎10-21-2015

TL;DR: Test PROC NEURAL with many layers against PROC HPNEURAL with two layers to see which does best.

PROC NEURAL doc is here: http://support.sas.com/documentation/onlinedoc/miner/em43/neural.pdf. PROC HPNEURAL doc is available under the "secure documentation" link here: http://support.sas.com/software/products/miner/index.html#s1=3 (password available from tech. support)

Code examples here:
https://github.com/sassoftware/enlighten-deep
https://github.com/sassoftware/enlighten-apply/tree/master/SAS_Neural_PatternRecognition

Details:

Is your data encoded video like an mpeg? If so, you will need to use something besides SAS to decode your video into pixel intensity values. I suggest OpenCV. Once your data is in a standard tabular format containing numerical columns (probably with pixels as columns and frames as rows), you can read it into SAS easily using PROC IMPORT or a DATA step. Also, remember to standardize before training a neural network.

If you are training a neural network with more than two layers, I would suggest using the FREEZE and THAW statements in PROC NEURAL to conduct layer-wise pretraining, and then training all the layers together again. In current releases, HPNEURAL does not provide protection against vanishing or exploding gradients for deep networks - two layers should be fine with HPNEURAL. I would suggest testing a large two-layer network (many hidden units per layer) trained with HPNEURAL against a deeper network trained with PROC NEURAL. I would expect HPNEURAL to be faster than PROC NEURAL, even using PROC NEURAL's multithreading capabilities.

The syntax for PROC HPNEURAL is straightforward, something like:

    proc hpneural data=frames;
      input pixel:;
      hidden 1000; /* first layer */
      hidden 500;  /* second layer */
      target label / level=nom;
      train numtries=1 maxiter=5000;
      /* nthreads=number of cores you want to use */
      /* if you have SAS HPA then you can use the nodes= */
      /* option to use more than 1 machine - vroom, vroom! */
      performance nthreads=12 details;
      score out=frames_score;
    run;

Now for PROC NEURAL ... which is more complicated. PROC NEURAL allows for layer-wise pretraining and can help you avoid one of the most common pitfalls in training deep neural networks: vanishing/exploding gradients.

What are vanishing/exploding gradients? Prior to deep learning, neural networks were typically initialized using random numbers. Neural networks generally use the gradient of the network's error w.r.t. the network's parameters to adjust the parameters to better values in each training iteration. In back propagation, evaluating this gradient involves the chain rule, and you must multiply each layer's parameters and gradients together across all the layers. This is a lot of multiplication, especially for networks with more than 2 layers. If most of the weights across many layers are less than 1 and they are multiplied many times, then eventually the gradient just vanishes into a machine-zero and training stops. If most of the parameters across many layers are greater than 1 and they are multiplied many times, then eventually the gradient explodes into a huge number and the training process becomes intractable.

PROC NEURAL provides a mechanism to avoid vanishing/exploding gradients in deep networks by training only one layer of the network at a time. Once all the layers have been initialized through this pre-training process to values that are more suitable for the data, you can usually train the deep network using gradient descent techniques without the problem of vanishing/exploding gradients.

It looks like this, roughly:

    proc neural
      data=frames
      /* you can assign validation or test data with validdata= or testdata= */
      dmdbcat=work.cat_frames /* create required catalog with PROC DMDB */
      random= 12345;

      /* take advantage of multithreading */
      /* may also need to be allowed on SAS invocation or in SASv9.cfg */
      performance compile details cpucount=12 threads= yes;

      /* L2 regularization */
      netoptions decay= 0.1;

      /* define network architecture */
      archi MLP hidden= 3;
      hidden 100 / id=h1;
      hidden 50 / id=h2;
      hidden 10 / id=h3;
      /* Fill in <n> - I noticed : notation sometimes does not work here */
      input pixel1-pixel<n> / id=i level=int;
      target label / id=t level=nom;

      /* tuning parameter that reduces the possibility that any neuron becomes */
      /* saturated during initialization */
      /* saturation discussion here: http://ow.ly/TGzuF */
      *initial infan=0.5;

      /* conduct pretraining to find better initialization, time-consuming, */
      /* sometimes problematic for deep nets */
      *prelim 10 preiter=10;

      /* pre-train input layer by freezing all other hidden layers */
      /* (I never freeze the target layer, but you can try that too) */
      freeze h1->h2;
      freeze h2->h3;
      train maxtime=10000 maxiter=5000;

      /* pre-train first hidden layer by freezing input layer, */
      /* and thawing first hidden layer */
      freeze i->h1;
      thaw h1->h2;
      train maxtime=10000 maxiter=5000;

      /* pre-train second hidden layer by freezing first hidden layer, */
      /* and thawing second hidden layer */
      freeze h1->h2;
      thaw h2->h3;
      train maxtime=10000 maxiter=5000;

      /* now that all hidden and input layers have been pre-trained, */
      /* train all layers together by thawing all frozen layers */
      thaw i->h1;
      thaw h1->h2;

      /* you can try the resilient backprop (RPROP) optimization technique to help control for */
      /* vanishing/exploding gradients when training all layers */
      train maxtime=10000 maxiter=5000 /* tech=rprop */;

      score
        data=frames
        outfit=frames_fit
        out=frames_score /* you can score validation and test data as well */
        role=train;
    run;

Please be aware that recent advances in deep learning are hot topics at SAS R&D too, and we are hoping to provide much more functionality for deep learning in coming releases ... but - as always - no promises. Enterprise grade scientific software takes time.
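To make the vanishing/exploding gradient arithmetic concrete, here is a tiny Python sketch. It is a caricature, not a real backpropagation: it assumes every layer contributes the same scalar factor to the chain-rule product, and the factor values (0.5 and 1.5) and layer counts are made up for illustration.

```python
def backprop_factor_product(per_layer_factor, n_layers):
    """Multiply the same per-layer factor once per layer, as the chain rule would."""
    product = 1.0
    for _ in range(n_layers):
        product *= per_layer_factor
    return product


# Two layers: the product stays at a workable magnitude.
shallow_small = backprop_factor_product(0.5, 2)

# Fifty layers with factors below 1: the product collapses toward zero.
deep_small = backprop_factor_product(0.5, 50)

# Fifty layers with factors above 1: the product blows up.
deep_large = backprop_factor_product(1.5, 50)

print(shallow_small, deep_small, deep_large)
```

Layer-wise pretraining aims to start the full training run from weights where these products are better behaved.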

PatrickHall · ‎06-12-2015

Hi,

This is a very good question. To echo the comments of many above, the basic difference between PROC DISCRIM in SAS/STAT and the MBR node in SAS Enterprise Miner is that the MBR node can use the RDTREE method to search for nearest neighbor observations with a known class in training data to classify new observations in test data. (The RDTREE method is a proprietary version of the popular KDTREE algorithm.) The tree-based neighbor search can be faster, but you may get different results between PROC DISCRIM and the MBR node. If you want the closest correspondence between PROC DISCRIM and the MBR node, use the SCAN option in the MBR node's METHOD property to request that the MBR node use a conventional distance calculation to classify new observations. However, the traditional distance calculation may be unsuited for big data.

Here is an example of how different your results could be in a low-dimensional simulated data set. (Your results could be even more different with real data.) Notice that the classifications made by PROC DISCRIM and PROC PMBR (i.e. the MBR node) using the SCAN method are almost identical. Using the RDTREE method with a small EPSILON, you can also closely replicate the results of PROC DISCRIM on this sample data. But if you change the value for EPSILON, then your results can be noticeably different from PROC DISCRIM using its default settings. Changing the BUCKETS property does not appear to change the classification results, but changing the EPSILON property does change the classification results. So what is going on here?

The BUCKETS property: The value of the BUCKETS option should not affect classification results. It is a parameter that is used to balance the speed vs. memory trade-off for the tree structure. It is basically the number of observations allowed to be in each node of the RDTREE. Nodes of the tree are searched in O(log N) time; within each node the search for neighbors is slower. However, building more nodes requires more memory. A lower value for BUCKETS will result in a faster calculation, but more memory being used. A higher value for BUCKETS will result in a slower calculation, but less memory being used.

The EPSILON property: EPSILON controls the approximate nearest neighbor search; changing EPSILON can affect classification results. A larger value for EPSILON will allow more points that may not be actual nearest neighbors to be used to classify a new observation. A smaller value for EPSILON will use more points that are guaranteed to be nearest neighbors to classify a new observation. Using a larger value for EPSILON should decrease execution time.

If you would like to try this experiment for yourself, the code is available here: PROC DISCRIM vs. the MBR node in Enterprise Miner

Message was edited by: Patrick Hall; updated code and added details for BUCKETS and EPSILON.
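The effect of EPSILON can be sketched in a few lines of Python. This assumes the textbook (1 + epsilon)-approximate nearest neighbor definition, where any point within (1 + epsilon) times the true nearest distance may be returned; how RDTREE applies EPSILON internally may differ. The points, the query, and the helper name `acceptable_neighbors` are hypothetical.

```python
import math


def acceptable_neighbors(query, points, epsilon):
    """Return the points a (1 + epsilon)-approximate search may report as nearest."""
    distances = [math.dist(query, point) for point in points]
    nearest = min(distances)
    return [p for p, d in zip(points, distances) if d <= (1.0 + epsilon) * nearest]


points = [(1.0, 0.0), (0.0, 1.2), (3.0, 3.0)]
query = (0.0, 0.0)

# epsilon = 0 admits only the true nearest neighbor.
exact = acceptable_neighbors(query, points, 0.0)

# A larger epsilon also admits a point that is close but not the nearest.
loose = acceptable_neighbors(query, points, 0.5)

print(exact, loose)
```

If the admitted points carry different class labels, the predicted class for the query can change, which is one way a larger EPSILON could alter classification results.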

PatrickHall · ‎05-11-2015

We've also released more examples of using SAS, R, and PMML here: sassoftware/enlighten-integration · GitHub

PatrickHall · ‎05-11-2015

Great to hear it works. You've made an astute observation. Let's consider this an alternate approach, not necessarily a better approach. Here are a few advantages in my mind:
- This approach allows you to get output and error info passed to the SAS log.
- Some system access commands, i.e. the x statement, cannot be used in EM under certain settings.
- This tip: https://communities.sas.com/docs/DOC-10832 shows how to use this code inside Enterprise Miner to compare SAS models and Python models.
- This code is open-source. You can change it to do more complex things ... perhaps you would like the Java bridge between SAS and some third-party software to have a more complex behavior than just kicking off a process.

PatrickHall · ‎05-06-2015

Ray,

That is a great suggestion and a well-founded, scalable, and contemporary method for addressing missing values in a predictive model. The idea is that a decision tree will use patterns detected from *all* the variables - which may not be obvious to us, e.g. 2-way correlations - to predict the missing value for each observation.

Several other best practices for handling missing values include:

1. Simply leaving the missing values in the data and using a decision tree or an ensemble of decision trees (i.e. random forest and/or gradient boosting) as your final predictive model. Decision trees handle missing values at least 2 different ways:
--- In training they can group missing values in bins by themselves or along with other values of a variable, and use missing values to build the predictive model.
--- Surrogate rules: decision trees can use a variable like "State" to make a decision about a variable like "ZipCode" if it encounters a missing value for "ZipCode".

2. Impute the missing values however you like but retain a binary missing value indicator variable, so that missingness can be used to help make your final predictions.

Hope that helps.
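The second practice (impute, but keep a missing value indicator) can be sketched in Python as follows. The column name and values are hypothetical, and mean imputation is used only because it is the simplest choice; any imputation method could be swapped in.

```python
# Hypothetical column with missing values represented as None.
income = [52000.0, None, 61000.0, None, 47000.0]

# Mean of the observed (non-missing) values.
observed = [value for value in income if value is not None]
mean_income = sum(observed) / len(observed)

# Binary indicator: 1 where the original value was missing, else 0.
income_missing = [1 if value is None else 0 for value in income]

# Mean imputation; the indicator above preserves the fact of missingness.
income_imputed = [mean_income if value is None else value for value in income]

print(income_missing)
print(income_imputed)
```

Both `income_imputed` and `income_missing` would then be passed to the model as inputs.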

PatrickHall · ‎04-26-2015

I'll be presenting on my approach to the Cloudera Data Science Challenge 2 at SAS Global Forum. For anyone who can't make it and who is interested in technical resources pertaining to SAS and Data Science, you can access the paper and code I will be presenting here: Paper: http://support.sas.com/resources/papers/proceedings15/SAS2520-2015.pdf Code: http://support.sas.com/resources/papers/proceedings15/SAS2520-2015.zip
