In this article, you will learn how Machine Learning Pipeline Automation (MLPA) in SAS Model Studio utilizes date columns when set as input in the data tab.
The difference between a good model and an excellent model often comes down to how rich your modeling data is, which makes feature extraction one of the crucial steps for improving model accuracy.
Dates are not limited to time series modeling: features derived from dates can also enrich the modeling data in a machine learning pipeline.
Let’s see how you can extract features such as year, month, quarter, weekday, and day from a date column and use them as inputs to your automated machine learning pipeline, instead of rejecting dates or simply using them as raw dates.
In SAS Model Studio, a date variable is identified as a numeric column that has a date format applied, for example MMDDYY10. By default, the role of such variables is set to Rejected.
However, a user can override this default by setting the role to either Input or ID. Once the role of a date variable is overridden, the automated machine learning pipeline inserts a SAS Code node right after the Data node, which is where the feature extraction happens.
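As background on why a date column has a numeric type: SAS stores a date as the number of days since January 1, 1960, and a format such as MMDDYY10. only controls how that number is displayed. The Python sketch below illustrates the idea outside of SAS. The helper names `sas_date_to_date` and `format_mmddyy10` are hypothetical and chosen for this illustration; they are not part of any SAS product.

```python
from datetime import date, timedelta

# SAS date value 0 corresponds to January 1, 1960.
SAS_EPOCH = date(1960, 1, 1)


def sas_date_to_date(sas_value: int) -> date:
    """Convert a SAS date value (days since 1960-01-01) to a Python date."""
    return SAS_EPOCH + timedelta(days=sas_value)


def format_mmddyy10(d: date) -> str:
    """Render a date as mm/dd/yyyy, mimicking how MMDDYY10. displays it."""
    return d.strftime("%m/%d/%Y")


if __name__ == "__main__":
    raw_value = 22281  # the underlying numeric value stored in the column
    as_date = sas_date_to_date(raw_value)
    print(raw_value, "->", format_mmddyy10(as_date))
```

The raw number carries little meaning for a model on its own, which is one reason to derive calendar features from it rather than use it directly.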
In this example, I am using a housing dataset that has two date variables, “Claim Date” and “Effective Date”, but you can use any dataset whose dates you would like to process.
Once the role of the desired date has been set to either ‘INPUT’ or ‘ID’ and the target has been assigned, we can proceed with generating an automated machine learning pipeline. For this example, I am going to set “Claim_Date” as an input.
Create a new pipeline and select “Automatically generate the pipeline” and set the automation time limit to your desired value. I am going with the default of 15 minutes for this example.
Once the pipeline has finished running successfully, you will notice that there is a SAS Code node right after the Data node (which you won’t see if the date variable is set to rejected).
If you open the code editor of that SAS Code node, it shows the score code that has been generated automatically for you, which extracts new features from the date variable “Claim_Date” and sets the original variable to “Rejected”. This ensures that the pipeline is only using newly generated features and not the original variable to avoid any redundancy.
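To make the extraction concrete, here is a minimal Python sketch of the same kind of calendar features (year, month, quarter, weekday, day) derived from a single date. It is an illustration only, not the score code that Model Studio generates. The function name `extract_date_features` and the dictionary keys are hypothetical, and the weekday numbering (1 = Sunday through 7 = Saturday, as in the SAS WEEKDAY function) is an assumption about which convention to mirror.

```python
from datetime import date


def extract_date_features(d: date) -> dict:
    """Derive simple calendar features from one date value.

    Key names are illustrative; the generated SAS variables are named differently.
    """
    return {
        "year": d.year,
        "month": d.month,
        # Months 1-3 map to quarter 1, 4-6 to quarter 2, and so on.
        "quarter": (d.month - 1) // 3 + 1,
        # isoweekday() is 1=Monday..7=Sunday; shift to 1=Sunday..7=Saturday.
        "weekday": d.isoweekday() % 7 + 1,
        "day": d.day,
    }


if __name__ == "__main__":
    claim_date = date(2021, 1, 1)  # example value, not taken from the dataset
    print(extract_date_features(claim_date))
```

Each derived feature is a small integer with a clear meaning, which lets a model pick up seasonal or yearly patterns that a raw date value hides.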
Let’s take a look at the output variable table in the results window. It shows the five new variables that have been generated, and the original “Claim_Date” variable now has the ‘REJECTED’ role.
Lastly, let’s take a look at one of the candidate models and see if any of the new features generated contributed to it.
Below is the result from the Linear Regression model node. The “yr_CLA1” variable, which represents year information from the “Claim_Date” variable, is one of the significant variables in this model.
Note that whether these new date features improve accuracy depends heavily on the dataset. You can always add another pipeline with none of the date variables as input and compare it against the pipeline that uses date variables as input.
Similarly, learn more about extracting features from text variables in this article.
In this article, I have showcased one way dates can be utilized in the modeling process. Depending on the dataset, this technique can improve the predictive performance of the models.
The article also covered an example of automated pipeline creation in SAS Model Studio, which uses a date variable to extract new features for subsequent use.