SAS Data Science

Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Viya (Machine Learning), SAS Visual Text Analytics, with point-and-click interfaces or programming
kodexolabs

Automated feature engineering in AI and machine learning goes beyond simply creating features: it depends on understanding the data and reshaping it to improve model performance. It combines three groups of techniques: feature selection, feature transformation, and feature creation (generation).

A common misconception is that automated feature engineering simply generates new features from existing data. In practice, it involves exploring the data and modifying it to increase predictive power. Feature selection identifies the most informative features, which reduces dimensionality and removes redundancy. For example, Recursive Feature Elimination (RFE) repeatedly fits a model and discards the weakest feature until the requested number of features remains:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
y = np.array([0, 1, 0, 1])

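model = LogisticRegression()
# n_features_to_select must be passed by keyword in current scikit-learn releases
rfe = RFE(model, n_features_to_select=2)
fit = rfe.fit(X, y)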

print(f"Num Features: {fit.n_features_}")
print(f"Selected Features: {fit.support_}")
print(f"Feature Ranking: {fit.ranking_}")
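
The columns of the toy array above differ only by a constant offset, so the resulting ranking says little about how RFE behaves on real data. The following is a minimal sketch on synthetic data, assuming two informative columns followed by three pure-noise columns; the names `X_demo`, `y_demo`, and `selector` are illustrative and not part of the original post.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration): the label depends only on
# the first two columns; the remaining three columns are random noise.
rng = np.random.default_rng(0)
n_samples = 200
informative = rng.normal(size=(n_samples, 2))
noise = rng.normal(size=(n_samples, 3))
X_demo = np.hstack([informative, noise])
y_demo = (informative[:, 0] + informative[:, 1] > 0).astype(int)

# Drop one feature per iteration (the default step) until two remain.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)
selector.fit(X_demo, y_demo)

# Keep only the columns that RFE retained.
X_reduced = selector.transform(X_demo)

print("Selected mask:", selector.support_)
print("Ranking (1 = kept):", selector.ranking_)
print("Reduced shape:", X_reduced.shape)
```

Features with rank 1 are the ones retained; higher ranks were eliminated earlier in the process.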

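Feature transformation changes the structure or scale of the data to make it better suited to model training. Principal Component Analysis (PCA), for example, reduces dimensionality by projecting the data onto the directions of greatest variance, so that most of the information is retained in fewer columns.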

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
pca = PCA(n_components=2)
principal_components = pca.fit_transform(X)

print(f"Principal Components:\n{principal_components}")
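
To see how much information each component retains, it can help to compute the explained variance directly. The following is a minimal NumPy sketch of the usual center-then-SVD formulation of PCA, applied to the same array; the variable names are illustrative, and this is a sketch of the technique rather than scikit-learn's exact implementation.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)

# Step 1: center each column at zero.
X_centered = X - X.mean(axis=0)

# Step 2: singular value decomposition; the rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Step 3: variance captured along each axis, as a fraction of the total.
explained_variance = S**2 / (X.shape[0] - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Step 4: project the centered data onto the first two axes.
projected = X_centered @ Vt[:2].T

print("Explained variance ratio:", np.round(explained_ratio, 4))
print("Projected shape:", projected.shape)
```

Because every row of this toy array lies on a single straight line, the first component carries essentially all of the variance and the second is numerically zero. On real data the ratio is what guides the choice of `n_components`.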

Feature creation produces new features by combining or transforming existing ones, which can reveal patterns that the raw columns hide. Libraries such as Featuretools can generate these features automatically.

import featuretools as ft
import pandas as pd

data = {'id': [1, 2, 3, 4], 'value': [10, 20, 30, 40]}
df = pd.DataFrame(data)

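es = ft.EntitySet(id='data')
# Featuretools 1.0 and later use add_dataframe / target_dataframe_name;
# older releases used entity_from_dataframe / target_entity instead.
es = es.add_dataframe(dataframe_name='df', dataframe=df, index='id')

feature_matrix, feature_defs = ft.dfs(entityset=es, target_dataframe_name='df')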
print(feature_matrix)
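
With a single table and no relationships there is little for automated synthesis to aggregate, so the example above yields few new columns. The kind of feature such tools build from related tables can be sketched by hand with pandas. The `customers` and `transactions` tables below are hypothetical, and this is a manual illustration of aggregation-style feature creation, not Featuretools output.

```python
import pandas as pd

# Hypothetical parent table (one row per customer).
customers = pd.DataFrame({"customer_id": [1, 2, 3]})

# Hypothetical child table (several rows per customer).
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "amount": [10.0, 20.0, 5.0, 15.0, 40.0, 30.0],
})

# Aggregate the child rows up to the parent level.
agg = (
    transactions.groupby("customer_id")["amount"]
    .agg(["sum", "mean", "max", "count"])
    .add_prefix("amount_")
    .reset_index()
)

# Attach the aggregates to the parent table as new features.
feature_table = customers.merge(agg, on="customer_id", how="left")

# Combine two generated features into a further derived feature.
feature_table["amount_max_share"] = (
    feature_table["amount_max"] / feature_table["amount_sum"]
)

print(feature_table)
```

Each aggregate column summarizes a customer's transactions, and the final ratio shows how generated features can themselves be combined.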

Automated feature engineering systems iteratively generate, evaluate, and validate candidate features, then keep the best ones for model training. For example, TPOT uses genetic programming to search over pipelines that combine feature preprocessing, feature selection, and model selection:

from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

tpot = TPOTClassifier(verbosity=2, generations=5, population_size=20)
tpot.fit(X_train, y_train)

print(tpot.score(X_test, y_test))
tpot.export('tpot_pipeline.py')
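
The evaluate-and-validate loop can also be sketched with plain scikit-learn. The example below is not TPOT's algorithm: it swaps the genetic search for an exhaustive grid search over how many features to keep, and scores each candidate with 5-fold cross-validation. The pipeline step names `select` and `clf` are arbitrary choices made for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Feature selection and the classifier live in one pipeline, so selection
# is re-fitted inside every cross-validation fold.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Try keeping 1, 2, 3, or all 4 features and compare cross-validated accuracy.
search = GridSearchCV(pipeline, param_grid={"select__k": [1, 2, 3, 4]}, cv=5)
search.fit(X_train, y_train)

best_k = search.best_params_["select__k"]
test_score = search.score(X_test, y_test)

print("Best number of features:", best_k)
print("Held-out accuracy:", test_score)
```

Keeping selection inside the pipeline avoids leaking information from validation folds into the choice of features.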

To summarize, automated feature engineering combines selecting, transforming, and creating features to improve model performance. Used together, these techniques can help build more accurate predictive models with less manual effort.
