Generating Better Synthetic Data

Started 09-29-2023

Modified 10-03-2023

This presentation combines three techniques to generate better synthetic data: active sampling, tabular GAN, and SAS® autotuning. Used together, they can generate data that improves a model's performance on its prediction task. A variation of query by committee, written in SAS, subsamples a relevant representation of the data by drawing cases near the decision boundary for the problem at hand. Sampling the data in this manner reduces the observations to those most relevant to the problem, which improves the signal and reduces the computational burden on the GAN. The tabular GAN then learns representations from the subsampled data only. Participants see how autotuning can tune the tabular GAN model to produce better synthetic representations: autotuning maximizes the error of a pseudo discriminator that attempts to distinguish between real and artificial data, because the harder the two are to tell apart, the more realistic the synthetic data. The approach combines key SAS technologies, and participants can apply what they learn to most supervised learning problems.
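To make the active sampling step concrete, here is a minimal Python sketch of query-by-committee subsampling. It is not the SAS implementation from the presentation. The committee of nearest-centroid classifiers, the small bootstrap samples, the vote-entropy disagreement score, and every function and parameter name (`qbc_subsample`, `n_members`, `sample_size`, `keep_fraction`) are assumptions chosen for illustration. The idea is that rows on which the committee disagrees most tend to lie near the decision boundary, so those are the rows kept.

```python
import math
import random


def fit_nearest_centroid(X, y):
    """Return one centroid per class label (a deliberately simple committee member)."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in by_class.items()
    }


def predict(model, row):
    """Predict the label whose centroid is closest to the row."""
    return min(model, key=lambda label: math.dist(model[label], row))


def vote_entropy(votes):
    """Entropy of the committee's votes; 0 means every member agrees."""
    counts = {}
    for v in votes:
        counts[v] = counts.get(v, 0) + 1
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def qbc_subsample(X, y, n_members=15, sample_size=20, keep_fraction=0.2, seed=0):
    """Keep the rows on which a bootstrapped committee disagrees the most."""
    rng = random.Random(seed)
    n = len(X)
    committee = []
    for _ in range(n_members):
        # Each member sees a small bootstrap sample, so the members differ.
        idx = [rng.randrange(n) for _ in range(sample_size)]
        committee.append(fit_nearest_centroid([X[i] for i in idx], [y[i] for i in idx]))
    scores = [vote_entropy([predict(m, row) for m in committee]) for row in X]
    n_keep = max(1, int(keep_fraction * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep]), scores


# Hypothetical toy data: two overlapping Gaussian classes in two dimensions.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(100)]
y = [0] * 100 + [1] * 100

kept, scores = qbc_subsample(X, y)
print(len(kept), "of", len(X), "rows kept for GAN training")
```

In this sketch the kept rows would be the only ones passed to the generative model. Rows with zero disagreement are tie-broken by their original order, so a real pipeline would likely use stronger committee members or a disagreement threshold.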

Presentation slides are attached to this post.

