Ullsokk
Pyrite | Level 9

I want to set up some data drift monitoring. My first inclination was to write some code to get missing, mean, median, min, max etc. for each column in my datasets (about 2000 variables). But Run Profile in Data Explorer does most of what I want, so I was wondering if there is a way to use the profiling to create a dataset that lets me monitor how a dataset changes from day to day. The relevant dataset is uploaded each day as part of the scoring data, so I would somehow need to call the Run Profile task, e.g. via the Viya API, and store the profile results for each day.

I couldn't find anything about running data profiles on developer.sas.com.

Is this a dead end? Am I better off just writing some PROC MEANS steps and transposes? Or are the data profiles meant to be used as a means to monitor data drift?

 

1 REPLY
sbxkoenk
SAS Super FREQ

Hello,

 

Running the profile day by day does not, by itself, give you a definite answer on possible population drift. You would still have to compare the distributions of each variable between days to get alerted on a SIGNIFICANT change.

 

In credit scoring (finance) there is something called the stability report and the stability index.

It measures possible population drift, which can be due to:
• Seasonality
• Changing economic climate
• Customer drop-out
• Changing sales and marketing strategies, marketing campaigns, niche competition
• Customer profile changing independently, e.g. demographic change
• Mistakes in data capture, systematic error (coding), non-random sample, exclusions

 

I don't know about the technical underpinnings of this index, but I believe it's close to the Kullback-Leibler Distance / divergence.

It would require you to bin the range of your variables (10 or 15 buckets will do).
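
For illustration, here is a minimal sketch of how such a stability index is commonly computed, written in Python rather than SAS. It assumes the usual textbook definition, the sum over bins of (actual share - expected share) * ln(actual share / expected share), which is the sum of the two Kullback-Leibler divergences between the binned distributions. The function name `psi`, the equal-width binning, and the `eps` floor for empty bins are choices made for this sketch only, and missing values are assumed to have been removed beforehand.

```python
import math

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Stability index between a baseline sample and a new sample.

    Bin edges are equal-width and derived from the baseline only
    (an assumption of this sketch; quantile bins are also common).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant baseline

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range fall into the end bins.
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        total = len(values)
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / total, eps) for c in counts]

    e = shares(expected)
    a = shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = list(range(100))             # yesterday's values (made up)
    shifted = [v + 30 for v in baseline]    # today's values, shifted upwards
    print(psi(baseline, baseline))          # identical samples give 0.0
    print(psi(baseline, shifted))           # a clear shift gives a large value
```

A commonly quoted rule of thumb treats values below 0.1 as stable and values above 0.25 as a substantial shift, but these cut-offs are conventions rather than significance tests. For daily monitoring you would compute one value per variable per day against a fixed baseline and store the results.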

Let me know if you need more info on this. If needed I can make and upload some code, but if you can sort it out yourself, that's even better of course ;-).

 

Cheers,

Koen