Help finding enterprise datasets for process mining research

New Contributor
Posts: 2

Hello all,


I am beginning a master's thesis project that requires enterprise datasets. The intended approach is to use process mining techniques to detect anomalies and to perform other analytics. Does SAS provide any such datasets for research? Could anyone suggest a good source where these could be found (especially academic or governmental)? The requirement is a dataset that describes organizational processes and activities. Large amounts of ERP data would probably suffice. It need not be enterprise or private in nature, though; governmental or healthcare data would work just as well.


Example scenario:

A manufacturer tracks its production and orders using a unifying planning and inventory system (e.g., Dataflo). Occasionally, defects occur when individuals deviate from the process. We wish to use the data generated by the system to detect anomalies, such as deviations from known processes (employee A suddenly inverts steps in a method, or software developer B closes a bug without approval, etc.).
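To make the scenario concrete, here is a minimal Python sketch of the kind of check I have in mind. Everything in it is an assumption for illustration: the reference process `EXPECTED_ORDER`, the `(case_id, timestamp, activity)` row layout, and the sample bug-tracker events are all made up, and real process mining tools use far more robust conformance checking than this.

```python
from collections import defaultdict

# Hypothetical reference process: steps are expected in this order.
EXPECTED_ORDER = ["open", "review", "approve", "close"]


def build_traces(events):
    """Group (case_id, timestamp, activity) rows into per-case activity sequences."""
    traces = defaultdict(list)
    for case_id, _timestamp, activity in sorted(events, key=lambda e: (e[0], e[1])):
        traces[case_id].append(activity)
    return dict(traces)


def find_deviations(trace, expected=EXPECTED_ORDER):
    """Return a list of readable deviations of one trace from the expected order."""
    deviations = []
    rank = {step: i for i, step in enumerate(expected)}

    # A required step that never happened (e.g. a bug closed without approval).
    for step in expected:
        if step not in trace:
            deviations.append(f"missing step: {step}")

    # Two consecutive known steps that occur in the wrong order.
    known = [activity for activity in trace if activity in rank]
    for prev, curr in zip(known, known[1:]):
        if rank[curr] < rank[prev]:
            deviations.append(f"inverted steps: {prev} before {curr}")

    return deviations


# Made-up sample events; timestamps are plain integers for simplicity.
events = [
    ("bug-1", 1, "open"), ("bug-1", 2, "review"),
    ("bug-1", 3, "approve"), ("bug-1", 4, "close"),
    ("bug-2", 1, "open"), ("bug-2", 2, "review"), ("bug-2", 3, "close"),
    ("bug-3", 1, "open"), ("bug-3", 2, "approve"),
    ("bug-3", 3, "review"), ("bug-3", 4, "close"),
]

traces = build_traces(events)
report = {case_id: find_deviations(trace) for case_id, trace in traces.items()}
```

With this sample data, `report` flags bug-2 for the missing approval and bug-3 for the inverted review/approve steps, while bug-1 comes back clean.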


It has been exceptionally difficult to find enterprise-oriented datasets for research. Despite their abundance "in the wild", most are confidential in nature. Any help or suggestions are appreciated. I thought it would be appropriate to ask here, since this approach closely aligns with the solutions that SAS has provided to various clients.



Super Contributor
Posts: 336

Re: Help finding enterprise datasets for process mining research

Hi Yeti,

I'm not sure I fully understand your example scenario. What would this data look like (inputs, target, ids, etc.)?

Could you show us a concrete mockup of this data? There may be something similar in a SAS course, or a data set in the sashelp or sampsio libraries.


In the meantime, this book is a go-to for research and proofs of concept:

Simulating Data with SAS, by Rick Wicklin


Good luck!


New Contributor
Posts: 2

Re: Help finding enterprise datasets for process mining research

"What would this data look like (inputs, target, ids, etc)?"


Basically, data that I can assign to processes, or from which I can identify processes. The process mining literature usually refers to "event logs" and other event-oriented data, but practitioners usually extract and normalize their own event-log datasets from multiple, heterogeneous data sources. In several examples, they generate hospital event logs from the admissions, care-progression, or other data warehouses at a hospital. Another example might be process-mining a computer hardware company by aggregating event data from a defect-log system, an inventory system, and a communication system, and combining these sources into a single dataset describing the company's processes.
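As a rough mockup of that aggregation step, the Python sketch below normalizes three made-up sources into one event log. The source tables, their field names, and the helper names `normalize` and `build_event_log` are all hypothetical; a real extraction would involve much messier matching of case identifiers across systems.

```python
from datetime import datetime


def normalize(rows, source, case_key, time_key, activity_key):
    """Map one source's rows onto a shared (case_id, timestamp, activity) schema."""
    return [
        {
            "case_id": row[case_key],
            "timestamp": datetime.fromisoformat(row[time_key]),
            "activity": row[activity_key],
            "source": source,
        }
        for row in rows
    ]


def build_event_log(*normalized_sources):
    """Combine normalized sources into one log ordered by case, then by time."""
    events = [event for source in normalized_sources for event in source]
    return sorted(events, key=lambda e: (e["case_id"], e["timestamp"]))


# Made-up rows from three hypothetical systems, each with its own field names.
defects = [
    {"order": "PO-7", "logged_at": "2024-03-02T10:15:00", "status": "defect reported"},
]
inventory = [
    {"order_no": "PO-7", "ts": "2024-03-01T09:00:00", "movement": "parts picked"},
    {"order_no": "PO-7", "ts": "2024-03-03T16:30:00", "movement": "replacement shipped"},
]
messages = [
    {"ref": "PO-7", "sent": "2024-03-02T11:00:00", "subject": "customer notified"},
]

event_log = build_event_log(
    normalize(defects, "defect_log", "order", "logged_at", "status"),
    normalize(inventory, "inventory", "order_no", "ts", "movement"),
    normalize(messages, "communication", "ref", "sent", "subject"),
)
activities = [event["activity"] for event in event_log]
```

The result is one chronologically ordered trace per case, which is roughly the shape that common process mining methods expect as input.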


The benefit is that process mining can look across multiple data sources, rather than at a single one, to identify bottlenecks, anomalies, or just overall descriptions of how the organization operates. No software system fully encompasses any organization's needs, so there is typically some sort of aggregation involved: taking multiple data sources and converting them into something that neatly describes processes (events) and is consumable by common process mining methods.


Sorry, I should have clarified. I assume I'll have to do the aggregation/extraction myself, so I guess the question is where one might find organizational datasets of that scale, encompassing multiple data sources. Many companies and state agencies use such all-encompassing ERP systems for inventory, defect tracking, CRM, and so on. The difficulty is finding anything of that scale that is in the public domain and suitable for research. But you can imagine the benefit it might have for something like a large healthcare organization, whether state or private. I mention healthcare deliberately: there is so much data tracking in that field that it seems like a fruitful source of datasets; I just haven't found any yet.


Thanks for your response!
