Use of statistical process control to clean data

New Contributor
Posts: 2

I would like to investigate if methods of advanced statistical process control (such as control charts) could be applied to pinpoint potentially erroneous data in a large dataset. 

Are there any available data sets that I could use? 

I am an MSc student and I could use any available data sets for my dissertation. 

 

Thank you,

Vaggelis Vergoulas

Respected Advisor
Posts: 2,816

Re: Use of statistical process control to clean data

Posted in reply to Vergoulas

@Vergoulas wrote:

I would like to investigate if methods of advanced statistical process control (such as control charts) could be applied to pinpoint potentially erroneous data in a large dataset. 



I don't think YOU specifically need to do an investigation of this type. I think the answer is already known.

The answer is that these types of control charts were not designed to pinpoint erroneous data, but they certainly have been used to detect problems in the data. Of course, once a "problem" in the data is detected, the issue then becomes: is it real data or erroneous data? I don't think any statistical method can answer this question; the only way to decide that the data is erroneous is to apply subject matter knowledge.

--
Paige Miller
New Contributor
Posts: 2

Re: Use of statistical process control to clean data

Posted in reply to PaigeMiller

Dear Paige, thank you very much for your quick and very informed answer.

My aim is to investigate whether these charts could be used to detect these “problems” (or, better, “potential problems”) as they happen and to call attention to the process. In theory, that is what they are made for. I want to focus on the points of data capture (or creation, if you will). I do not know whether such methods are in use for data. Your input on this subject would be very helpful.

 

Thank you very much.

Respected Advisor
Posts: 2,816

Re: Use of statistical process control to clean data

Posted in reply to Vergoulas

I guess I don't know what you mean by "focus on the points of data capture", and how that differs from just analyzing the data.

Certainly control charts can be used on any data, and when used properly under the right conditions, control charts will detect problems and can prompt immediate attention to the process.
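
To make that concrete, here is a minimal sketch of an individuals (X) chart applied to incoming values. It is written in Python rather than SAS purely for illustration; the function name `individuals_chart_flags`, the `baseline_size` parameter, and the sample readings are all hypothetical. It assumes the first few points come from a period when the process was behaving normally, and it estimates sigma from the average moving range divided by the standard constant d2 = 1.128.

```python
from statistics import mean

# Standard control-chart constant d2 for moving ranges of two consecutive points.
D2_N2 = 1.128


def individuals_chart_flags(values, baseline_size):
    """Return (lcl, center, ucl, flagged_indices) for an individuals (X) chart.

    The limits are estimated from the first `baseline_size` points, which are
    ASSUMED to represent normal behaviour of the process.
    """
    baseline = values[:baseline_size]
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")

    center = mean(baseline)
    # Moving ranges: absolute differences between consecutive baseline points.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = mean(moving_ranges) / D2_N2

    lcl = center - 3 * sigma_hat  # lower control limit
    ucl = center + 3 * sigma_hat  # upper control limit

    # Flag every point (baseline or later) that falls outside the limits.
    flagged = [i for i, x in enumerate(values) if x < lcl or x > ucl]
    return lcl, center, ucl, flagged


# Hypothetical captured values; the 12th entry looks like a keying slip.
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7, 10.1, 10.0,
            10.2, 101.0, 9.9]
lcl, center, ucl, flagged = individuals_chart_flags(readings, baseline_size=10)
print(f"limits: [{lcl:.2f}, {ucl:.2f}]  flagged indices: {flagged}")
```

The chart only says that a point is unusual relative to the baseline. Whether 101.0 is a mistyped 10.1 or a genuine reading is still a subject matter decision, as noted earlier in the thread.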

 

It sounds as if you are starting from scratch in this research, even though control charts are among the most widely used statistical tools ever invented, and furthermore there is a huge amount of published literature on their use.

--
Paige Miller