I would like to investigate if methods of advanced statistical process control (such as control charts) could be applied to pinpoint potentially erroneous data in a large dataset.
Are there any available data sets that I could use?
I am an MSc student and I could use any available data sets for my dissertation.
Thank you,
Vaggelis Vergoulas
@Vergoulas wrote:
I would like to investigate if methods of advanced statistical process control (such as control charts) could be applied to pinpoint potentially erroneous data in a large dataset.
I don't think YOU specifically need to do an investigation of this type. I think the answer is already known.
The answer is that these types of control charts were not designed to pinpoint erroneous data, but they certainly have been used to detect problems in the data. Of course, once a "problem" in the data appears, the question becomes: is it real data or erroneous data? I don't think any statistical method can answer that question; the only way to decide that the data is erroneous is to apply subject matter knowledge.
Dear Paige, thank you very much for your quick and very informed answer.
My aim is to investigate if these charts could be used to detect these “problems” or better “potential problems” as they happen and request attention to the process. Theoretically, they are made for that. I wanted to focus on the points of data capture (or creation if you will). I do not know if such methods are in use for data. Your input on this subject could be very helpful.
Thank you very much.
I guess I don't know what you mean by "focus on the points of data capture", and how that differs from just analyzing the data.
Certainly control charts can be used on any data and, when used properly under the right conditions, they will detect problems and can call for immediate attention to the process.
It sounds as if you are starting from scratch in this research, even though control charts are among the most widely used statistical tools ever invented, and furthermore there is a huge amount of published literature on their use.
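For what it's worth, here is a minimal sketch of the idea in Python, assuming a Shewhart individuals chart with 3-sigma limits estimated from the average moving range (divided by the usual constant d2 = 1.128 for ranges of two points). The function names and the numbers are made up for illustration and are not from any real data set or SAS procedure. Note that the chart only flags points that deserve a look; deciding whether a flagged value is truly erroneous still takes subject matter knowledge.

```python
from statistics import mean

def individuals_chart_limits(baseline):
    """Return (center, lcl, ucl) for a Shewhart individuals chart."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    center = mean(baseline)
    # Moving ranges between consecutive observations
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    # Estimate sigma as average moving range / d2 (d2 = 1.128 for n = 2)
    sigma_hat = mean(moving_ranges) / 1.128
    return center, center - 3 * sigma_hat, center + 3 * sigma_hat

def flag_out_of_control(values, baseline):
    """Return indices of values falling outside the baseline's 3-sigma limits."""
    _, lcl, ucl = individuals_chart_limits(baseline)
    return [i for i, x in enumerate(values) if x < lcl or x > ucl]

# Hypothetical in-control history and newly captured values
baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7, 10.1, 10.0]
incoming = [10.0, 9.9, 14.5, 10.2]

flagged = flag_out_of_control(incoming, baseline)
print(flagged)  # indices of incoming values that need attention
```

Here the limits are computed once from a period believed to be in control and then applied to new values as they arrive, which is roughly what monitoring at the point of data capture would look like.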