I want to apply tests for special causes to a p-chart, but the sample size differs from group to group, so I don't have fixed UCL and LCL values. When I run the code below, I get the following warning:
WARNING: Asymmetric control limits encountered for yes_answer1 for at least one subgroup.
proc shewhart data=tmp2;
pchart yes_answer1*quarter / subgroupn = total_count1
tests =1 to 8
TESTNMETHOD=STANDARDIZE
table
tablelegend;
run;
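For context, here is the usual p-chart formula with varying subgroup sizes. Here x_i is the count in subgroup i (yes_answer1), n_i is its size (total_count1), and p_i = x_i/n_i. Each subgroup gets its own limits because the standard error depends on n_i. My assumption, not confirmed in this thread, is that the warning refers to subgroups whose computed lower limit falls below 0 and is clipped there, so the limits are no longer symmetric around the center line. That asymmetry matters for the zone-based tests. As I understand TESTNMETHOD=STANDARDIZE, the tests are applied to standardized values z_i, whose zones sit at ±1, ±2, ±3 for every subgroup regardless of n_i.

$$\bar p = \frac{\sum_i x_i}{\sum_i n_i},\qquad \sigma_{p_i} = \sqrt{\frac{\bar p(1-\bar p)}{n_i}},\qquad \mathrm{LCL}_i = \max\!\left(0,\ \bar p - 3\sigma_{p_i}\right),\qquad \mathrm{UCL}_i = \min\!\left(1,\ \bar p + 3\sigma_{p_i}\right),\qquad z_i = \frac{p_i - \bar p}{\sigma_{p_i}}$$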
Note: The SHEWHART procedure provides options for working with unequal subgroup sample sizes. For example, the LIMITN= option specifies a fixed (nominal) sample size for computing the control limits.
Below is the sample size for each subgroup. My question is: how should I choose the value for LIMITN=?
quarter Subgroup size (n)
2021Q3 30
2021Q4 66
2022Q1 54
2022Q2 66
2022Q3 69
2022Q4 74
2023Q1 83
2023Q2 96
Hi, I don't know the answer for certain. I saw your other post with this question. Your sample sizes look reasonable to me. Right now I'm making a p-chart with sample sizes that bounce around similarly, typically between 20 and 30. I'm not aware of any firm rules for the sample size of a p-chart, though I'm sure people have general recommendations. For my chart, I decided to exclude any lots with a sample size less than 10, but I think a sample size of 20 or 30 is quite reasonable.
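If you want to apply a minimum-size cutoff like the one described above, a minimal sketch is a DATA step filter before charting. It assumes the data set and variables from the original question (tmp2, yes_answer1, total_count1, quarter); substitute your monthly data set if that is what you are charting. The name tmp2_min10 is hypothetical, and 10 is just the cutoff I chose, not a standard.

```sas
/* Hypothetical sketch: keep only subgroups with at least 10
   observations. The cutoff of 10 is a judgment call, not a rule. */
data tmp2_min10;
   set tmp2;
   where total_count1 >= 10;
run;

/* Chart the filtered data with the same test options as before */
proc shewhart data=tmp2_min10;
   pchart yes_answer1*quarter / subgroupn   = total_count1
                                tests       = 1 to 8
                                testnmethod = standardize;
run;
```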
I would make the chart and see whether it looks useful. If people use the chart and pay attention when a lot falls outside the process control limits (e.g., investigate further to see whether they can identify a special cause for that variation), over time you will get a sense of whether the chart is giving too many false alerts or not enough alerts, and you can adjust accordingly.
But I'm not an SPC expert; I'm just a fan of using SPC charts where I think they are helpful. I figure that if someone sees one of my charts and tells me I'm doing something wrong or breaking a rule, we'll have a conversation about the methods they would recommend, discuss the pros and cons, and possibly adapt. On my current project, I'm hoping that might happen.
In past roles where I did more statistics, I would have felt more committed to finding the "right" statistical test and, if questioned, to being prepared to defend why my method was correct and provided the best test of a hypothesis. For SPC, I'm much looser about whether a chart is "right" (in some sense). For my current project, I made a p-chart, an IR-chart of the proportions, and a Laney p′ chart, compared the results, discussed them with some colleagues, and then just picked one. It didn't really matter which one I used, because they all told the same story.
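For reference, here is my understanding of the Laney p′ chart mentioned above. This is my own summary of David Laney's adjustment for extra-binomial variation, not something spelled out in this thread. Each proportion is standardized, the spread of the standardized values is estimated from their average moving range, and that factor σ_z rescales the usual p-chart limits. When σ_z ≈ 1, the result reduces to an ordinary p-chart. When σ_z > 1 (more variation than the binomial model predicts), the limits widen.

$$z_i = \frac{p_i - \bar p}{\sigma_{p_i}},\qquad \sigma_{p_i} = \sqrt{\frac{\bar p(1-\bar p)}{n_i}},\qquad \sigma_z = \frac{\overline{\mathrm{mR}}_z}{1.128},\qquad \text{limits}_i = \bar p \pm 3\,\sigma_{p_i}\,\sigma_z$$

Here \overline{\mathrm{mR}}_z is the average of |z_i - z_{i-1}| over consecutive subgroups.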
Did you try LIMITN=VARYING?
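A sketch of that suggestion, assuming the same data set and variables as the original post. LIMITN=VARYING asks PROC SHEWHART to compute separate limits for each subgroup from its own sample size. I believe varying limits are already the default when SUBGROUPN= names a variable. If so, this may produce the same chart and the same warning as the original code, and writing it out simply makes the choice explicit. Please check the PCHART documentation to confirm.

```sas
/* Sketch: p-chart with limits computed per subgroup size.
   Data set and variable names are taken from the original post. */
proc shewhart data=tmp2;
   pchart yes_answer1*quarter / subgroupn   = total_count1
                                limitn      = varying
                                testnmethod = standardize /* tests on standardized values */
                                tests       = 1 to 8
                                table
                                tablelegend;
run;
```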
Thank you for your answer. Yes, my question is: what is the appropriate value for LIMITN=, and how should I choose it?
Here is the sample size of each subgroup:
quarter Subgroup size (n)
2021Q3 30
2021Q4 66
2022Q1 54
2022Q2 66
2022Q3 69
2022Q4 74
2023Q1 83
2023Q2 96
I don't think there is a single value of N that works here. I guess you could use an average or median or something similar, but for some data points it will give limits that are too wide while for other data points it will give limits that are too narrow, so I don't recommend this.
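To put rough numbers on this, here is a small worked calculation using the quarterly sizes from the question. The half-width of p-chart limits is h(n) = 3\sqrt{\bar p(1-\bar p)/n}, so the ratio between the correct half-width and the one computed at a nominal size depends only on the two sample sizes. As an illustration, I use LIMITN=67, roughly the mean of those sizes (538/8 = 67.25). The value 67 is my choice, not one proposed in the thread.

$$\frac{h(n_i)}{h(67)} = \sqrt{\frac{67}{n_i}},\qquad \sqrt{\tfrac{67}{30}} \approx 1.49\ (\text{2021Q3}),\qquad \sqrt{\tfrac{67}{96}} \approx 0.84\ (\text{2023Q2})$$

So for 2021Q3 the nominal limits would be about a third narrower than they should be (1/1.49 ≈ 0.67), which tends to produce extra false alarms. For 2023Q2 they would be about 20% too wide (1/0.84 ≈ 1.20), which tends to hide real signals.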
I've been playing around with some data to make a p-chart recently, and I also have varying subgroup sizes. Reading up on p-charts again, I was surprised to see that Donald Wheeler is actually a fan of using IR-charts where most textbooks would advise a p-chart (or an np-chart, c-chart, or u-chart). His basic argument is that if the p-chart's assumption of a binomial distribution is right, the IR-chart will give you about the same limits, but if that assumption is wrong, the IR-chart's empirical limits will be better than the p-chart's model-based limits. See e.g. https://www.qualitydigest.com/inside/quality-insider-article/what-about-p-charts-093011.html and https://www.qualitydigest.com/static/magazine/jul/spctool.html
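To make "empirical limits" concrete, here are the standard individuals (XmR) chart limits applied to the subgroup proportions p_i, i = 1, …, k. The limits come from the average moving range of the data themselves rather than from a binomial model. The constant 2.66 is approximately 3/1.128, where 1.128 is the bias-correction constant d_2 for moving ranges of two points.

$$\mathrm{mR}_i = |p_i - p_{i-1}|,\qquad \overline{\mathrm{mR}} = \frac{1}{k-1}\sum_{i=2}^{k}\mathrm{mR}_i,\qquad \bar p_{\mathrm{IR}} = \frac{1}{k}\sum_{i=1}^{k} p_i,\qquad \text{limits} = \bar p_{\mathrm{IR}} \pm 2.66\,\overline{\mathrm{mR}}$$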
Since Wheeler's books were my introduction to SPC, I tend to like his ideas. So for my case, with data like yours, I'm thinking I may just make an IR-chart of the proportions (the p values). Another side benefit for me is that this gives each lot equal weight, independent of sample size. A typical p-chart or mean chart with varying sample sizes gives more weight to the lots that have more data. That makes sense if your process is in control and the sample size for each lot is not meaningful information. But in some two-stage sampling designs, more data are sampled when a lot shows poor performance characteristics, so you can end up giving more weight to poor-performing lots.
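The weighting difference shows up directly in the center lines. The p-chart's pooled proportion is a sample-size-weighted average of the subgroup proportions, while the IR-chart's center line weights every subgroup equally (x_i = count, n_i = size, p_i = x_i/n_i, k subgroups):

$$\bar p = \frac{\sum_{i=1}^{k} x_i}{\sum_{i=1}^{k} n_i} = \sum_{i=1}^{k} \frac{n_i}{N}\,p_i \quad \left(N = \textstyle\sum_{i} n_i\right),\qquad \bar p_{\mathrm{IR}} = \frac{1}{k}\sum_{i=1}^{k} p_i$$

With the quarterly sizes from the question (N = 538), 2023Q2 gets weight 96/538 ≈ 0.18 in the pooled proportion and 2021Q3 gets 30/538 ≈ 0.06, whereas each gets 1/8 = 0.125 in the IR-chart's center line.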
One thing I love about SPC is that it's a very practical, applied field. In the end, you could make a p-chart and an IR-chart, and you could try making different p-charts with different values for LIMITN. With SPC, your goal is not to calculate the "correct" process limits in the way you might want to calculate the "correct" variance estimate for a parameter in some fancy statistical model. Instead, your goal (in my experience) is to calculate "useful" limits: limits that help you understand your process, help bring it under control, and help identify lots with special-cause variation.
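Following that suggestion, here is a sketch that builds an IR-chart of the proportions and, for comparison, a p-chart with a single fixed LIMITN= value. It assumes the data set and variables from the original question. The names tmp2_prop and p_yes are hypothetical, and LIMITN=67 is only an illustrative value (about the mean of the quarterly sizes, 538/8 = 67.25). Because no quarter has exactly 67 observations, the ALLN option is included so that every subgroup is plotted against the fixed limits.

```sas
/* Step 1: observed proportion per subgroup (p_yes is a made-up name) */
data tmp2_prop;
   set tmp2;
   p_yes = yes_answer1 / total_count1;
run;

/* Step 2: individuals and moving range chart of the proportions;
   every quarter gets equal weight regardless of total_count1 */
proc shewhart data=tmp2_prop;
   irchart p_yes*quarter / tests = 1 to 8;
run;

/* Step 3 (comparison): p-chart with one nominal sample size.
   ALLN plots all subgroups, not only those with n = 67. */
proc shewhart data=tmp2;
   pchart yes_answer1*quarter / subgroupn = total_count1
                                limitn    = 67
                                alln;
run;
```

Comparing the charts side by side is one practical way to see whether the choice of method changes which quarters look unusual.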
Thank you for this answer. I read that a p-chart should have at least 20 data points, so I decided to create the p-chart by month rather than by quarter as above.
When I created the p-chart by month, the subgroup sample sizes were as below. I'm not sure whether a p-chart is appropriate, or reliable, for these sample sizes. I read somewhere that each subgroup should have a sample size of at least 50 to use a p-chart, but I'm not sure; I guess you might know the answer.
Subgroup sizes by month: 10, 19, 20, 24, 24, 20, 24, 12, 15, 23, 30, 20, 24, 27, 21, 32, 23, 24, 26, 35, 27, 33, 38
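One way to judge whether small subgroups are a problem is to check when the lower limit is clipped at 0. This is not a rule about eligibility, and I don't know the source of the "at least 50" guideline. Solving the 3-sigma lower limit for n gives the condition for a positive LCL, and by the same algebra the UCL stays below 1 only when n > 9p̄/(1−p̄):

$$\bar p - 3\sqrt{\frac{\bar p(1-\bar p)}{n}} > 0 \iff n > \frac{9(1-\bar p)}{\bar p}$$

The overall proportion p̄ is not given in this thread, so these values are only illustrative. For p̄ = 0.5 the threshold is n > 9, for p̄ = 0.2 it is n > 36, and for p̄ = 0.1 it is n > 81. With monthly sizes of at most 38, the LCL would be positive in some month only if p̄ > 9/47 ≈ 0.19. Otherwise, every month has LCL = 0, and a drop in the proportion could not show up as a point below the lower limit.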