Solved: Re: unequal subgroup sample sizes in SPC chart

bhr-q · Posted 05-05-2024 10:39 PM

I want to apply the tests for special causes to a p-chart, but the sample size differs between subgroups, so the UCL and LCL are not fixed. With the code below I get the following warning: WARNING: Asymmetric control limits encountered for yes_answer1 for at least one subgroup.

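proc shewhart data=tmp2;
   pchart yes_answer1*quarter / subgroupn=total_count1      /* variable holding each subgroup's sample size */
                                tests=1 to 8                /* request tests for special causes 1 through 8 */
                                testnmethod=standardize     /* apply the tests when subgroup sizes vary */
                                table
                                tablelegend;
run;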

Note: The SHEWHART procedure provides an option for working with unequal subgroup sample sizes. For example, I can use the LIMITN= option to specify a fixed (nominal) sample size for computing the control limits.

The sample size for each subgroup is listed below. My question is: how should I choose the value for LIMITN=?

quarter   Subgroup size
2021Q3    30
2021Q4    66
2022Q1    54
2022Q2    66
2022Q3    69
2022Q4    74
2023Q1    83
2023Q2    96
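
For intuition about applying run tests when subgroup sizes differ, here is a minimal Python sketch of the standardization idea: each proportion is converted to a z-score using its own subgroup size, so every point can be judged against the same limits of -3 and +3. That this mirrors what TESTNMETHOD=STANDARDIZE does internally is an assumption, not something confirmed in this thread. The yes_counts values are hypothetical; only the subgroup sizes come from the list above.

```python
from math import sqrt

def standardize(yes_counts, sizes):
    """Return z-scores (p_i - p_bar) / sqrt(p_bar * (1 - p_bar) / n_i)."""
    p_bar = sum(yes_counts) / sum(sizes)  # pooled proportion (center line)
    z_scores = []
    for count, n in zip(yes_counts, sizes):
        std_err = sqrt(p_bar * (1 - p_bar) / n)  # binomial standard error for this subgroup
        z_scores.append((count / n - p_bar) / std_err)
    return z_scores

sizes = [30, 66, 54, 66, 69, 74, 83, 96]      # subgroup sizes from the thread
yes_counts = [9, 20, 15, 22, 21, 25, 24, 30]  # hypothetical counts, for illustration only

for n, z in zip(sizes, standardize(yes_counts, sizes)):
    print(f"n={n:3d}  z={z:+.2f}")
```

On the standardized scale, the tests look for patterns relative to a center line of 0 and fixed limits at -3 and +3, regardless of how the subgroup sizes vary.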


PaigeMiller · Posted 05-06-2024 06:18 AM

Did you try LIMITN=VARYING?

--
Paige Miller

bhr-q · Posted 05-06-2024 07:17 AM

Thank you for your answer. Yes; my question is, what is an appropriate value for LIMITN=, and how should I choose it?

here is the sample size of each subgroup:

quarter   Subgroup size
2021Q3    30
2021Q4    66
2022Q1    54
2022Q2    66
2022Q3    69
2022Q4    74
2023Q1    83
2023Q2    96

PaigeMiller · Posted 05-06-2024 07:29 AM

I don't think there is a single value of N that works here. I guess you could use an average or median or something similar, but for some data points it will give limits that are too wide while for other data points it will give limits that are too narrow, so I don't recommend this.
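
To make the "too wide for some, too narrow for others" point concrete, here is a rough Python sketch comparing per-subgroup 3-sigma limits with limits computed from one fixed nominal n. The overall proportion p_bar = 0.30 is a hypothetical value chosen only for illustration, and using the mean subgroup size as the nominal n is just one possible choice; the subgroup sizes are the quarterly ones posted above.

```python
from math import sqrt
from statistics import mean

def p_chart_limits(p_bar, n):
    """Return (LCL, UCL): 3-sigma p-chart limits for subgroup size n, clipped to [0, 1]."""
    half_width = 3 * sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)

sizes = [30, 66, 54, 66, 69, 74, 83, 96]  # quarterly subgroup sizes from the thread
p_bar = 0.30                               # hypothetical overall proportion
nominal_n = mean(sizes)                    # one possible fixed n (67.25 here)

fixed_lcl, fixed_ucl = p_chart_limits(p_bar, nominal_n)
for n in sizes:
    lcl, ucl = p_chart_limits(p_bar, n)
    print(f"n={n:3d}  varying=({lcl:.3f}, {ucl:.3f})  fixed=({fixed_lcl:.3f}, {fixed_ucl:.3f})")
```

Under these assumptions the fixed limits are narrower than the subgroup's own limits for n=30 and wider for n=96, which is the mismatch described above.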

--
Paige Miller

Quentin · Posted 05-06-2024 08:16 AM

I've been playing around with some data to make a p-chart recently, where I also have varying subgroup sizes. As I was reading up again on p-charts, I was surprised to see Donald Wheeler is actually a fan of using IR-charts where most textbooks would advise using a p-chart (or np-chart or c-chart or u-chart). His basic argument is that if the p-chart assumption of a binomial distribution is right, the IR-chart will give you about the same limits, but if the assumption of binomial distribution is wrong, the IR-chart will give you better empirical limits rather than using the p-chart's model-based limits. See e.g. https://www.qualitydigest.com/inside/quality-insider-article/what-about-p-charts-093011.html and https://www.qualitydigest.com/static/magazine/jul/spctool.html

Since Wheeler's books were my introduction to SPC, I tend to like his ideas. So for my case with data like yours, I'm thinking I may just make an IR-chart of the proportions (the p values for each lot). Another side benefit for me is that this will also give each lot equal weight, independent of sample size. A typical p-chart or mean chart with varying sample size will give more weight to the lots that have more data. That makes sense if your process is under control, and the sample size for each lot is not meaningful information. But in some cases with two-stage sampling designs, when there is a lot with poor performance characteristics they sample more data. So you can end up giving more weight to poor-performing lots.

One thing I love about SPC is that it's a very practical / applied field. In the end, you could make a p-chart and an IR-chart, and you could try making different p-charts with different values for LIMITN. With SPC, your goal is not to calculate the "correct" process limits in the same way that you might want to calculate the "correct" estimate of the variance of a parameter in some fancy statistical model. Instead, your goal (in my experience) is to calculate "useful" limits, where "useful" means: do the process limits help you understand your process, and do they help bring the process under control and identify lots with special cause variation?
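
As a rough sketch of the IR-chart (XmR) alternative, the Python below computes individuals-chart limits directly from a series of proportions: the center line is the plain average of the proportions, and the limits are the center plus or minus 2.66 times the average moving range. The proportions are hypothetical values for illustration, and xmr_limits is just a name made up for this example, not a SAS or library function.

```python
def xmr_limits(values):
    """Return (center, lcl, ucl) for an individuals (X) chart.

    Limits are center +/- 2.66 * average moving range, the usual XmR scaling.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    center = sum(values) / len(values)  # unweighted: each lot counts equally
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center, center - 2.66 * mr_bar, center + 2.66 * mr_bar

proportions = [0.30, 0.30, 0.28, 0.33, 0.30, 0.34, 0.29, 0.31]  # hypothetical
center, lcl, ucl = xmr_limits(proportions)
print(f"center={center:.3f}  LCL={lcl:.3f}  UCL={ucl:.3f}")
```

Note that the limits here depend only on how much the proportions move from lot to lot, not on the subgroup sizes, which is why they are the same for every point. This sketch does not clip the limits to the 0 to 1 range.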

BASUG is hosting free webinars. Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

bhr-q

Thank you for this answer; I noticed that the number of data points within the P-chart should be at least 20, so I decided to create the P-chart by month, not by quarter, as above.

When I created the P-chart by month, the subgroup sample sizes were as listed below. I am not sure whether a P-chart is appropriate, or reliable, for sample sizes this small. I found somewhere that each subgroup should have a sample size of at least 50 to use a P-chart, but I am not sure. I guess you might know the answer.

10
19
20
24
24
20
24
12
15
23
30
20
24
27
21
32
23
24
26
35
27
33
38

Quentin

Hi, I don't know an answer. I saw your other post with this question. I think your sample sizes look reasonable to me. At this moment I'm making a p-chart with sample sizes that bounce around similarly, and are typically between 20 and 30. I'm not aware of any firm rules for sample size for a p-chart, but I'm sure people have their general recommendations. For my chart, I decided to exclude any lots that had a sample size less than 10. But I think a sample size of 20 or 30 is quite reasonable.

I would make the chart, and see if it looks useful. And if people use the chart, and pay attention to when a lot is out of the process control limits (e.g. investigate further to see if they can identify a special cause for that variation), over time you will get a sense of whether your chart is providing too many false alerts, or isn't providing enough alerts, and you can consider adjusting accordingly.

But I'm not an SPC expert or anything, I'm just a fan of using SPC charts where I think they are helpful. And I figure if someone sees one of my charts and tells me I'm doing something wrong/breaking a rule, we'll have a conversation about the methods they would recommend, discuss pros and cons, and possibly adapt. On my current project, I'm hoping that might happen.

In past roles where I did more stats, I would have felt more committed to finding the "right" statistical test and, if questioned about it, being prepared to defend why the method I used was correct and provided the best test of a hypothesis. But for SPC, I'm much looser with how important it is that a chart be "right" (in some sense). For my current project, I made a p-chart and an IR-chart of the proportions, and a Laney p'-chart, compared the results and discussed with some colleagues, then just picked one. It didn't really matter which I used, because they all told the same story.


bhr-q

Thank you so much for your explanation; it was helpful. Do you agree with using a P-chart on a monthly basis rather than quarterly, even though this would reduce the sample size in each subgroup? I think having a sufficient number of data points (the number of subgroups) is more important than the size of each subgroup.