unequal subgroup sample sizes in SPC chart


Posted 05-05-2024 10:39 PM

I want to apply tests for special causes to a p-chart, but the sample size differs from group to group, so the UCL and LCL are not fixed. The code below gives this warning: `WARNING: Asymmetric control limits encountered for yes_answer1 for at least one subgroup.`

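```
proc shewhart data=tmp2;
   /* yes_answer1 = number of "yes" answers (the nonconforming count)  */
   /* per quarter; total_count1 = sample size of that quarter          */
   pchart yes_answer1*quarter / subgroupn   = total_count1
                                tests       = 1 to 8       /* tests for special causes */
                                testnmethod = standardize  /* apply tests to standardized values because n varies */
                                table                      /* tabulate subgroup statistics and control limits */
                                tablelegend;               /* add a legend to the table */
run;
```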

Note: PROC SHEWHART provides options for working with unequal subgroup sample sizes. For example, the LIMITN= option specifies a fixed (nominal) sample size for computing the control limits.
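For reference, these are the textbook 3σ p-chart limits when subgroup sizes vary, with d_i the number of "yes" answers and n_i the sample size of subgroup i (notation introduced here, not SAS's). Setting LIMITN= to a fixed value amounts to replacing every n_i with that one nominal n; the PROC SHEWHART documentation is the authority on exactly how it estimates p̄.

$$
\bar{p} = \frac{\sum_i d_i}{\sum_i n_i},
\qquad
\mathrm{LCL}_i = \max\!\left(0,\ \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}}\right),
\qquad
\mathrm{UCL}_i = \min\!\left(1,\ \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}}\right)
$$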

Below are the sample sizes for each subgroup. My question is: how should I choose the value for LIMITN=?

| Quarter | Subgroup sample size |
|---|---|
| 2021Q3 | 30 |
| 2021Q4 | 66 |
| 2022Q1 | 54 |
| 2022Q2 | 66 |
| 2022Q3 | 69 |
| 2022Q4 | 74 |
| 2023Q1 | 83 |
| 2023Q2 | 96 |


Did you try LIMITN=VARYING?

--

Paige Miller
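With varying limits, each subgroup's limits come from its own sample size, so smaller subgroups get wider limits. A minimal Python sketch of the textbook calculation (not PROC SHEWHART's code); `p_chart_limits` and the "yes" counts are hypothetical, and only the sample sizes come from the question:

```python
import math


def p_chart_limits(defectives, sizes, sigma=3.0):
    """Center line and per-subgroup limits for a p chart with varying n.

    The center line is the pooled proportion sum(d_i) / sum(n_i); each
    subgroup's limits use its own sample size n_i. The LCL is truncated
    at 0 and the UCL at 1.
    """
    if len(defectives) != len(sizes) or not sizes:
        raise ValueError("need matching, non-empty lists of counts and sizes")
    p_bar = sum(defectives) / sum(sizes)
    limits = []
    for n in sizes:
        half_width = sigma * math.sqrt(p_bar * (1 - p_bar) / n)
        limits.append((max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)))
    return p_bar, limits


# Quarterly sizes from the question; the "yes" counts are made up for illustration.
sizes = [30, 66, 54, 66, 69, 74, 83, 96]
yes_counts = [9, 20, 16, 20, 21, 22, 25, 29]  # hypothetical
p_bar, limits = p_chart_limits(yes_counts, sizes)
for n, (lcl, ucl) in zip(sizes, limits):
    print(f"n={n:3d}  LCL={lcl:.3f}  UCL={ucl:.3f}")
```

The 2021Q3 subgroup (n = 30) gets the widest band and 2023Q2 (n = 96) the narrowest, which is why the limits on the chart step up and down.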


Thank you for your answer. Yes; my question is: what is the appropriate value for LIMITN=, and how should I choose it?

Here is the sample size of each subgroup:

| Quarter | Subgroup sample size |
|---|---|
| 2021Q3 | 30 |
| 2021Q4 | 66 |
| 2022Q1 | 54 |
| 2022Q2 | 66 |
| 2022Q3 | 69 |
| 2022Q4 | 74 |
| 2023Q1 | 83 |
| 2023Q2 | 96 |


I don't think there is a single value of N that works here. I guess you could use the average, the median, or something similar, but for some subgroups it will give limits that are too wide, while for others it will give limits that are too narrow, so I don't recommend this.

--

Paige Miller
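To put numbers on the "too wide / too narrow" point: with a fixed nominal n, each subgroup's half-width is off by a factor of sqrt(n_i / n). A minimal Python sketch using the quarterly sizes from the question; the function names and the center line p̄ = 0.3 are hypothetical (the ratio does not depend on p̄):

```python
import math
import statistics


def half_width(p_bar, n, sigma=3.0):
    """Half-width of sigma-level p-chart limits for sample size n."""
    return sigma * math.sqrt(p_bar * (1 - p_bar) / n)


def nominal_to_own_ratio(sizes, p_bar, nominal_n):
    """Ratio of the fixed (nominal-n) half-width to each subgroup's own half-width.

    > 1: the fixed limits are too wide for that subgroup.
    < 1: the fixed limits are too narrow for that subgroup.
    The ratio equals sqrt(n_i / nominal_n), so it does not depend on p_bar.
    """
    nominal = half_width(p_bar, nominal_n)
    return [nominal / half_width(p_bar, n) for n in sizes]


sizes = [30, 66, 54, 66, 69, 74, 83, 96]   # quarterly sizes from the question
p_bar = 0.3                                # hypothetical center line
for label, nominal_n in [("mean", statistics.mean(sizes)), ("median", statistics.median(sizes))]:
    ratios = nominal_to_own_ratio(sizes, p_bar, nominal_n)
    print(label, nominal_n, [round(r, 2) for r in ratios])
```

With the nominal n set to the mean (67.25), 2021Q3 (n = 30) gets limits about 0.67 times as wide as its own sample size implies (too narrow, so more false alarms), and 2023Q2 (n = 96) gets limits about 1.19 times too wide.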


I've been playing around with some data to make a p-chart recently, where I also have varying subgroup sizes. As I was reading up again on p-charts, I was surprised to see that Donald Wheeler is actually a fan of using IR-charts where most textbooks would advise a p-chart (or np-chart, c-chart, or u-chart). His basic argument is that if the p-chart's assumption of a binomial distribution is right, the IR-chart will give you about the same limits, but if that assumption is wrong, the IR-chart's empirical limits will be better than the p-chart's model-based limits. See e.g. https://www.qualitydigest.com/inside/quality-insider-article/what-about-p-charts-093011.html and https://www.qualitydigest.com/static/magazine/jul/spctool.html
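An IR-chart (XmR chart) takes its limits from the data themselves: the mean of the individual values plus or minus 2.66 times the average moving range. A minimal Python sketch applied to a series of proportions; `xmr_limits` and the proportions are hypothetical:

```python
def xmr_limits(values):
    """Natural process limits for an individuals (X) chart.

    center = mean of the values; limits = center +/- 2.66 * average moving
    range, where 2.66 is approximately 3 / 1.128 (d2 for moving ranges of
    two consecutive points). Limits are not truncated, so for proportions
    the LCL can fall below 0.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


# Hypothetical proportions, one per lot, in time order.
proportions = [0.20, 0.40, 0.30, 0.50]
lcl, center, ucl = xmr_limits(proportions)
print(f"LCL={lcl:.3f}  center={center:.3f}  UCL={ucl:.3f}")
```

Nothing here uses the subgroup sizes, which is the point of Wheeler's argument: the spread comes from the observed lot-to-lot variation rather than from a binomial model.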

Since Wheeler's books were my introduction to SPC, I tend to like his ideas. So for my case, with data like yours, I'm thinking I may just make an IR-chart of the proportions (the p values). Another side benefit for me is that this gives each lot equal weight, independent of sample size. A typical p-chart (or mean chart) with varying sample sizes gives more weight to lots that have more data. That makes sense if your process is under control and the sample size for each lot carries no meaningful information. But in some two-stage sampling designs, more data are sampled from a lot that shows poor performance, so you can end up giving more weight to poor-performing lots.
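To see the weighting point, compare the pooled center line a p-chart uses (total "yes" / total n) with the simple average of the lot proportions, which is what an IR-chart of proportions centers on. A minimal Python sketch; `center_lines` and the numbers are hypothetical: two small lots at 10% and one large, heavily sampled lot at 50%.

```python
def center_lines(defectives, sizes):
    """Return (pooled, unweighted) center lines.

    pooled     = sum(d_i) / sum(n_i): each lot weighted by its sample size
    unweighted = mean of d_i / n_i:   each lot counts once
    """
    if len(defectives) != len(sizes) or not sizes:
        raise ValueError("need matching, non-empty lists of counts and sizes")
    pooled = sum(defectives) / sum(sizes)
    proportions = [d / n for d, n in zip(defectives, sizes)]
    unweighted = sum(proportions) / len(proportions)
    return pooled, unweighted


# Hypothetical two-stage design: the poor lot was sampled more heavily.
pooled, unweighted = center_lines([2, 2, 40], [20, 20, 80])
print(f"pooled={pooled:.3f}  unweighted={unweighted:.3f}")
```

The pooled line (about 0.367) is pulled toward the heavily sampled poor lot, while the unweighted mean (about 0.233) treats all three lots equally.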

One thing I love about SPC is that it's a very practical, applied field. In the end, you could make a p-chart and an IR-chart, and you could try making several p-charts with different values of LIMITN=. With SPC, your goal is not to calculate the "correct" process limits in the way you might want to calculate the "correct" variance estimate for a parameter in some fancy statistical model. Instead, in my experience, the goal is to calculate "useful" limits: limits that help you understand your process, help bring it under control, and identify lots with special-cause variation.

BASUG is hosting ** free webinars ** Next up: **Mike Raithel** presenting on validating data files on Wednesday July 17. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.


Thank you for this answer. I read that a p-chart should have at least 20 data points (subgroups), so I decided to create the p-chart by month rather than by quarter as above.

When I created the p-chart by month, the subgroup sample sizes were as shown below. I am not sure whether a p-chart is appropriate, or reliable, for sample sizes this small. I read somewhere that each subgroup needs a sample size of at least 50 to use a p-chart, but I'm not sure; I guess you might know the answer.

10, 19, 20, 24, 24, 20, 24, 12, 15, 23, 30, 20, 24, 27, 21, 32, 23, 24, 26, 35, 27, 33, 38
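One concrete effect of small subgroups is that the lower limit can drop to zero, so the chart cannot flag an unusually low proportion for those subgroups. Solving p̄ − 3·sqrt(p̄(1 − p̄)/n) > 0 gives n > 9(1 − p̄)/p̄. A minimal Python sketch using the monthly sizes above; `min_n_for_positive_lcl` and the center line p̄ = 0.25 are hypothetical, and the result depends strongly on p̄:

```python
import math


def min_n_for_positive_lcl(p_bar, sigma=3.0):
    """Smallest integer n whose sigma-level p-chart LCL is above zero.

    p_bar - sigma*sqrt(p_bar*(1-p_bar)/n) > 0  <=>  n > sigma**2 * (1-p_bar) / p_bar
    """
    if not 0 < p_bar < 1:
        raise ValueError("p_bar must be strictly between 0 and 1")
    bound = sigma ** 2 * (1 - p_bar) / p_bar
    return math.floor(bound) + 1


monthly_sizes = [10, 19, 20, 24, 24, 20, 24, 12, 15, 23, 30, 20,
                 24, 27, 21, 32, 23, 24, 26, 35, 27, 33, 38]
p_bar = 0.25                              # hypothetical center line
n_min = min_n_for_positive_lcl(p_bar)     # 28 when p_bar = 0.25
zero_lcl_months = sum(1 for n in monthly_sizes if n < n_min)
print(n_min, zero_lcl_months, len(monthly_sizes))
```

With p̄ = 0.25, 18 of the 23 monthly subgroups would have an LCL of zero; with p̄ = 0.5 the threshold drops to n = 10 and none would.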

Accepted solution:

Hi, I don't know the answer. I saw your other post with this question. Your sample sizes look reasonable to me. Right now I'm making a p-chart with sample sizes that bounce around similarly, typically between 20 and 30. I'm not aware of any firm rules for sample size for a p-chart, though I'm sure people have their general recommendations. For my chart, I decided to exclude any lots with a sample size less than 10. But I think a sample size of 20 or 30 is quite reasonable.

I would make the chart and see if it looks useful. If people use the chart and pay attention when a lot is outside the process control limits (e.g., investigate further to see if they can identify a special cause for that variation), over time you will get a sense of whether your chart is giving too many false alerts or not enough alerts, and you can consider adjusting accordingly.

But I'm not an SPC expert or anything; I'm just a fan of using SPC charts where I think they are helpful. And I figure if someone sees one of my charts and tells me I'm doing something wrong or breaking a rule, we'll have a conversation about the methods they would recommend, discuss pros and cons, and possibly adapt. On my current project, I'm hoping that might happen.

In past roles where I did more stats, I would have felt more committed to finding the "right" statistical test and, if questioned about it, being prepared to defend why my method was correct and provided the best test of a hypothesis. But for SPC, I'm much looser about how important it is that a chart be "right" (in some sense). For my current project, I made a p-chart, an IR-chart of the proportions, and a Laney p' chart, compared the results, discussed them with some colleagues, and then just picked one. It didn't really matter which I used, because they all told the same story.
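The Laney p' chart mentioned above keeps the p-chart's per-subgroup sigma but rescales it by σ_z, the moving-range estimate of the spread of the standardized values z_i = (p_i − p̄)/σ_i. A σ_z near 1 means the binomial limits fit; above 1 means they are too tight (overdispersion). A minimal Python sketch; `laney_p_prime_limits` and the "yes" counts are hypothetical, and only the sizes come from the thread:

```python
import math


def laney_p_prime_limits(defectives, sizes, sigma=3.0):
    """Laney p' chart: p-chart limits scaled by sigma_z.

    sigma_z = (average moving range of z_i) / 1.128, where
    z_i = (p_i - p_bar) / sigma_p_i and sigma_p_i = sqrt(p_bar*(1-p_bar)/n_i).
    Limits are p_bar +/- sigma * sigma_p_i * sigma_z, truncated to [0, 1].
    """
    if len(defectives) != len(sizes) or len(sizes) < 2:
        raise ValueError("need matching lists with at least two subgroups")
    p_bar = sum(defectives) / sum(sizes)
    sigma_p = [math.sqrt(p_bar * (1 - p_bar) / n) for n in sizes]
    z = [(d / n - p_bar) / s for d, n, s in zip(defectives, sizes, sigma_p)]
    moving_ranges = [abs(b - a) for a, b in zip(z, z[1:])]
    sigma_z = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    limits = [(max(0.0, p_bar - sigma * s * sigma_z),
               min(1.0, p_bar + sigma * s * sigma_z)) for s in sigma_p]
    return p_bar, sigma_z, limits


sizes = [30, 66, 54, 66, 69, 74, 83, 96]       # quarterly sizes from the question
yes_counts = [9, 20, 16, 20, 21, 22, 25, 29]   # hypothetical
p_bar, sigma_z, limits = laney_p_prime_limits(yes_counts, sizes)
print(f"p_bar={p_bar:.3f}  sigma_z={sigma_z:.3f}")
```

These made-up counts sit very close to p̄, so σ_z comes out far below 1 and the p' limits are much tighter than the ordinary p-chart limits; with real data, comparing σ_z to 1 is a quick check on whether the binomial assumption behind the p-chart holds.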


Thank you so much for your explanation; it was helpful. Do you agree with using a p-chart on a monthly basis rather than a quarterly one, even though this reduces the sample size in each subgroup? I think having a sufficient number of data points (subgroups) is more important than the size of each subgroup.