I'm making a p chart using PROC SHEWHART, and my subgroups (lots) have varying sizes. I want to give each lot the same weight when calculating pbar, rather than letting lots with larger sample sizes carry more weight.
I assumed PROC SHEWHART would have a WEIGHT statement, but it does not. My next thought is to calculate pbar myself, and then pass the value to SHEWHART via the p0 option on the pchart statement. Does this seem like a reasonable approach?
As an example, given data like:
data have ;
input lot pfailed ntested ;
cards ;
1 .1 20
2 .2 20
3 .1 20
4 .2 20
5 .4 60
;
PROC SHEWHART will calculate pbar as a weighted mean of the proportions, giving lot 5 more weight than the other lots, and you get pbar=.26.
proc shewhart data=have ;
pchart pfailed*lot/subgroupn=ntested dataunit=proportion;
run ;
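For reference, the weighted pbar that SHEWHART reports can be checked by hand: it is total failures divided by total items tested. A quick sketch of that arithmetic in Python (purely illustrative, not part of the SAS program):

```python
# Subgroup proportions and sample sizes from the example data
p = [0.1, 0.2, 0.1, 0.2, 0.4]
n = [20, 20, 20, 20, 60]

# Weighted mean: total failures / total tested,
# which is how the procedure pools the subgroups
pbar_weighted = sum(pi * ni for pi, ni in zip(p, n)) / sum(n)
print(round(pbar_weighted, 4))  # 0.2571
```

Lot 5 contributes 24 of the 36 total failures, which is why it dominates the estimate.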
My thought is to calculate pbar myself as the unweighted mean, which gives pbar=.2, and pass that value to PROC SHEWHART:
proc sql noprint;
select mean(pfailed) into :pbar trimmed
from have
;
quit ;
%put &=pbar ;
proc shewhart data=have ;
pchart pfailed*lot/subgroupn=ntested dataunit=proportion p0=&pbar;
run ;
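The value the PROC SQL step puts into &pbar is just the simple average of the five proportions, which can be checked the same way (again an illustrative Python sketch, not SAS):

```python
# Subgroup proportions from the example data
p = [0.1, 0.2, 0.1, 0.2, 0.4]

# Unweighted mean: every lot counts equally, regardless of its sample size
pbar_unweighted = sum(p) / len(p)
print(round(pbar_unweighted, 4))  # 0.2
```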
In that case the control limits are constant because the subgroup sample sizes are treated as constant. If you ignore the actual subgroup sample sizes, the varying control limits are not correct.
It's weighted in the sense that a subgroup with a larger number of items contributes more to the estimate of pbar. At least that is my understanding of:
I see the proportion for each subgroup being weighted by the number of items in the subgroup. I would see an unweighted estimate of pbar as just the average of the proportions, i.e.: (p_1+...+p_N)/N.
I don't want to use subgroupN, because that would exclude lot 5 from contributing to the calculation of the control limits.
In my general SPC reading when they show examples with varying size to the subgroups, typically there is little variation, and I think the assumption is that the size of a subgroup is uninformative. In that setting, a larger sample size probably should get more weight, because it provides a better estimate.
But for my example, the data collection process typically samples ~20 items. If they find evidence of an increased rate for a subgroup, they sample 40 more items from that subgroup. So you have some subgroups with 3x the size of other subgroups, and typically they are also the subgroups with an unusual value for the proportion (uncontrolled process). I don't want to give these subgroups more weight than the others in calculating pbar.
Sorry, I confused your use of SUBGROUPN for LIMITN.
Yes, I could use SUBGROUPN to tell PROC SHEWHART that there are 20 items in each subgroup, and that would calculate pbar=.2 as I would like.
But of course then the control limits are 'wrong' for subgroups with more than (or fewer than) 20 items. In my example, lot 5 should have tighter control limits because of its larger sample size, and that's what you get from SHEWHART when you tell it the sample size for each subgroup. I don't want to change that behavior. I just want the estimate of the process mean to be the simple mean of the subgroup proportions, rather than a weighted mean.
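The tighter limits for lot 5 follow from the standard 3-sigma p-chart formula, pbar ± 3*sqrt(pbar*(1-pbar)/n_i), with the lower limit truncated at 0. A Python sketch of that arithmetic with the desired pbar=0.2 (illustrative only; SHEWHART's own computation may differ in detail):

```python
import math

pbar = 0.2                      # simple (unweighted) mean of the subgroup proportions
sizes = [20, 20, 20, 20, 60]    # per-lot sample sizes from the example

# Standard 3-sigma p-chart limits: pbar +/- 3*sqrt(pbar*(1-pbar)/n_i),
# with the LCL truncated at 0
for lot, n in enumerate(sizes, start=1):
    sigma = math.sqrt(pbar * (1 - pbar) / n)
    lcl = max(0.0, pbar - 3 * sigma)
    ucl = pbar + 3 * sigma
    print(f"lot {lot}: LCL={lcl:.3f}, UCL={ucl:.3f}")
```

With pbar=0.2 the n=20 lots get limits of roughly (0, 0.468), while lot 5 (n=60) gets roughly (0.045, 0.355): narrower, as expected for the larger sample.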
You would need to compute pbar outside of PROC SHEWHART, the way you showed. The procedure doesn't have an option for computing a simple average of proportions.
Thanks much @Zard for helping me think this through. I appreciate your taking the time to share examples of different approaches.