I'm making a p chart using PROC SHEWHART, and my subgroups (lots) have varying sizes. I want to give each lot the same weight when calculating pbar, rather than letting lots with larger sample sizes carry more weight.
I assumed PROC SHEWHART would have a WEIGHT statement, but it does not. My next thought is to calculate pbar myself, and then pass the value to SHEWHART via the p0 option on the pchart statement. Does this seem like a reasonable approach?
As an example, given data like:
data have ;
input lot pfailed ntested ;
cards ;
1 .1 20
2 .2 20
3 .1 20
4 .2 20
5 .4 60
;
PROC SHEWHART will calculate pbar as a weighted mean of the proportions, giving lot 5 more weight than the other lots, and you get pbar=.26.
proc shewhart data=have ;
pchart pfailed*lot/subgroupn=ntested dataunit=proportion;
run ;
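For reference, the weighted pbar that SHEWHART reports can be checked by hand: it is total failures divided by total items tested. A quick sketch of that arithmetic in Python (purely illustrative, not part of the SAS program):

```python
# Subgroup proportions and sample sizes from the example data
p = [0.1, 0.2, 0.1, 0.2, 0.4]
n = [20, 20, 20, 20, 60]

# Weighted mean: total failures / total tested,
# which is how the procedure pools the subgroups
pbar_weighted = sum(pi * ni for pi, ni in zip(p, n)) / sum(n)
print(round(pbar_weighted, 4))  # 0.2571
```

Lot 5 contributes 24 of the 36 total failures, which is why it dominates the estimate.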
My thought is to calculate pbar myself as the unweighted mean, which gives pbar=.2, and pass that value to PROC SHEWHART:
proc sql noprint;
select mean(pfailed) into :pbar trimmed
from have
;
quit ;
%put &=pbar ;
proc shewhart data=have ;
pchart pfailed*lot/subgroupn=ntested dataunit=proportion p0=&pbar;
run ;
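The value the PROC SQL step puts into &pbar is just the simple average of the five proportions, which can be checked the same way (again an illustrative Python sketch, not SAS):

```python
# Subgroup proportions from the example data
p = [0.1, 0.2, 0.1, 0.2, 0.4]

# Unweighted mean: every lot counts equally, regardless of its sample size
pbar_unweighted = sum(p) / len(p)
print(round(pbar_unweighted, 4))  # 0.2
```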
In that case the control limits are constant because the subgroup sample sizes are treated as constant. If you ignore the actual subgroup sample sizes, the varying control limits are not correct.
It's weighted in the sense that a subgroup with a larger number of items contributes more to the estimate of pbar. At least that is my understanding of:
I see the proportion for each subgroup being weighted by the number of items in the subgroup. I would see an unweighted estimate of pbar as just the average of the proportions, i.e.: (p_1+...+p_N)/N.
I don't want to use subgroupN, because that would exclude lot 5 from contributing to the calculation of the control limits.
In my general SPC reading when they show examples with varying size to the subgroups, typically there is little variation, and I think the assumption is that the size of a subgroup is uninformative. In that setting, a larger sample size probably should get more weight, because it provides a better estimate.
But for my example, the data collection process typically samples ~20 items. If they find evidence of an increased rate for a subgroup, they sample 40 more items from that subgroup. So you have some subgroups with 3x the size of other subgroups, and typically they are also the subgroups with an unusual value for the proportion (uncontrolled process). I don't want to give these subgroups more weight than the others in calculating pbar.
Sorry, I confused your use of SUBGROUPN for LIMITN.
Yes, I could use SUBGROUPN to tell PROC SHEWHART that there are 20 items in each subgroup, and that would calculate pbar=.2 as I would like.
But of course then the control limits are 'wrong' for subgroups with more than (or fewer than) 20 items. In my example, lot 5 should have tighter control limits because of its larger sample size, and that's what you get from SHEWHART when you tell it the sample size for each subgroup. I don't want to change that behavior. I just want the estimate of the process mean to be the simple mean of the subgroup proportions, rather than a weighted mean.
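The tighter limits for lot 5 follow from the standard 3-sigma p-chart formula, pbar ± 3*sqrt(pbar*(1-pbar)/n_i), with the lower limit truncated at 0. A Python sketch of that arithmetic with the desired pbar=0.2 (illustrative only; SHEWHART's own computation may differ in detail):

```python
import math

pbar = 0.2                      # simple (unweighted) mean of the subgroup proportions
sizes = [20, 20, 20, 20, 60]    # per-lot sample sizes from the example

# Standard 3-sigma p-chart limits: pbar +/- 3*sqrt(pbar*(1-pbar)/n_i),
# with the LCL truncated at 0
for lot, n in enumerate(sizes, start=1):
    sigma = math.sqrt(pbar * (1 - pbar) / n)
    lcl = max(0.0, pbar - 3 * sigma)
    ucl = pbar + 3 * sigma
    print(f"lot {lot}: LCL={lcl:.3f}, UCL={ucl:.3f}")
```

With pbar=0.2 the n=20 lots get limits of roughly (0, 0.468), while lot 5 (n=60) gets roughly (0.045, 0.355): narrower, as expected for the larger sample.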
You would need to compute pbar outside of PROC SHEWHART, the way you showed. The procedure doesn't have an option for computing a simple average of proportions.
Thanks much @Zard for helping me think this through. I appreciate your taking the time to share examples of different approaches.