Justification of T- and F-Test Evaluation for Biased Net Fill Weight as Part of In-Process Control Data During the Capsule Filling Operation

This article provides a statistically defensible and GMP-aligned justification for continued IPC reliance on computed net fill weight when equivalence in mean and variance can be demonstrated.

Peer-Reviewed

Submitted: January 8, 2026
Accepted: March 4, 2026

Abstract

Net fill weight in capsule in-process control (IPC) is often calculated by subtracting a fixed average shell weight from gross weight. Because individual shell weights vary, this computation introduces bias and artificial variance. This work justifies using T- and F-tests to evaluate whether the computed (biased) net fill weight remains acceptable for IPC decision-making during capsule filling operations.

A simulation-based approach was applied to generate empirical acceptance limits for T (mean difference) and F (variance ratio) statistics by modeling fill-weight and shell-weight distributions. These limits replace conventional critical values and reflect practical process variability. The method enables determination of equivalence between idealized and computed net fill weights in terms of mean and variance and provides a GMP-aligned statistical justification for continued IPC use of computed net weight measurements.

Introduction

In capsule filling operations, in-process control (IPC) of net fill weight is a critical element for ensuring dosage accuracy and maintaining compliance with good manufacturing practice (GMP) requirements. In routine manufacturing, net fill weight is commonly computed by subtracting a fixed average empty-capsule shell weight from the measured gross capsule weight. While operationally convenient, this approach assumes uniform shell weight and does not account for inherent shell-to-shell variability.

In practice, individual capsule shells exhibit measurable weight variation. When a fixed shell-weight average is applied to all capsules, this variability introduces bias and artificial dispersion into IPC net fill-weight data. As illustrated in Figures 1a and 1b, shell-weight variability alone can widen the observed net fill-weight distribution and reduce calculated process capability indices (CpK), even when the true fill-weight process remains unchanged. This effect complicates interpretation of IPC trends and may lead to conservative or misleading conclusions regarding process control.

From a statistical perspective, the impact of shell-weight variability raises a fundamental question: can the computed (biased) net fill weight still be considered equivalent to the idealized net fill weight for IPC purposes? In pharmaceutical process validation (PV) and continued process verification (CPV), equivalence is often assessed in terms of central tendency (mean) and dispersion (variance) rather than point-by-point agreement. Classical statistical tools for this purpose are the T-test, which evaluates equality of means, and the F-test, which evaluates equality of variances. These tests have been widely applied in pharmaceutical validation contexts to justify interchangeability of analytical or process-related datasets, provided that appropriate acceptance criteria are established.1

However, direct application of theoretical critical values for T- and F-tests may not adequately reflect realistic manufacturing variability, sampling uncertainty, or combined decision risk when both tests are applied simultaneously. Prior work by the author has demonstrated that simulation-based approaches can be used to derive empirical acceptance limits for T- and F-statistics that better represent practical process behavior and joint confidence requirements in PV and CPV applications.

Accordingly, this study focuses on justifying the use of T- and F-test–based equivalence evaluation for biased net fill-weight data used in IPC during capsule filling operations. A simulation-based framework is employed to establish empirical acceptance limits for both statistics, enabling equivalence decisions to be made on the basis of realistic process and measurement variability rather than solely theoretical assumptions.

The objective of this article is to provide such a statistically defensible and GMP-aligned justification for continued IPC reliance on computed net fill weight when equivalence in mean and variance can be demonstrated. In practice, the samples evaluated must be collected separately during PV and/or CPV activities rather than drawn from routine IPC trending data. For each sample set (eg, n = 30 during PV or n = 10–20 during CPV), gross capsule weight is measured and the net fill weight is computed by subtracting a fixed average empty-capsule shell weight. The computed net fill weights are then compared with reference net fill weights obtained by directly weighing the capsule contents of the same units.

Effect of Capsule Shell-Weight Variability on Observed Net Fill-Weight Distributions

Figure 1 summarizes the effect of empty-capsule shell-weight variability on the observed distribution of net fill weights under identical mean fill conditions. In all cases, the mean net fill weight is centered at 100% of target, indicating that shell-weight variability does not introduce a systematic shift in central tendency. However, the presence of shell-weight variability results in a clear widening of the observed net fill-weight distribution.

Figure 1a presents Monte Carlo–simulated normal distributions for idealized and computed net fill weights. When shell-weight variability is absent (idealized condition), the simulated net fill-weight distribution exhibits a standard deviation (SD) of 2.38% of target. Introduction of shell-weight variability through computation using a fixed average shell weight increases the SD to 2.56% of target. This variance inflation occurs despite identical mean fill weight and identical underlying fill-weight process assumptions.
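The variance-inflation mechanism behind Figure 1a can be reproduced with a minimal Monte Carlo sketch. The fill SD (2.38% of target) is taken from the article; the shell SD used here (2.99 mg / 315 mg ≈ 0.95% of target) is an assumption derived from the shell-weight data reported later in this article, not a stated simulation input.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_400_000

fill = rng.normal(100.0, 2.38, n)     # idealized net fill weight (% of target)
shell_dev = rng.normal(0.0, 0.95, n)  # individual shell deviation from the fixed average (assumed SD)

# Computed net = gross - fixed average shell weight = fill + shell deviation,
# so the shell deviation is propagated directly into the computed value.
computed = fill + shell_dev

print(round(float(fill.std(ddof=1)), 2))      # ≈ 2.38
print(round(float(computed.std(ddof=1)), 2))  # ≈ 2.56
```

Because the two variance components are independent, the computed SD follows √(2.38² + 0.95²) ≈ 2.56% of target, matching the inflation reported in Figure 1a.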

Figure 1b illustrates the same variance inflation effect using idealized normal distributions. In this conceptual comparison, the only difference between distributions is the magnitude of the standard deviation, which increases from 2.38% to 2.56% of target when shell-weight variability is incorporated. This idealized illustration confirms that the observed widening of the distribution arises solely from shell-weight variability and not from changes in the fill-weight process itself. This variance inflation is accompanied by a reduction in calculated process capability, with CpK decreasing from 1.05 under idealized conditions to 0.98 under computed conditions.
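The reported CpK reduction can be checked with a one-line capability calculation. The specification limits used here (±7.5% of target) are not stated in the article; they are back-calculated from the reported CpK of 1.05 at SD = 2.38% and are illustrative only.

```python
# Cpk for a centered process with symmetric limits:
# Cpk = min(USL - mean, mean - LSL) / (3 * SD)
def cpk(mean, sd, lsl, usl):
    return min(usl - mean, mean - lsl) / (3 * sd)

# Assumed limits of 92.5-107.5% of target (back-calculated, not from the article)
print(round(cpk(100, 2.38, 92.5, 107.5), 2))  # 1.05 (idealized condition)
print(round(cpk(100, 2.56, 92.5, 107.5), 2))  # 0.98 (computed condition)
```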

Collectively, Figure 1 demonstrates that shell-weight variability alone can inflate observed net fill-weight variance and adversely affect IPC-based capability metrics, even when the true fill-weight process remains unchanged.

Supplementary Background for Figures 1a and 1b

The shell-weight variability range shown in Figures 1a and 1b reflects the outcome of a Monte Carlo simulation (N = 2.4 million). Quality unit data indicated that capsule size #2 (used for a nominal fill weight of 315 mg per capsule) has a nominal empty-shell weight of approximately 62 mg with an SD of 2.99 mg. After normalization, a simulation with a mean of zero and an SD of 2.99 mg was performed. The resulting minimum and maximum simulated deviations were −15.13 mg and +16.76 mg, corresponding to −4.80% and +5.32% of target fill weight, respectively, as reported in Figure 1a.

Using the Extreme Value (EV) approximation method,2 the expected extrema of a normally distributed sample were estimated as:

E[X(max)] ≈ m + s·√(2 ln n)  and  E[X(min)] ≈ m − s·√(2 ln n)

where n = 2.4 × 10⁶. With m = 0 and s = 2.99 mg, this yields expected extreme deviations of approximately ±16.21 mg, corresponding to ±5.15% of target fill weight. The close agreement between the observed simulation outcomes and the theoretical approximation supports the robustness of the simulated shell-weight variability range.
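The EV estimate is easily reproduced numerically; a minimal sketch using the parameters stated above:

```python
import math

# Gumbel-type approximation for the expected extreme of n normal draws:
# E[extreme] ≈ m ± s * sqrt(2 * ln(n)), with m = 0, s = 2.99 mg, n = 2.4e6
n = 2_400_000
m, s = 0.0, 2.99

extreme = m + s * math.sqrt(2 * math.log(n))
print(round(extreme, 2))              # 16.21 (mg)
print(round(100 * extreme / 315, 2))  # 5.15 (% of the 315 mg target)
```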

Methodology for Establishing F- and T-Test Limits

Tables 1 and 2, adapted from reference 1, summarize the definitions, statistical formulae, and evaluation conditions (eg, sample sizes, paired datasets, and acceptance limits) for the F- and T-tests applied therein. In the present study, a Monte Carlo simulation framework (n = 30) was employed to generate biased net fill-weight data by incorporating a controlled variability component, with the target fill weight normalized to 100% of nominal.

Figures 2a and 2b demonstrate that theoretical one-sided critical limits for F- and T-tests closely match empirically observed limits obtained by Monte Carlo simulation, thereby validating the use of joint-confidence–adjusted theoretical thresholds for PV and CPV equivalence assessments. Table 3 lists the critical F and T values for various sample sizes.
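The test statistics evaluated throughout this framework follow the conventional two-sample definitions (variance ratio for F; pooled-variance t for T). A minimal sketch, with illustrative data that are not from the study:

```python
import numpy as np

def f_statistic(x, y):
    # Variance ratio for the F-test of equal variances
    # (df1 = df2 = n - 1 for equal-sized samples)
    return np.var(x, ddof=1) / np.var(y, ddof=1)

def t_statistic(x, y):
    # Pooled two-sample t-statistic for the test of equal means
    # (df = n1 + n2 - 2)
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical reference vs computed net fill weights (% of target)
ref = [99.2, 101.4, 100.0, 98.7, 100.9, 99.5]
comp = [100.1, 98.9, 101.8, 99.4, 100.6, 99.9]
print(round(f_statistic(ref, comp), 3), round(t_statistic(ref, comp), 3))
```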

Simulation Study for Justification of 95% Joint Confidence Interval (95% JCI)

The justification for applying a 95% joint confidence interval (95% JCI) rather than a conventional 95% confidence interval (95% CI) was evaluated using a Monte Carlo simulation study. A custom simulation program implemented in Microsoft Excel (Visual Basic for Applications, VBA) was used to generate paired datasets (n₁ = n₂ = 30) drawn simultaneously from normal populations with identical mean (μ = 100) and standard deviation (σ = 5). These values were selected for illustrative purposes; alternative parameter values lead to equivalent conclusions when appropriately justified.

For each Monte Carlo iteration, both F- and T-statistics were computed from the paired datasets. A total of 2.4 million iterations were generated to characterize the empirical distributions of the F- and T-statistics. The simulated distributions were compared with their corresponding theoretical distributions, as illustrated in Figures 2a and 2b, using acceptance limits initially defined according to the conventional 95% confidence interval.

Under independent application of conventional 95% CI limits to both F- and T-tests, a separate simulation study showed that the probability of simultaneous acceptance of both tests is approximately 90.24% (mean of 90.23%, 90.22%, and 90.28% across three independent simulations of 150,000 iterations each), in close agreement with the theoretical joint probability of 90.25%. This result demonstrates that independent 95% CI application does not preserve an overall 95% confidence level when multiple statistical criteria are applied concurrently.
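The joint-acceptance result above can be reproduced with a vectorized sketch (the article used an Excel/VBA implementation; 150,000 iterations per run are used here to match the article's per-run count):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, iters = 30, 150_000

# Paired samples from identical normal populations (mu = 100, sigma = 5)
x = rng.normal(100, 5, (iters, n))
y = rng.normal(100, 5, (iters, n))
vx, vy = x.var(axis=1, ddof=1), y.var(axis=1, ddof=1)

F = vx / vy
sp2 = (vx + vy) / 2  # pooled variance (equal sample sizes)
T = (x.mean(axis=1) - y.mean(axis=1)) / np.sqrt(sp2 * 2 / n)

# Conventional two-sided 95% CI limits applied to each test independently
f_lo, f_hi = stats.f.ppf([0.025, 0.975], n - 1, n - 1)
t_hi = stats.t.ppf(0.975, 2 * n - 2)

joint = np.mean((F > f_lo) & (F < f_hi) & (np.abs(T) < t_hi))
print(round(float(joint), 4))  # ≈ 0.9025 (= 0.95^2), not 0.95
```

Because the F- and T-statistics are independent under normality, the theoretical joint acceptance probability is exactly 0.95² = 0.9025.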

To restore an overall joint confidence level of 95%, adjusted one-sided critical limits for the F- and T-tests were derived, as summarized in Figures 2a and 2b and Tables 2 and 3. When these joint-confidence–adjusted limits were applied, the simulated probability of simultaneous acceptance increased to 94.98% (mean of 94.94%, 95.06%, and 94.95% across three independent simulations of 150,000 iterations each), confirming close agreement between empirical simulation results and theoretical expectations.
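One way to derive such joint-confidence-adjusted limits, assuming the F- and T-statistics are independent under the null hypothesis, is to allot each test √0.95 coverage so that the product of the two acceptance probabilities is 0.95. This is a sketch of the adjustment principle; the precise limit definitions in Tables 2 and 3 may differ in detail.

```python
import math
from scipy import stats

n = 30
per_test = math.sqrt(0.95)  # coverage allotted to each test
alpha = 1 - per_test        # per-test risk ≈ 0.0253

# Adjusted critical limits (two-sided split of the per-test risk shown here)
f_adj = stats.f.ppf(1 - alpha / 2, n - 1, n - 1)
t_adj = stats.t.ppf(1 - alpha / 2, 2 * n - 2)

print(round(per_test, 5))       # 0.97468 coverage per test
print(round(per_test ** 2, 4))  # 0.95 joint coverage restored
```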

Figure 3 illustrates simulated F- and T-statistic values generated under conditions of n₁ = n₂ = 20, with target fill weight μ = 315 mg and process standard deviation σ = 3.5 mg, representing a worst-case scenario relative to the σ = 2.99 mg condition discussed above. The simulated F and T values correspond to 20 hypothetical routine batches evaluated under CPV-like sampling conditions. One simulated lot exhibits an F value (3.35) exceeding the joint-confidence–adjusted limit of 2.89, representing an isolated variance excursion in the absence of a corresponding mean shift. Figure 3 demonstrates that occasional variance-based limit exceedances are statistically expected under CPV sampling conditions and should be interpreted within a joint mean–variance framework rather than as isolated failures.

Implications for IPC Interpretation and Process Capability Assessment

When net fill weight is computed using a fixed average capsule shell weight, inherent shell-to-shell variability is propagated into IPC data, inflating the observed variance without altering the underlying fill-weight process. This variance inflation can reduce calculated CpK values and complicate interpretation of control charts and trends, as illustrated in Figure 1a. Importantly, this effect represents a measurement-induced artifact rather than a loss of filling accuracy and, if unrecognized, may lead to unnecessary investigations or overly conservative process decisions.

Justification for Simulation-based Statistical Evaluation

The results demonstrate that direct application of theoretical critical values for T- and F-tests may not adequately reflect IPC data derived from computed net fill weights. Shell-weight variability can inflate variance-related statistics without affecting the mean (Figure 1a), such that reliance on theoretical F-test limits alone may result in over-rejection of otherwise acceptable processes. Likewise, interpretation of T-test outcomes may be incomplete when mean and variance are evaluated jointly. A simulation-based framework enables empirical acceptance limits for T- and F-statistics to be established under realistic process and measurement assumptions, supporting statistically justified equivalence decisions in PV and CPV.

Limitations and Further Considerations

This analysis focuses on capsule shell-weight variability as a primary source of variance inflation; other measurement-related factors—including balance resolution, environmental influences, and sampling strategy—may similarly affect IPC data and warrant further investigation. The proposed approach does not replace regulatory acceptance criteria but provides a statistically justified framework for interpreting IPC data within existing GMP requirements. Integration of simulation-based acceptance limits with established control strategies may enhance robustness while maintaining regulatory compliance.

Interpretation of Critical F- and T-Values Across Sample Sizes

Table 3 summarizes theoretical critical values for the F-test and T-test across sample sizes commonly encountered in IPC applications. For each sample size (n), degrees of freedom are defined as df₁ = df₂ = n − 1 for the F-test and df(T) = 2n − 2 for the two-sample T-test. The reported limits correspond to nominal 95% joint confidence requirements under classical assumptions.

As expected, both F- and T-critical values decrease with increasing sample size, reflecting reduced sampling uncertainty at larger n. For smaller sample sizes (eg, n = 10–20), relatively high F-critical values indicate greater tolerance for variance differences, whereas T-critical values remain comparatively stable. At larger sample sizes (eg, n ≥ 50), F-critical values approach unity, increasing sensitivity to variance inflation effects.
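A Table 3-style grid of critical values across sample sizes can be generated as follows. The conventions assumed here (one-sided F limit, two-sided T limit, each carrying √0.95 coverage for the 95% joint requirement) are a plausible reading of the text; the exact definitions used for Table 3 may differ.

```python
import math
from scipy import stats

alpha = 1 - math.sqrt(0.95)  # per-test risk for a 95% joint requirement

for n in (10, 20, 30, 50, 100):
    f_crit = stats.f.ppf(1 - alpha, n - 1, n - 1)    # df1 = df2 = n - 1
    t_crit = stats.t.ppf(1 - alpha / 2, 2 * n - 2)   # df = 2n - 2
    print(f"n={n:3d}  F={f_crit:.2f}  T={t_crit:.2f}")
```

The printed grid shows both critical values shrinking with n, with the F limit approaching unity at large sample sizes, consistent with the trend described above.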

Importantly, these theoretical critical values are derived independently for each statistic and do not account for joint decision risk when T- and F-tests are applied simultaneously. As demonstrated in this study, shell-weight variability can disproportionately affect variance-related statistics—particularly at larger sample sizes—thereby increasing the likelihood of F-test rejection even when mean equivalence is maintained. This observation reinforces the need for simulation-based derivation of empirical acceptance limits that reflect realistic process and measurement variability, especially in PV and CPV contexts.

Notably, the statistical evaluation is based on dedicated PV and CPV sample sets designed for paired comparison of computed and reference net fill weights, rather than routine IPC trending data, reinforcing the relevance and validity of the observed variance effects under controlled manufacturing conditions.

Conclusion

Capsule filling operations frequently rely on computed net fill weights, obtained by subtracting a fixed average empty-capsule shell weight from the measured gross capsule weight. The present study demonstrates that this computational approach can introduce a measurable bias and propagate inherent shell-weight variability directly into the observed net fill-weight distribution. As a result, the variance of the biased dataset is artificially inflated, even when the underlying fill-weight process remains accurate, stable, and unchanged.

The analysis confirms that such variance inflation can reduce calculated process capability indices and complicate interpretation of in-process control data if not properly recognized. Monte Carlo simulation and idealized distribution comparisons show that the observed dispersion effect arises from the measurement and calculation model itself, rather than from process instability or loss of dosage accuracy.

Accordingly, this article provides explicit justification for using T- and F-tests as structured evaluation tools for biased net fill-weight data when dedicated PV and CPV samples are employed. When both tests are applied simultaneously, equivalence in mean and variance can be objectively assessed, supporting the statistical interchangeability of idealized and biased net fill-weight datasets. A simulation-based statistical framework offers a rational basis for establishing empirical acceptance limits under realistic manufacturing conditions, thereby enabling GMP-compliant process performance monitoring and reliable IPC-based decision-making during routine capsule filling operations.

References

  1. Cholayudth P. Full tolerance coverage method for assessing uniformity of dosage units with large sample sizes. Pharmaceutical Technology 2025;49(2). https://www.pharmtech.com/view/full-tolerance-coverage-method-for-assessing-uniformity-of-dosage-units-with-large-sample-sizes
  2. Gumbel EJ. Statistics of Extremes. New York: Columbia University Press; 1958.
  3. NIST/SEMATECH e-Handbook of Statistical Methods, Sec. 1.3.5.9: F-test for equality of two variances. www.itl.nist.gov
  4. NIST/SEMATECH e-Handbook of Statistical Methods, Sec. 1.3.5.3: Two-sample T-test for equal means. www.itl.nist.gov

About the Author

Pramote Cholayudth, cpramote2000@yahoo.com, is the founder and manager of PM Consult. He is an industrial pharmacist with more than 40 years of experience and a guest speaker on process validation at programs for industrial pharmaceutical scientists organized by Thailand's FDA.