Many data sets (real and simulated) have been examined using the first-order (FO) estimation method and, more recently, the conditional estimation methods. With many population pharmacokinetic data sets, the FO method works fairly well. It requires far less CPU time than does a conditional estimation method. However, from the time of its earliest usage there have been a small number of examples where the method has not worked adequately. Evidence suggesting that the method may not be adequate with a particular data set can be readily obtained with the goodness-of-fit scatterplot: with mean-variance intraindividual models, a plot of observations versus (population) predictions. Consider two such scatterplots in Figures 1 and 2. The one in Figure 1, resulting from use of the FO method, shows a clear bias in the fit. The data result from single oral bolus doses being given to a number of subjects; the data are modeled with a two-compartment linear model with first-order absorption from a drug depot. The scatterplot in Figure 2 results from use of the FOCE method without interaction. Much of the bias is eliminated with the use of this method. In this situation, the benefit from the extra expenditure of computer time that is needed with the method is substantial.
The Laplacian method can use considerably more computer time than the FOCE method, depending on the complexity of the computations for obtaining needed second derivatives. In this example, the extra expenditure of computer time needed with the Laplacian method is not much, but the benefit is also not much. The scatterplot resulting from using the Laplacian method is very similar to that of Figure 2.
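For reference, these estimation methods are requested on the $ESTIMATION record of the NM-TRAN control stream. A minimal sketch follows; the auxiliary options shown (MAXEVAL=9999, PRINT=5) are illustrative only, and one of the three records would be chosen for a given run.

    ; first-order (FO) method
    $ESTIMATION METHOD=0 MAXEVAL=9999 PRINT=5

    ; first-order conditional estimation (FOCE), without interaction
    $ESTIMATION METHOD=CONDITIONAL MAXEVAL=9999 PRINT=5

    ; Laplacian conditional estimation
    $ESTIMATION METHOD=CONDITIONAL LAPLACIAN MAXEVAL=9999 PRINT=5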
The Laplacian method should perform no worse than the FOCE method (the former avoids the first-order approximation). The FOCE method should perform no worse than the FO method (the adequacy of the first-order approximation is better when the η's are evaluated at the conditional estimates, rather than at 0). Similarly, the hybrid method should also perform no worse than the FO method, but perhaps not as well as the FOCE method. (See e.g. Figure 3, which is the goodness-of-fit plot for the same data described above, using the hybrid method with two of the four η's "zeroed".) This defines a type of hierarchy among the methods.
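As a concrete illustration of the hybrid method, a sketch of the $ESTIMATION record follows; which η's are zeroed (here, for the sake of the example, the second and fourth) is chosen by the user with the ZERO option.

    ; hybrid method: ETA(2) and ETA(4) are fixed to 0 (as with FO),
    ; while conditional estimates are used for the remaining etas
    $ESTIMATION METHOD=HYBRID ZERO=(2,4) MAXEVAL=9999 PRINT=5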
The need to proceed up the hierarchy from the FO method increases as the degree to which the intraindividual models are nonlinear in the η's increases. The need to use the Laplacian method increases because as the degree of nonlinearity increases, the adequacy of the first-order approximation decreases. The need to use the FOCE method increases because as the degree of nonlinearity increases, the adequacy of the first-order approximation depends more on the values at which the η's are evaluated.
Population (structurally) linear pharmacokinetic models are often rather linear (as just defined), although the degree of nonlinearity increases with the degree of multiple dosing. With these models the Laplacian method is rarely, if ever, needed. With simple bolus dosing, the FOCE method is often not needed, although the example cited above serves as a reminder not to interpret this last assertion too optimistically. On the other hand, population nonlinear pharmacokinetic models (e.g. models with Michaelis-Menten elimination) can be quite nonlinear. Population pharmacodynamic models also can be quite nonlinear, and especially with models for categorical- and discrete-ordinal-type observations, the Laplacian method is invariably the best choice.
The ability of a conditional estimation method to produce results different from those obtained with the FO method decreases as the degree of random interindividual variability, i.e. "the size" of Ω, decreases. This is because the conditional methods use conditional estimates of the η's, which are all shrunken to 0, and the shrinkage is greater the smaller the size of Ω. The value 0 is the value used for the η's with the FO method. Similarly, the ability of FOCE to produce results different from those obtained with the hybrid method decreases as the degree of random interindividual variability, i.e. "the size" of Ω, decreases. In fact, suppose one tries to use the FOCE method and finds that some estimates of interindividual variances are rather large compared to others. Then using the hybrid method where those elements of η with small variance are "zeroed" may well result in a fit about as good as that using FOCE (in contrast to that shown in Figure 3), and if the number of elements of η that are zeroed is large relative to the total number of elements, CPU time may be significantly reduced.
The ability of a conditional estimation method to produce results different from those obtained with the FO method also decreases as the amount of data per individual decreases. This is because the conditional methods use conditional estimates of the η's, which are all shrunken to 0, and the shrinkage is greater the less the amount of data per individual. Actually, the amount of data from the ith individual should be measured relative to the number of parameters in the model for the individual, i.e. the number of elements of ηi upon which the model really depends. As the number of parameters increases, the amount of data, in this relative sense, decreases, and can "approach 0". Also, strictly speaking, the amount of data might be understood as being relative to the "data design" (the poorer the design, the less useful the data) and to the magnitude of intraindividual error (the more error, the less useful the data).
With intraindividual mean-variance models where it may appear theoretically plausible that there is an η-ε interaction, it might seem more appropriate to use the FOCE method with interaction than to use the FOCE method without interaction. However, when the amount of (true) intraindividual variance is large (though the intraindividual models may be structurally well-specified), or the amount of data per individual is small, it will be difficult for the data to support an η-ε interaction, in which case the FOCE method with interaction may produce no improvement over the FOCE method without interaction. Otherwise, and especially when intraindividual variance is small for some observations but not for others due to structural model misspecification, and when there is considerable interindividual variability, the FOCE method with no interaction can lead to a noticeably biased fit (as can the FO method).
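To make the notion of an η-ε interaction concrete, consider the common proportional intraindividual error model, in which the intraindividual variance depends on the individual prediction and hence on the η's. A sketch of the relevant control stream fragments follows (assuming PREDPP, so that F denotes the individual prediction; the option values are illustrative only).

    $ERROR
    ; proportional error: Var(Y) depends on F, and F depends on the etas,
    ; so an eta-eps interaction is present
    Y = F + F*EPS(1)

    ; FOCE with interaction
    $ESTIMATION METHOD=CONDITIONAL INTERACTION MAXEVAL=9999 PRINT=5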
There seems to be no consistent relationship between the value of the objective function using one method and the value of the objective function using another method. Therefore, objective function values should not be compared across methods. However, objective function values (in conjunction with graphical output) can provide a very useful way to compare alternative models, as long as the values are obtained using the same method.
Unless interindividual variability is small, use of a random interindividual effect in the model should be such that quantities that depend on the effect are always computed with physically meaningful values. For example, rather than model a clearance parameter by CL = θ1 + η1, it is better to use CL = θ1 exp(η1), since clearance should always be positive. With the FO method, use of either model produces essentially the same results. (The formulas for clearance and for the derivatives of clearance with respect to η1 are computed only with the value η1 = 0.) However, with a conditional estimation method, different values of η1 are tried. A negative value for clearance can result with the first model, especially when the interindividual variance of η1 is large and large negative values of η1 are tried.
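In NM-TRAN abbreviated code the two parameterizations might be written as follows (a sketch; the THETA and ETA numbering is illustrative only):

    ; additive model: CL can become negative when a large negative ETA(1) is tried
    CL = THETA(1) + ETA(1)

    ; exponential model: CL remains positive for any value of ETA(1)
    CL = THETA(1)*EXP(ETA(1))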
To take another example: Suppose that with the one-compartment linear model with first-order absorption from a drug depot, it is assumed that pharmacokinetically, for all individuals, the rate constant of absorption exceeds the rate constant of elimination, i.e. ka > ke. Instead of modeling ka directly, e.g. by ka = θ2 exp(η2), one should model the amount by which ka exceeds ke, e.g. ka = ke + θ2 exp(η2), and constrain both θ2 and ke to be positive. Again, with the FO method, use of either model produces essentially the same results. The problem with the first model is that when using a conditional estimation method, as the θ's and η's vary, the value of ke can exceed that of ka, due to "flip-flop". As this can happen, or not, from one individual to the next, if it happens at all, the conditional estimation method will "become confused" and fail. The conditional estimation method by itself has no way of knowing that it has been assumed that ke will not exceed ka, and it cannot distinguish flip-flop from this possibility. (If, pharmacokinetically, ka = ke or ka < ke occurs, again the conditional estimation method will become confused, not being able to distinguish flip-flop from these possibilities, but in this case a modification of the model will not help.)
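A sketch of this reparameterization in NM-TRAN abbreviated code follows (the ADVAN/TRANS choice and the parameter numbering are illustrative only):

    $SUBROUTINES ADVAN2 TRANS2
    $PK
    CL = THETA(1)*EXP(ETA(1))
    V  = THETA(2)*EXP(ETA(2))
    KE = CL/V
    ; KA is written as KE plus a positive increment, so that KA > KE holds
    ; for every individual, whatever values of the thetas and etas are tried
    KA = KE + THETA(3)*EXP(ETA(3))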
Consider again the simple model for a clearance parameter, CL = θ1 exp(η1). With the FO method, all derivatives with respect to η1 are evaluated at 0. Consequently, in effect, a transformed model for CL is used: a first-order approximation, in η1, of the right side of the equation, CL = θ1 (1 + η1) = θ1 + θ1 η1. This is a constant CV type model. With the FO method, no matter whether the given model or the transformed model is "used", the results of the analysis will be the same. The same is true even if covariates are involved. However, when a population conditional estimation method is used, the results of the analysis will differ between the two models, as derivatives with respect to η1 are evaluated at conditional estimates.
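To spell out why the transformed model is of constant CV type (a one-line calculation; ω1² denotes the variance of η1):

    CL = θ1 + θ1 η1,  so  E(CL) = θ1,  Var(CL) = θ1² ω1²,  and  SD(CL)/E(CL) = ω1

That is, the coefficient of variation of CL is the constant ω1, whatever the value of θ1 (and of any covariates entering θ1).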
The following general guidelines are offered so that conditional estimation methods are used only when necessary, and thus unnecessary expenditure of computer time and other difficulties that sometimes arise with conditional estimation methods (see section D) are avoided. They are based on impressions, rather than systematic study. Clearly, there will arise situations where alternative approaches might be tried.
If the model is of a very nonlinear kind (see section A), then from the outset, a conditional method might be used instead of the FO method. Indeed, with models for categorical- and discrete-ordinal type observations, the Laplacian method should always be used, and the remainder of this discussion concerns the use of conditional estimation methods with models for continuous outcomes (more precisely, the intraindividual models are of mean-variance type).
When analyzing a new data set and/or using a very new model with the data set, it is a good practice to use the FO method with at least the first one or two NONMEM runs, in order to simply check the data set and control stream. The Estimation Steps with these runs should terminate successfully, although if a conditional estimation method is really needed, the results themselves may not be entirely satisfactory. At this very early stage of data analysis, the user needs to be able to detect elementary errors that may have been introduced into the data set or control stream, and to be able to detect significant modeling difficulties. This cannot be done easily if other unrelated problems that can arise with conditional estimation methods interfere.
One might do well to begin to develop a complete model, incorporating the covariates, etc., using the FO method. Decisions regarding the effects of covariates on randomly dispersed parameters are aided by examining scatterplots of conditional estimates versus particular covariates. When the FO method is used, the posthoc estimates are the conditional estimates that are used for this purpose. If, after it appears that the model can be developed no further, there nonetheless exists appreciable bias in the final fit, think about how this bias might be well explained by model misspecification that it has not been possible to address (e.g. there is a known covariate effect, but the covariate has not been measured). The use of an estimation method cannot really compensate for bias due to model misspecification, and one should not imagine that a conditional estimation method is any different.
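With the FO method, the posthoc estimates can be requested and written out for such scatterplots. A sketch follows; the covariates (WT, AGE), the table file name, and the use of two etas are illustrative only.

    ; in $PK (or $ERROR), copy the etas into variables that can be tabled
    ET1 = ETA(1)
    ET2 = ETA(2)

    $ESTIMATION METHOD=0 POSTHOC MAXEVAL=9999 PRINT=5
    $TABLE ID ET1 ET2 WT AGE NOPRINT ONEHEADER FILE=etacov.tbl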
After model development is complete using the FO method, if there seems to be no bias in the fit, you might simply want to do one run with FOCE to check this impression; if the fit does not significantly improve, you can stop. Such a run can also be used to obtain the best possible estimates of variance-covariance components. These components are often estimated better using FOCE (but realize that sometimes they may be estimated very similarly by FO - see the discussion in section A), and when these estimates are important to you, it can therefore be worthwhile investing the time needed for the additional FOCE run. It is not necessary to use FOCE to sharpen the estimates of variance-covariance components until after an adequate model has been developed using the FO method.
After model development is complete using the FO method, if appreciable unexplainable bias remains, do try using FOCE. Indeed, do not hesitate to try FOCE before model development is complete when a number of initial conscientious attempts to improve your model using FO have resulted in considerable bias, and when conditions are such that, a priori, the FO and FOCE results are not expected to be very similar (see background section). When the intraindividual models you are using permit the possibility of an η-ε interaction that the data may be rich enough to support, try FOCE with interaction. If the use of FOCE significantly reduces the bias, continue to develop the model using FOCE. Or, before embarking on continued model development, first experiment with the hybrid method to see whether this produces as much bias reduction as does FOCE, along with a significant improvement in run time over FOCE. Continued model development may entail repeating much of the work already done with the FO method. In particular, try adding covariates rejected when using the FO method, and reconsider alternative ways that the covariates already accepted can enter the model. Because of the cost involved in possibly needing to repeat work already undertaken with the FO method, the question of how soon one begins to try FOCE is not clearly answerable. Surely, increased computational times must be considered, and usually one wants to delay using a conditional estimation method until use of such a method seems to be clearly indicated.
The model might be very nonlinear, in which case also try the Laplacian method. If, after using the FOCE and Laplacian conditional estimation methods, an appropriate goodness-of-fit plot remains unsatisfactory, then there is very likely a modeling difficulty, and one must seriously acknowledge this.
If, after conscientious modeling using the appropriate (noncentering) conditional estimation method(s), a model results with which substantial bias still appears in the fit, there is probably a model-related explanation for this, though it may elude one. In these circumstances, one may want to proceed to obtain the best possible fit with the model in hand. The fit that has been obtained using the noncentering conditional estimation method is not necessarily the best fit that may be obtained with the misspecified model.
The bias may be reflected by an uncentered fit.
When a population conditional estimation method is used, the average conditional estimate for each element of η is given in NONMEM output (the conditional estimates being averaged are those produced by the method), along with a P-value that can be used to help assess whether this average is sufficiently close to 0 (the null hypothesis). The occurrence of at least one small P-value (less than 0.05; when a P-value is small, it is often much less than 0.05) indicates an uncentered fit.
A centering method might be tried. Using centering FOCE or centering Laplacian, one should notice that the P-values are somewhat larger (although perhaps some are still less than 0.05), and often one will also notice considerable improvement in the fit to the data themselves. When it is necessary to use a centering method, the population parameter estimates (at least those identified with the misspecified part of the model) are themselves of little interest; population predictions under the fitted model are what is of interest. Also, because the model is misspecified, one should anticipate possible problems with model validation and model applications involving extrapolation.
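A sketch of how a centering method might be requested follows (the other options are illustrative only); keep in mind the caution below that centering methods should not be used routinely.

    ; centering FOCE
    $ESTIMATION METHOD=CONDITIONAL CENTERING MAXEVAL=9999 PRINT=5

    ; or centering Laplacian
    $ESTIMATION METHOD=CONDITIONAL LAPLACIAN CENTERING MAXEVAL=9999 PRINT=5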
Although it may be that (at least in certain specifiable situations) fits with centering methods are in general no worse than those obtained with appropriate noncentering methods, this idea is not yet well enough tested. Moreover, routine use of centering methods will mask modeling problems. Centering methods should be used only when, after conscientious modeling, bias in fit seems unavoidable. CENTERING METHODS SHOULD NOT BE ROUTINELY USED. When the model is well-specified, it seems unlikely that when using an appropriate noncentering method, bias in fit will result, and there should be no expectation that any further improvement can be gained with a centering method.
Even when the fit is centered, it is possible (though rare) that the fit to the data themselves still shows bias (see remarks in chapter II). One might then also use centering FOCE with the first-order model, subject to the same cautions given above. (Recall that in this case, the conditional estimates of the η's resulting from the centering method are based on linear intraindividual models. When centering is actually needed, these conditional estimates should probably be adequate for whatever purposes conditional estimates might be used. It is possible nonetheless to obtain posthoc estimates based on the given intraindividual models.)
Even when a model is well-specified, it may be so complicated (e.g. it uses difficult differential equations) that to use it with a conditional estimation method requires a great amount of computer time. In this case, if indeed a conditional estimation method is needed, one might use centering FOCE with the first-order model, even though centering per se may not be needed. In this situation, use of centering, along with the first-order model, is just a device allowing a conditional estimation method to be implemented with less of a computational burden. A compromise is achieved; the fit should appear to be an improvement over that obtained with the FO fit, but it may not be quite as good as one obtained with the noncentering FOCE or Laplacian methods. Because the first-order model is automatically obtained from the given model, the final form of the given model (obtained after completing model development) is readily available, and with this model, one might try to implement one run using either the noncentering FOCE or Laplacian methods and compare results.
As already noted in section A, use of the hybrid method may require appreciably less computer time than the FOCE method and yet result in as good a fit. There is another important use of this method.
A change-point parameter of the ith intraindividual model is a parameter of the model such that, for any value of ηi, the derivative of the model with respect to this parameter, evaluated at some value of the parameter (a change-point value), is undefined. An example of this is an absorption lag time parameter A of a pharmacokinetic model for blood concentrations. If a dose is given at time 0, then the derivative of the pharmacokinetic expression for the concentration at time t, taken with respect to A and evaluated at A = t, is undefined. So if, moreover, an observation occurs at time t (so that the expression for the concentration must be evaluated at this time), then the derivative of the intraindividual model evaluated at A = t is undefined (for any value of ηi, or for any of the other observations from the individual). Therefore, under these circumstances, if the change-point parameter is randomly dispersed, and ηi may assume a value at which the parameter equals a change-point value, then the derivative with respect to ηi is undefined at this value, and, strictly speaking, all estimation methods described in chapter II are undefined. But practically speaking, a method will fail only when, during the search to minimize the objective function, a value of ηi at which the derivative is undefined cannot be avoided. A symptom that this is happening, when there is a randomly dispersed change-point parameter, is a search termination with a large gradient, i.e. some gradient elements are very large in magnitude. Often, a lag time is estimated to be very near the time of the first observation within an individual record, and so the problem described here can be a very real problem. One remedy is to delete observations at times that are too close to estimated lag times. However, aside from entailing the deletion of legitimate data, there can also be implementation problems with this strategy.
If the hybrid method is used, and the element(s) of ηi associated with the change-point parameter are "zeroed", this reduces the number (across individuals) of values of the change-point parameter at which the computation could possibly be undefined, as only the value of the change-point parameter for the typical subject is needed in the computation. Indeed, unless the change-point parameter itself depends on a covariate, only at this single typical value can the computation possibly be undefined. Thus, the chance of the problem occurring is reduced (but not eliminated).†
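A sketch of this use of the hybrid method for an absorption lag time follows (the parameter and eta numbering are illustrative only; ALAG1 is PREDPP's lag time for compartment 1, the depot compartment with ADVAN2).

    $PK
    ; (other PK parameters defined here)
    ; ALAG1 is a change-point parameter; its eta, here ETA(4), is the one zeroed
    ALAG1 = THETA(4)*EXP(ETA(4))

    $ESTIMATION METHOD=HYBRID ZERO=(4) MAXEVAL=9999 PRINT=5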
----------
A conditional estimation method can demonstrate somewhat more sensitivity to rounding error problems during the Estimation Step than will the FO method. When the search for parameter estimates terminates with rounding error problems, oftentimes intermediate output from the Estimation Step will indicate the accuracy with which each of the final parameter estimates has been obtained. For example, 3 significant digits may be requested for each estimate, but for some estimates, fewer than 3 digits are actually obtained. If only a little less than 3 digits is obtained (e.g. 2.7-2.9), and if the gradient vector of the objective function with the final parameter estimates is small (e.g. no element is greater than 5 in absolute value), then this degree of accuracy is probably acceptable. If much less accuracy is obtained, but only with those estimates where this might be expected and where this is tolerable (e.g. estimates of elements of Ω), then again, one might regard the Estimation Step as having terminated successfully. (The order of the parameter estimates printed in the iteration summaries is: the θ's in their subscripted order, followed by the (unconstrained) elements of Ω and of Σ. Note, though, that these estimates are those of the scaled transformed parameters (STP), rather than the original parameters; see NONMEM Users Guide - Part I, section C.3.5.1.)
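The number of significant digits requested and the frequency of intermediate printing are controlled on the $ESTIMATION record; a brief sketch (the values shown are illustrative only):

    $ESTIMATION METHOD=CONDITIONAL SIGDIGITS=3 PRINT=5 MAXEVAL=9999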
With a conditional estimation method (in contrast with the FO method), NONMEM can more readily terminate during the Estimation Step with a PRED error message indicating e.g. that a nonallowable value for a parameter has been computed in PRED code, perhaps a negative value for a rate constant.†† This is because a parameter may be randomly dispersed, and with a conditional estimation method, values of ηi different from 0 are tried, as well as different values of θ, and some of these values might result in a nonallowable value of the parameter. If such a termination occurs, then, if not already doing so, consider modeling the parameter in a way that prevents it from assuming a nonallowable value, e.g. if the parameter cannot be negative, consider using a model such as θ exp(η) (see section B). Sometimes this cannot completely solve the problem, e.g. if the parameter cannot also be 0, the model just given will not ensure this (η can be very large and negative, so that the computed value of θ exp(η) is numerically 0). So, a termination may still occur. The next step is to try to include the NOABORT option on the $ESTIMATION record (see NONMEM Users Guide - Part IV, section IV.G.2). However, doing so will have no effect if the termination occurs during the 0th iteration.††† The NOABORT option activates one type of PRED error-recovery (THETA-recovery), and the other type (ETA-recovery) is always activated, without using the option. So the option may not need to be used initially, and if PREDPP is being used, having used the option before a termination has actually occurred has the detrimental effect that this can mask the occurrence of an error detected by PREDPP, of which the user needs to be informed. With PREDPP, never use the NOABORT option until you have had an opportunity (i) to see what happens when you do not use it, i.e. to see the contents of PRED error messages that might arise when you do not use the option, (ii) to respond, if possible, to these messages in a sensible way (other than using the option), and (iii) to see what happens after you have done this.
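When, after following the steps above, the NOABORT option is finally warranted, it is simply added to the $ESTIMATION record; a sketch (the other options are illustrative only):

    $ESTIMATION METHOD=CONDITIONAL INTERACTION NOABORT MAXEVAL=9999 PRINT=5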
----------
Perhaps the operating system, rather than NONMEM, terminates the program with a message indicating the occurrence of a floating point exception in user code. Again, this may be because a value of ηi is tried which results in the exception when a value of a randomly dispersed parameter is computed. Underflows are ignorable, and terminations due to underflows should be disabled (see NONMEM Users Guide - Part III). With an operand error, or overflow, or zero-divide, the user needs to identify where the exception occurs in the code. For this purpose, the use of a debugger, or of debugging print statements in the code, may be helpful. Then perhaps the exception may be avoided by using PRED error-recovery in the user code, i.e. by using the EXIT statement with return code 1 (see NONMEM Users Guide - Part IV, section IV.G.2). Try this, and rerun. If, with the earlier run, the termination occurred after the 0th iteration, and if PREDPP is not being used, rerun the problem using the NOABORT option on the $ESTIMATION record. If the termination occurred after the 0th iteration, and if PREDPP is being used, rerun, but do not use the NOABORT option. If the termination still occurs, then rerun a second time, this time using the NOABORT option. If the termination occurs during the 0th iteration, the NOABORT option has no effect. Such a termination can arise due to a problem with the data set, the user code, or the control stream. Different initial estimates might be tried (perhaps smaller interindividual variances).
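A sketch of such PRED error-recovery in abbreviated code, guarding a computation that might otherwise overflow (the bound of 100 and the user error code 100 are illustrative only):

    $PK
    ; if the exponent is implausibly large, signal a recoverable PRED error
    ; (EXIT with return code 1) rather than letting EXP overflow
    IF (ETA(1).GT.100) THEN
      EXIT 1 100
    ENDIF
    CL = THETA(1)*EXP(ETA(1))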