Optimal Sampling of Populations
Optimal Sampling and Problematic Likelihood Functions in a Simple Population Model
Dan Pagendam, Phil Pollett
Markov chains provide excellent statistical models for studying many natural phenomena that evolve with time. One particular class of continuous-time Markov chains, the birth–death processes, can be used for modelling population dynamics in fields such as ecology and microbiology. The challenge for the practitioner when fitting these models is to take measurements of population size over time in order to estimate the model parameters, such as the per capita birth and death rates. In many biological contexts it is impractical to follow the fate of each individual in a population continuously in time, so the researcher is often limited to a fixed number of measurements of population size over the duration of the study. We show that, for a simple birth–death process with positive Malthusian growth rate, subject to common practical constraints, there is an optimal schedule for measuring the population size that minimises the expected size of the confidence region for the parameter estimates. Throughout our exposition of the optimal experimental design, we compare it with a simpler equidistant design, in which the population is sampled at regular intervals. The equidistant design is a worthwhile benchmark because it is much easier to implement in practice. To find optimal experimental designs for our population model, we combine two pieces of statistical machinery. Firstly, we use a Gaussian diffusion approximation of the underlying discrete-state Markov process, which allows us to obtain analytical expressions for Fisher’s information matrix (FIM), the key quantity in optimising the experimental design. Secondly, we use the cross-entropy method of stochastic optimisation to maximise the determinant of the FIM and so obtain the optimal experimental designs.
Our results show that the optimal schedule devised by others for a simple model of population growth without death can be extended, for large populations, to the two-parameter model that incorporates both birth and death. For the simple birth–death process, we also find that the likelihood surface is problematic: it poses serious difficulties for point estimation and for defining confidence regions. We propose a Bayesian approach to inference as a way in which these problems could be circumvented.
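
As a rough illustration of the design search described above, the following Python sketch treats the population sizes at the sampling times as jointly Gaussian, computes the FIM for the birth and death rates, and maximises its log-determinant with a basic cross-entropy loop. It is a sketch under stated assumptions, not the authors' implementation: the Gaussian approximation is built from the standard mean and covariance of the linear birth–death process and may differ in detail from the diffusion approximation used in the paper, the derivatives are taken by finite differences rather than analytically, and the function names, the minimum spacing `min_gap`, the known initial size `n0` and all numerical settings are hypothetical choices made for illustration.

```python
import numpy as np

def moments(times, lam, mu, n0):
    """Mean vector and covariance matrix of the population sizes at `times`."""
    t = np.asarray(times, dtype=float)
    r = lam - mu  # Malthusian growth rate, assumed positive

    def var(s):
        # Variance of a linear birth-death process started from n0 individuals.
        return n0 * (lam + mu) / r * np.exp(r * s) * (np.exp(r * s) - 1.0)

    t_min = np.minimum.outer(t, t)
    t_max = np.maximum.outer(t, t)
    mean = n0 * np.exp(r * t)
    # Cov(N(s), N(t)) = Var(N(s)) * exp(r (t - s)) for s <= t.
    cov = var(t_min) * np.exp(r * (t_max - t_min))
    return mean, cov

def fim(times, lam, mu, n0, h=1e-6):
    """FIM for (lam, mu) under a multivariate Gaussian model (assumption)."""
    theta = np.array([lam, mu], dtype=float)
    _, cov = moments(times, lam, mu, n0)
    cov_inv = np.linalg.inv(cov)
    d_mean, d_cov = [], []
    for k in range(2):  # central finite differences in each parameter
        step = np.zeros(2)
        step[k] = h
        m_hi, c_hi = moments(times, *(theta + step), n0)
        m_lo, c_lo = moments(times, *(theta - step), n0)
        d_mean.append((m_hi - m_lo) / (2 * h))
        d_cov.append((c_hi - c_lo) / (2 * h))
    info = np.empty((2, 2))
    for j in range(2):
        for k in range(2):
            info[j, k] = d_mean[j] @ cov_inv @ d_mean[k] + 0.5 * np.trace(
                cov_inv @ d_cov[j] @ cov_inv @ d_cov[k])
    return info

def log_det_fim(times, lam, mu, n0, min_gap=0.05):
    """D-optimality criterion; -inf for designs violating the spacing rule."""
    t = np.asarray(times, dtype=float)
    if t[0] < min_gap or np.any(np.diff(t) < min_gap):
        return -np.inf
    sign, logdet = np.linalg.slogdet(fim(t, lam, mu, n0))
    return logdet if sign > 0 else -np.inf

def cross_entropy_design(n_obs, t_max, lam, mu, n0, min_gap=0.05,
                         n_samples=200, n_elite=20, n_iter=30, seed=0):
    """Cross-entropy search over sorted sampling times in [min_gap, t_max]."""
    rng = np.random.default_rng(seed)
    centre = np.linspace(t_max / n_obs, t_max, n_obs)  # equidistant start
    spread = np.full(n_obs, t_max / 4.0)
    best_t = centre.copy()
    best_score = log_det_fim(centre, lam, mu, n0, min_gap)
    for _ in range(n_iter):
        draws = rng.normal(centre, spread, size=(n_samples, n_obs))
        draws = np.sort(np.clip(draws, min_gap, t_max), axis=1)
        scores = np.array([log_det_fim(d, lam, mu, n0, min_gap) for d in draws])
        order = np.argsort(scores)
        if scores[order[-1]] > best_score:
            best_score, best_t = scores[order[-1]], draws[order[-1]].copy()
        elite = draws[order[-n_elite:]]  # refit the sampler to the best draws
        centre, spread = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return best_t, best_score

if __name__ == "__main__":
    times, score = cross_entropy_design(n_obs=5, t_max=4.0, lam=1.0, mu=0.5, n0=100)
    equidistant = np.linspace(0.8, 4.0, 5)
    print("design found:", np.round(times, 3), "log det FIM:", score)
    print("equidistant log det FIM:", log_det_fim(equidistant, 1.0, 0.5, 100))
```

Because the search starts from the equidistant schedule and keeps the best design seen, the returned criterion value is never worse than that of the equidistant design under the same assumed parameter values. Like any locally optimal design, the result depends on the values of the birth and death rates supplied to the search.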