Category: validation
Coarse-graining discrete cell cycle models
Introduction
One observation that often goes underappreciated in computational biology discussions is that a computational model is often a model of a model of a model of biology: that is, it’s a numerical approximation (a model) of a mathematical model of an experimental model of a real-life biological system. Thus, there are three big places where a computational investigation can fall flat:
- The experimental model may be a bad choice for the disease or process (not our fault).
- The mathematical model of the experimental system may have flawed assumptions (something we have to evaluate).
- The numerical implementation may have bugs or otherwise be mathematically inconsistent with the mathematical model.
Critically, you can’t use simulations to evaluate the experimental model or the mathematical model until you verify that the numerical implementation is consistent with the mathematical model, and that the numerical solution converges as \( \Delta t\) and \( \Delta x \) shrink to zero.
There are numerous ways to accomplish this, but ideally, it boils down to having some analytical solutions to the mathematical model, and comparing numerical solutions to these analytical or theoretical results. In this post, we’re going to walk through the math of analyzing a typical type of discrete cell cycle model.
Discrete model
Suppose we have a cell cycle model consisting of phases \(P_1, P_2, \ldots, P_n \), where cells in the \(P_i\) phase progress to the \(P_{i+1}\) phase after a mean waiting time of \(T_i\), and cells leaving the \(P_n\) phase divide into two cells in the \(P_1\) phase. Assign each cell agent \(k\) a current phenotypic phase \( S_k(t) \). Suppose also that each phase \( i \) has a death rate \( d_i \), and that cells persist for an average time \( T_\mathrm{A} \) in the dead state before they are removed from the simulation.
The mean waiting times \( T_i \) are equivalent to transition rates \( r_i = 1 / T_i \) (Macklin et al. 2012). Moreover, for any time interval \( [t,t+\Delta t] \), both are equivalent to a transition probability of
\[ \mathrm{Prob}\Bigl( S_k(t+\Delta t) = P_{i+1} \mid S_k(t) = P_i \Bigr) = 1 - e^{ -r_i \Delta t } \approx r_i \Delta t = \frac{ \Delta t}{ T_i}, \] where the approximation holds when \( r_i \Delta t \ll 1 \). In many discrete models (especially cellular automaton models) with fixed step sizes \( \Delta t \), models are stated in terms of transition probabilities \( p_{i,i+1} \), which we see are equivalent to the work above with \( p_{i,i+1} = r_i \Delta t = \Delta t / T_i \), allowing us to tie mathematical model forms to biological, measurable parameters. We note that each \(T_i\) is the average duration of the \( P_i \) phase.
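To get a feel for how small \( \Delta t \) needs to be, here is a minimal Matlab sketch that compares the exact transition probability \( 1 - e^{-r_i \Delta t} \) with its linearization \( r_i \Delta t \). The duration `T_i = 13` hours and the three step sizes are illustrative assumptions, not values prescribed by the derivation above.

```matlab
% Sketch: exact vs. linearized transition probability for one phase.
% T_i and the step sizes below are illustrative assumptions.
T_i = 13;                 % mean phase duration (hours)
dt  = [ 1 0.1 0.01 ];     % candidate step sizes (hours)

r_i      = 1 / T_i;                   % transition rate
p_exact  = 1 - exp( -r_i * dt );      % exact probability over [t, t+dt]
p_linear = r_i * dt;                  % first-order approximation dt / T_i

% relative error of the linearization
rel_error = abs( p_exact - p_linear ) ./ p_exact;

for j = 1:length( dt )
    fprintf( 'dt = %5.2f h: exact = %.6f, linear = %.6f, rel. error = %.2e\n', ...
        dt(j), p_exact(j), p_linear(j), rel_error(j) );
end
```

The relative error scales roughly like \( r_i \Delta t / 2 \), so the shortest phase in a model sets the largest step size for which \( \Delta t / T_i \) is a reasonable transition probability.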
Concrete example: a Ki67 Model
Ki-67 is a nuclear protein that is expressed through much of the cell cycle, including S, G2, M, and part of G1 after division. It is used very commonly in pathology to assess proliferation, particularly in cancer. See the references and discussion in (Macklin et al. 2012). In Macklin et al. (2012), we came up with a discrete cell cycle model to match Ki-67 data (along with cleaved Caspase-3 stains for apoptotic cells). Let’s summarize the key parts here.
Each cell agent \(k\) has a phase \(S_k(t)\). Ki67- cells are quiescent (phase \(Q\), mean duration \( T_\mathrm{Q} \)), and they can enter the Ki67+ \(K_1\) phase (mean duration \(T_1\)). When \( K_1 \) cells leave their phase, they divide into two Ki67+ daughter cells in the \( K_2 \) phase with mean duration \( T_2 \). When cells exit \( K_2 \), they return to \( Q \). Cells in any phase can become apoptotic (enter the \( A \) phase with mean duration \( T_\mathrm{A} \)), with death rate \( r_\mathrm{A} \).
Coarse-graining to an ODE model
If each phase \(i\) has a death rate \(d_i\), if \( N_i(t) \) denotes the number of cells in the \( P_i \) phase at time \( t\), and if \( A(t) \) is the number of dead (apoptotic) cells at time \( t\), then on average, the number of cells in the \( P_i \) phase at the next time step is given by
\[ N_i(t+\Delta t) = N_i(t) + N_{i-1}(t) \cdot \left[ \textrm{prob. of } P_{i-1} \rightarrow P_i \textrm{ transition} \right] - N_i(t) \cdot \left[ \textrm{prob. of } P_{i} \rightarrow P_{i+1} \textrm{ transition} \right] \] \[ - N_i(t) \cdot \left[ \textrm{probability of death} \right] \] By the work above, this is:
\[ N_i(t+\Delta t) \approx N_i(t) + N_{i-1}(t) r_{i-1} \Delta t - N_i(t) r_i \Delta t - N_i(t) d_i \Delta t , \] or after shuffling terms and taking the limit as \( \Delta t \downarrow 0\), \[ \frac{d}{dt} N_i(t) = r_{i-1} N_{i-1}(t) - \left( r_i + d_i \right) N_i(t). \] Continuing this analysis, we obtain a linear system:
\[ \frac{d}{dt}{ \vec{N} } = \begin{bmatrix} -(r_1+d_1) & 0 & \cdots & 0 & 2r_n & 0 \\ r_1 & -(r_2+d_2) & 0 & \cdots & 0 & 0 \\ 0 & r_2 & -(r_3+d_3) & 0 & \cdots & 0 \\ & & \ddots & & \\0&\cdots&0 &r_{n-1} & -(r_n+d_n) & 0 \\ d_1 & d_2 & \cdots & d_{n-1} & d_n & -\frac{1}{T_\mathrm{A}} \end{bmatrix}\vec{N} = M \vec{N}, \] where \( \vec{N}(t) = [ N_1(t), N_2(t) , \ldots , N_n(t) , A(t) ] \).
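The matrix \( M \) above can be assembled programmatically. The following is a minimal sketch, not code from the original study: the four phase durations and death rates are hypothetical values chosen only to exercise the construction, and the variable names are my own.

```matlab
% Sketch: assemble the general (n+1) x (n+1) matrix M.
% All numerical values here are hypothetical.
T   = [ 10 8 4 2 ];               % mean phase durations T_i (hours)
d   = [ 1e-3 1e-3 2e-3 1e-3 ];    % death rates d_i (1/hour)
T_A = 8.6;                        % mean time in the dead state (hours)

n = length( T );
r = 1 ./ T;                       % transition rates r_i = 1/T_i

M = zeros( n+1 , n+1 );
for i = 1:n
    M(i,i)   = -( r(i) + d(i) );  % loss from P_i: progression + death
    M(n+1,i) = d(i);              % deaths feed the dead population A
    if i < n
        M(i+1,i) = r(i);          % P_i -> P_{i+1}
    end
end
M(1,n)     = M(1,n) + 2*r(n);     % division: each P_n cell yields two P_1 cells
M(n+1,n+1) = -1 / T_A;            % removal of dead cells

% column sums give each population's contribution to d(total)/dt
column_sums = sum( M , 1 );
disp( column_sums );
```

The column sums are a quick sanity check on the bookkeeping: they are zero for the non-dividing phases (cells only move between compartments), \( r_n \) for the dividing phase (one net new cell per division), and \( -1/T_\mathrm{A} \) for the dead compartment.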
For the Ki67 model above, let \(\vec{N} = [K_1, K_2, Q, A]\). Then the linear system is
\[ \frac{d}{dt} \vec{N} = \begin{bmatrix} -\left( \frac{1}{T_1} + r_\mathrm{A} \right) & 0 & \frac{1}{T_\mathrm{Q}} & 0 \\ \frac{2}{T_1} & -\left( \frac{1}{T_2} + r_\mathrm{A} \right) & 0 & 0 \\ 0 & \frac{1}{T_2} & -\left( \frac{1}{T_\mathrm{Q}} + r_\mathrm{A} \right) & 0 \\ r_\mathrm{A} & r_\mathrm{A} & r_\mathrm{A} & -\frac{1}{T_\mathrm{A}} \end{bmatrix} \vec{N} .\]
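(If we had written \( \vec{N} = [Q, K_1, K_2 , A] \), then the matrix above would have matched the structure of the general form, except that the factor of 2 for division sits on the \( K_1 \rightarrow K_2 \) transition rather than on the wrap-around \( P_n \rightarrow P_1 \) entry.)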
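The averaged update from the derivation above, \( \vec{N}(t+\Delta t) \approx (I + \Delta t\, M) \vec{N}(t) \), should converge to the exact ODE solution \( e^{Mt} \vec{N}(0) \) as \( \Delta t \) shrinks. Here is a minimal sketch of that convergence check for the Ki67 matrix. The parameter values are the ones this post uses in its Matlab section; the choice of step sizes, the 72-hour end time, and the use of `expm` for the reference solution are my assumptions.

```matlab
% Sketch: first-order convergence of the averaged discrete update
% N <- N + dt*M*N to the exact solution expm(M*t)*N0.
T1 = 13; T2 = 2.5; TQ = 74.35; TA = 8.6;   % mean durations (hours)
rA = 1.05e-3;                               % death rate (1/hour)

% ordering: [K1; K2; Q; A]
M = [ -(1/T1 + rA) , 0            , 1/TQ         , 0     ; ...
      2/T1         , -(1/T2 + rA) , 0            , 0     ; ...
      0            , 1/T2         , -(1/TQ + rA) , 0     ; ...
      rA           , rA           , rA           , -1/TA ];

N0      = [ 0 ; 0 ; 1000 ; 0 ];      % start with 1000 quiescent cells
t_max   = 72;                        % hours
N_exact = expm( M * t_max ) * N0;    % reference solution

dt_values = [ 0.4 0.2 0.1 ];         % each step size halves the previous one
err = zeros( size( dt_values ) );
for j = 1:length( dt_values )
    dt      = dt_values(j);
    n_steps = round( t_max / dt );
    N = N0;
    for k = 1:n_steps
        N = N + dt * ( M * N );      % averaged discrete update
    end
    err(j) = norm( N - N_exact ) / norm( N_exact );
    fprintf( 'dt = %.2f h: relative error = %.3e\n', dt, err(j) );
end
```

If the update is consistent with the ODE, halving \( \Delta t \) should roughly halve the error, which is the signature of a first-order method.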
Some theoretical results
If \( M\) has eigenvalues \( \lambda_1 , \ldots \lambda_{n+1} \) and corresponding eigenvectors \( \vec{v}_1, \ldots , \vec{v}_{n+1} \), then the general solution is given by
\[ \vec{N}(t) = \sum_{i=1}^{n+1} c_i e^{ \lambda_i t } \vec{v}_i ,\] and if the initial cell counts are given by \( \vec{N}(0) \) and we write \( \vec{c} = [c_1, \ldots, c_{n+1} ] \), we can obtain the coefficients by solving \[ \vec{N}(0) = [ \vec{v}_1 | \cdots | \vec{v}_{n+1} ]\vec{c} .\] In many cases, it turns out that all but one of the eigenvalues have negative real parts, and the remaining eigenvalue (say \( \lambda \) with corresponding eigenvector \(\vec{v}\)) is positive. In this case, all the other components of the solution decay away, and for long times, we have \[ \vec{N}(t) \approx c e^{ \lambda t } \vec{v} .\] This is incredibly useful, because it says that over long times, the fraction of cells in the \( i^\textrm{th} \) phase is given by \[ v_{i} / \sum_{j=1}^{n+1} v_{j}. \]
Matlab implementation (with the Ki67 model)
First, let’s set some parameters, to make this a little easier and reusable.
    parameters.dt = 0.1; % 6 min = 0.1 hours
    parameters.time_units = 'hour';
    parameters.t_max = 3*24; % 3 days

    parameters.K1.duration = 13;
    parameters.K1.death_rate = 1.05e-3;
    parameters.K1.initial = 0;

    parameters.K2.duration = 2.5;
    parameters.K2.death_rate = 1.05e-3;
    parameters.K2.initial = 0;

    parameters.Q.duration = 74.35 ;
    parameters.Q.death_rate = 1.05e-3;
    parameters.Q.initial = 1000;

    parameters.A.duration = 8.6;
    parameters.A.initial = 0;
Next, we write a function to read in the parameter values, construct the matrix (and all the data structures), find eigenvalues and eigenvectors, and create the theoretical solution. It also finds the positive eigenvalue to determine the long-time values.
    function solution = Ki67_exact( parameters )

    % allocate memory for the main outputs
    solution.T = 0:parameters.dt:parameters.t_max;
    solution.K1 = zeros( 1 , length(solution.T));
    solution.K2 = zeros( 1 , length(solution.T));
    solution.K = zeros( 1 , length(solution.T));
    solution.Q = zeros( 1 , length(solution.T));
    solution.A = zeros( 1 , length(solution.T));
    solution.Live = zeros( 1 , length(solution.T));
    solution.Total = zeros( 1 , length(solution.T));

    % allocate memory for cell fractions
    solution.AI = zeros(1,length(solution.T));
    solution.KI1 = zeros(1,length(solution.T));
    solution.KI2 = zeros(1,length(solution.T));
    solution.KI = zeros(1,length(solution.T));

    % get the main parameters
    T1 = parameters.K1.duration;
    r1A = parameters.K1.death_rate;
    T2 = parameters.K2.duration;
    r2A = parameters.K2.death_rate;
    TQ = parameters.Q.duration;
    rQA = parameters.Q.death_rate;
    TA = parameters.A.duration;

    % write out the mathematical model:
    % d[Populations]/dt = Operator*[Populations]
    Operator = [ -(1/T1 +r1A) , 0 , 1/TQ , 0; ...
        2/T1 , -(1/T2 + r2A) ,0 , 0; ...
        0 , 1/T2 , -(1/TQ + rQA) , 0; ...
        r1A , r2A, rQA , -1/TA ];

    % eigenvectors and eigenvalues
    [V,D] = eig(Operator);
    eigenvalues = diag(D);

    % save the eigenvectors and eigenvalues in case you want them.
    solution.V = V;
    solution.D = D;
    solution.eigenvalues = eigenvalues;

    % initial condition
    VecNow = [ parameters.K1.initial ; parameters.K2.initial ; ...
        parameters.Q.initial ; parameters.A.initial ] ;
    solution.K1(1) = VecNow(1);
    solution.K2(1) = VecNow(2);
    solution.Q(1) = VecNow(3);
    solution.A(1) = VecNow(4);
    solution.K(1) = solution.K1(1) + solution.K2(1);
    solution.Live(1) = sum( VecNow(1:3) );
    solution.Total(1) = sum( VecNow(1:4) );

    solution.AI(1) = solution.A(1) / solution.Total(1);
    solution.KI1(1) = solution.K1(1) / solution.Total(1);
    solution.KI2(1) = solution.K2(1) / solution.Total(1);
    solution.KI(1) = solution.KI1(1) + solution.KI2(1);

    % now, get the coefficients to write the analytic solution
    % [Populations] = c1*V(:,1)*exp( D(1,1)*t) + c2*V(:,2)*exp( D(2,2)*t ) +
    %                 c3*V(:,3)*exp( D(3,3)*t) + c4*V(:,4)*exp( D(4,4)*t );
    coeff = linsolve( V , VecNow );

    % find the (hopefully one) positive eigenvalue.
    % eigensolutions with negative eigenvalues decay,
    % leaving this as the long-time behavior.
    n = find( real( eigenvalues ) > 0 );
    solution.long_time.KI1 = V(1,n) / sum( V(:,n) );
    solution.long_time.KI2 = V(2,n) / sum( V(:,n) );
    solution.long_time.QI = V(3,n) / sum( V(:,n) );
    solution.long_time.AI = V(4,n) / sum( V(:,n) ) ;
    solution.long_time.KI = solution.long_time.KI1 + solution.long_time.KI2;

    % now, write out the solution at all the times
    for i=2:length( solution.T )
        % compact way to write the solution
        VecExact = real( V*( coeff .* exp( eigenvalues*solution.T(i) ) ) );

        solution.K1(i) = VecExact(1);
        solution.K2(i) = VecExact(2);
        solution.Q(i) = VecExact(3);
        solution.A(i) = VecExact(4);
        solution.K(i) = solution.K1(i) + solution.K2(i);
        solution.Live(i) = sum( VecExact(1:3) );
        solution.Total(i) = sum( VecExact(1:4) );

        solution.AI(i) = solution.A(i) / solution.Total(i);
        solution.KI1(i) = solution.K1(i) / solution.Total(i);
        solution.KI2(i) = solution.K2(i) / solution.Total(i);
        solution.KI(i) = solution.KI1(i) + solution.KI2(i);
    end

    return;
Now, let’s run it and see what this thing looks like:
Next, we plot KI1, KI2, and AI versus time (solid curves), along with the theoretical long-time behavior (dashed curves). Notice how well it matches–it’s neat when theory works! :-)
Some readers may recognize the long-time fractions: KI1 + KI2 = KI = 0.1743, and AI = 0.00833, very close to the DCIS patient values from our simulation study in Macklin et al. (2012) and improved calibration work in Hyun and Macklin (2013).
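The long-time fractions quoted above can be cross-checked with a few lines that stand apart from `Ki67_exact`. This is a minimal sketch using the parameter values from this post; picking the dominant eigenvalue with `max` (rather than searching for positive eigenvalues with `find`) is my own variation, and the variable names are illustrative.

```matlab
% Sketch: long-time cell fractions from the dominant eigenvector.
T1 = 13; T2 = 2.5; TQ = 74.35; TA = 8.6;   % mean durations (hours)
rA = 1.05e-3;                               % death rate (1/hour)

% ordering: [K1; K2; Q; A]
M = [ -(1/T1 + rA) , 0            , 1/TQ         , 0     ; ...
      2/T1         , -(1/T2 + rA) , 0            , 0     ; ...
      0            , 1/T2         , -(1/TQ + rA) , 0     ; ...
      rA           , rA           , rA           , -1/TA ];

[V, D] = eig( M );
[lambda, idx] = max( real( diag( D ) ) );   % dominant (growth) eigenvalue
v = real( V(:,idx) );

% normalizing by the sum removes the arbitrary sign and scale of v
fractions = v / sum( v );
KI = fractions(1) + fractions(2);   % Ki67-positive fraction
AI = fractions(4);                  % apoptotic fraction

fprintf( 'growth rate = %.5f 1/hour, KI = %.4f, AI = %.5f\n', lambda, KI, AI );
```

Dividing by `sum( v )` matters because `eig` returns eigenvectors with arbitrary sign and scale; only the normalized ratios are meaningful as cell fractions.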
Comparing simulations and theory
I wrote a small Matlab program to implement the discrete model: start with 1000 cells in the \(Q\) phase, and in each time interval \([t,t+\Delta t]\), each cell “decides” whether to advance to the next phase, stay in the same phase, or apoptose. If we compare a single run against the theoretical curves, we see hints of a match:
If we average 10 simulations and compare, the match is better:
And lastly, if we average 100 simulations and compare, the curves are very difficult to tell apart:
Even in logarithmic space, it’s tough to tell these apart:
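For readers who want to see the shape of the per-cell update described at the start of this section, here is a minimal vectorized sketch. It is not the `Ki67_stochastic.m` file used for the figures: the integer phase codes, the single uniform draw per cell with death checked first, and the variable names are all my assumptions. Results vary from run to run because no random seed is set.

```matlab
% Sketch of one stochastic run (hypothetical layout, see assumptions above).
% Phase codes (assumed): 1 = K1, 2 = K2, 3 = Q, 4 = A (apoptotic)
dt = 0.1; t_max = 72;                       % hours
T1 = 13; T2 = 2.5; TQ = 74.35; TA = 8.6;    % mean durations (hours)
rA = 1.05e-3;                               % death rate (1/hour)

phase   = 3 * ones( 1000 , 1 );             % 1000 cells, all quiescent
n_steps = round( t_max / dt );

for step = 1:n_steps
    n_cells = numel( phase );
    u = rand( n_cells , 1 );                % one uniform draw per cell

    % probability of leaving the current phase in this step
    p_adv = zeros( n_cells , 1 );
    p_adv( phase == 1 ) = dt / T1;
    p_adv( phase == 2 ) = dt / T2;
    p_adv( phase == 3 ) = dt / TQ;
    p_adv( phase == 4 ) = dt / TA;          % removal of dead cells

    % death applies to live cells only; disjoint intervals of u
    p_die    = rA * dt * ( phase ~= 4 );
    dies     = ( u < p_die );
    advances = ( u >= p_die ) & ( u < p_die + p_adv );

    % masks are computed from the old phases before any update
    divide  = advances & ( phase == 1 );    % K1 -> two K2 daughters
    to_Q    = advances & ( phase == 2 );
    to_K1   = advances & ( phase == 3 );
    removed = advances & ( phase == 4 );

    phase( divide ) = 2;
    phase( to_Q )   = 3;
    phase( to_K1 )  = 1;
    phase( dies )   = 4;

    % drop removed cells, append one extra daughter per division
    phase = [ phase( ~removed ) ; 2 * ones( nnz( divide ) , 1 ) ];
end

KI = mean( phase == 1 | phase == 2 );       % Ki67-positive fraction
AI = mean( phase == 4 );                    % apoptotic fraction
fprintf( 'cells: %d, KI = %.4f, AI = %.4f\n', numel( phase ), KI, AI );
```

With these probabilities the expected update is \( \vec{N}(t+\Delta t) = (I + \Delta t\, M)\vec{N}(t) \), so a single run should scatter around the theoretical fractions and averages over many runs should tighten toward them.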
Code
The following Matlab files (available here) can be used to reproduce this post:
- Ki67_exact.m: the function defined above to create the exact solution using the eigenvalue/eigenvector approach.
- Ki67_stochastic.m: runs a single stochastic simulation, using the supplied parameters.
- script.m: runs the theoretical solution first, creates plots, and then runs the stochastic model 100 times for comparison.
To make it all work, simply run “script” at the command prompt. Please note that it will generate some png files in its directory.
Closing thoughts
In this post, we showed a nice way to check a discrete model against theoretical behavior–both in short-term dynamics and long-time behavior. The same work should apply to validating many discrete models. However, when you add spatial effects (e.g., a cellular automaton model that won’t proliferate without an empty neighbor site), I wouldn’t expect a match. (But simulating cells that initially have a “salt and pepper”, random distribution should match this for early times.)
Moreover, models with deterministic phase durations (e.g., K1, K2, and A have fixed durations) aren’t consistent with the ODE model above, unless the cells are each initialized with a random amount of “progress” in their initial phases. (Otherwise, the cells in each phase will run synchronized, and there will be fixed delays before cells transition to other phases.) Delay differential equations better describe such models. However, for long simulation times, the slopes of the sub-populations and the cell fractions should match the ODE models better and better.
Now that we have verified that the discrete model is performing as expected, we can have greater confidence in its predictions, and start using those predictions to assess the underlying models. In ODE and PDE models, you often validate the code on simpler problems where you have an analytical solution, and then move on to making simulation predictions in cases where you can’t solve analytically. Similarly, we can now move on to variants of the discrete model where we can’t as easily match ODE theory (e.g., time-varying rate parameters, spatial effects), but with the confidence that the phase transitions are working as they should.
Paul Macklin interviewed at 2013 PSOC Annual Meeting
Paul Macklin gave a plenary talk at the 2013 NIH Physical Sciences in Oncology Annual Meeting. After the talk, he gave an interview to Pauline Davies at the NIH on the need for data standards and model compatibility in computational and mathematical modeling of cancer. Of particular interest:
Pauline Davies: How would you ever get this standardization? Who would be responsible for saying we want it all reported in this particular way?
Paul Macklin: That’s a good question. It’s a bit of the chicken and the egg problem. Who’s going to come and give you data in your standard if you don’t have a standard? How do you plan a standard without any data? And so it’s a bit interesting. I just think someone needs to step forward and show leadership and try to get a small working group together, and at the end of the day, perfect is the enemy of the good. I think you start small and give it a go, and you add more to your standard as you need it. So maybe version one is, let’s say, how quickly the cells divide, how often they do it, how quickly they die, and what their oxygen level is, and maybe their positions. And that can be version one of this standard and a few of us try it out and see what we can do. I think it really comes down to a starting group of people and a simple starting point, and you grow it as you need it.
Shortly after, the MultiCellDS project was born (using just this strategy above!), with the generous assistance of the Breast Cancer Research Foundation.
Read / Listen to the interview: http://physics.cancer.gov/report/2013report/PaulMacklin.aspx (2013)
DCIS modeling paper accepted
I am pleased to report that our paper has now been accepted. You can download the accepted preprint here. We also have a lot of supplementary material, including simulation movies, simulation datasets (for 0, 15, 30, and 45 days of growth), and open source C++ code for postprocessing and visualization.
I discussed the results in detail here, but here’s the short version:
- We use a mechanistic, agent-based model of individual cancer cells growing in a duct. Cells are moved by adhesive and repulsive forces exchanged with other cells and the basement membrane. Cell phenotype is controlled by stochastic processes.
- We constrained all parameters expected to be relatively independent of patients by a careful analysis of the experimental biological and clinical literature.
- We developed the very first patient-specific calibration method, using clinically-accessible pathology. This is a key point in future patient-tailored predictions and surgical/therapeutic planning.
- The model made numerous quantitative predictions, such as:
  - The tumor grows at a constant rate, between 7 and 10 mm/year. This is right in the middle of the range reported in the clinic.
  - The tumor’s size in mammography is linearly correlated with the post-surgical pathology size. When we linearly extrapolate our correlation across two orders of magnitude, it goes right through the middle of a cluster of 87 clinical data points.
  - The tumor necrotic core has an age structuring: with oldest, calcified material in the center, and newest, most intact necrotic cells at the outer edge.
  - The appearance of a “typical” DCIS duct cross-section varies with distance from the leading edge; all types of cross-sections predicted by our model are observed in patient pathology.
- The model also gave new insight on the underlying biology of breast cancer, such as:
  - The split between the viable rim and necrotic core (observed almost universally in pathology) is not just an artifact, but an actual biomechanical effect from fast necrotic cell lysis.
  - The constant rate of tumor growth arises from the biomechanical stress relief provided by lysing necrotic cells. This points to the critical role of intracellular and intra-tumoral water transport in determining the qualitative and quantitative behavior of tumors.
  - Pyknosis (nuclear degradation in necrotic cells) must occur at a time scale between that of cell lysis (on the order of hours) and cell calcification (on the order of weeks).
  - The current model cannot explain the full spectrum of calcification types; other biophysics, such as degradation over a long, 1-2 month time scale, must be at play.
DCIS paper resubmitted; lots of clinical predictions, lots of validation
After a lot of revision, I have merged my two papers on ductal carcinoma in situ (DCIS) into a single manuscript and resubmitted to the Journal of Theoretical Biology for final review. A preprint is posted at my website, along with considerable supplementary material (data sets, source code, and animations). Thanks to my co-authors and other friends for all the help in the revisions. Thanks also to the reviewers for insightful comments. I think this revised manuscript is all the better for it.
DCIS is a precursor to invasive breast cancer, and it is generally detected by annual mammographic screening. More advanced DCIS (with greater risk) tends to have comedonecrosis–a type of cell death that leaves calcium phosphate deposits in the centers of the ducts. In fact, this is generally what’s detected in mammograms. DCIS is usually surgically removed by cutting out a small ball of tissue around what’s found in the mammograms (breast-conserving surgery, or lumpectomy). But current planning isn’t so great. Even with the state-of-the-art in patient imaging and surgical planning, about 20%-50% of women need to get a second surgery because the first one didn’t get the entire tumor.
So, there’s great need to understand calcifications, and how what you see in mammography relates to the actual tumor size and shape. And if you do have a model to do this, there’s great need to calibrate it to patient pathology data (the stuff you get from your biopsies) so that the models say something meaningful about individual patients. And there has been no method to do that. Until now.
As far as I know, this paper is the first to calibrate to individual patient immunohistochemistry and histopathology. This, along with some parameter estimates from the theoretical and experimental biology literature, allows us to fully constrain the model. No free parameters to play with until it looks right. Any results are fully emergent from a mechanistic model and realistic parameter estimates rooted in the biology.
This model also includes the most detailed description of necrosis–the type of cell death that results in the comedonecrosis seen in mammograms. We include cell swelling, cell bursting, gradual loss of fluid, and the very first model of calcification.
Clinical predictions, with lots of validation:
All said and done, the model gives some big (and validated!) predictions:
- The model predicts that a tumor grows through the duct at a constant rate. This is consistent with what’s actually seen in mammography.
- The model gives a new explanation for the known trend: when necrotic cells burst and lose fluid, it makes it more mechanically favorable for proliferating cells to push into the center of the duct, rather than along the duct. For this reason the model predicts faster growth in smaller ducts, and slower growth in larger ducts.
- The model predicts growth rates between 7.5 and 10 mm per year. This is quantitatively consistent with published values in the clinical literature.
- The model predicts the difference between the size in a mammogram and the actual size (as measured by a pathologist after surgery) grows in time. This unfortunately means that it’s unlikely that there is some “fixed” safe distance to cut around the mammographic findings.
- On the other hand, the model predicts that there is a linear correlation between the size in a mammogram and the actual (pathology) tumor size. This bodes well for future surgical planning.
- Better still, the linear correlation we found quantitatively fits through 87 published patients, spanning two orders of magnitude.
The model also makes several key predictions on the smaller-scale biology:
- The model predicts that fast swelling of necrotic cells (on the order of 6 hours) is responsible for the tear between the viable rim and necrotic core seen in just about every pathology image of DCIS.
- The model predicts that the necrotic core is “age structured”, with newly necrotic cells (with relatively intact nuclei) on the outer edge, an interior band of mostly degraded but noncalcified cells, and a central core of oldest, calcified material. This compares well with patient histopathology.
- Comparing the model-predicted age structuring to histopathology predicts a sharper estimate on the various necrosis time scales: swelling and lysis (~6 h) < slow fluid loss (~days to a week) < pyknosis (~10+ days) < calcification (~2 weeks).
- Because the model only predicts linear / casting-type calcifications (long “plugs” of calcification), other biophysics must be responsible for the variety of calcification types seen in mammography.
- Among other mechanisms, we postulate a very long-timescale (1-2 months) process of degradation of the phospholipid “backbone” of the calcifications, resulting in degradation of the calcifications. The cracks seen in the central portions of calcifications (in histopathology) support this view.