## Setting up a 64-bit gcc/OpenMP environment on Windows

Note: This is part of a series of “how-to” blog posts to help new users and developers of BioFVM and PhysiCell. This guide is for Windows users. OSX users should use this guide for Homebrew (preferred method) or this guide for MacPorts (much slower but reliable). A Linux guide is expected soon.

These instructions should get you up and running with a minimal environment for compiling 64-bit C++ projects with OpenMP (e.g., BioFVM and PhysiCell) using a 64-bit Windows port of gcc. These instructions should work for any modern Windows installation, say Windows 7 or above. This tutorial assumes you have a 64-bit CPU running on a 64-bit operating system.

In the end, you’ll have a compiler, key makefile capabilities, and a decent text editor. The entire toolchain is free and open source.

Of course, you can use other compilers and more sophisticated integrated development environments (IDEs), but these instructions will get you a good baseline system with support for 64-bit binaries and OpenMP parallelization.

### What you’ll need:

1. MinGW-w64 compiler: This is a native port of the venerable gcc compiler for Windows, with support for 64-bit executables. Download the latest installer (mingw-w64-install.exe) here. As of January 8, 2016, this installer will download gcc 5.3.0.
2. MSYS tools: This gets you some of the common command-line utilities from Linux, Unix, and BSD systems (make, touch, etc.). Download the latest installer (mingw-get-setup.exe) here.

### Main steps:

#### 1) Install the compiler

Run the mingw-w64-install.exe. When asked, select:

• Version: 5.3.0 (or later)
• Architecture: x86_64
• Threads: win32 (I have also tested posix, but the native threading should be faster.)
• Exception: seh (sjlj works and should be more compatible with various GNU tools, but the native SEH should be faster.)
• Build version: 0 (or the default)

Leave the destination folder wherever the installer wants to put it. In my case, it is:

c:\Program Files\mingw-w64\x86_64-5.3.0-win32-seh-rt_v4_rev0


#### 2) Install the MSYS tools

Run mingw-get-setup.exe. Leave the default installation directory and any other defaults on the initial screen. Click “Continue” and let it download what it needs to get started (a bunch of XML files, for the most part). Click “Continue” when it’s done.

This will open up a new package manager. Our goal here is just to grab MSYS, rather than the full (but merely 32-bit) compiler. Scroll through and select (“mark for installation”) the following:

• mingw-developer-toolkit. (Note: This should automatically select msys-base.)

Next, click “Apply Changes” in the “Installation” menu. When prompted, click “Apply.” Let the package manager download and install what it needs (on the order of 95 packages). Go ahead and close things once the installation is done, including the package manager.

#### 3) Install the text editor

Download and run the Notepad++ installer. You can stick with the defaults.

#### 4) Add the tools to your path

Adding the compiler, text editor, and MSYS tools to your path lets you run make, the compiler, and Notepad++ from any command prompt. First, get the path of your compiler:

1. Open Windows Explorer ([Windows]+E).
2. Browse through C:\, then Program Files, mingw-w64, then a messy path name corresponding to your installation choices (in my case, x86_64-5.3.0-win32-seh-rt_v4_rev0), then mingw64, and finally bin.
3. Copy the path from the Explorer address bar.
c:\Program Files\mingw-w64\x86_64-5.3.0-win32-seh-rt_v4_rev0\mingw64\bin\

Then, get the path to Notepad++.

1. Go back to Explorer, and choose “This PC” or “My Computer” from the left column.
2. Browse through C:\, then Program Files (x86), then Notepad++.
3. Copy the path from the Explorer address bar.
c:\Program Files (x86)\Notepad++\

Then, get the path for MSYS:

1. Go back to Explorer, and choose “This PC” or “My Computer” from the left column.
2. Browse through C:\, then MinGW, then msys, then 1.0, and finally bin.
3. Copy the path from the Explorer address bar.
C:\MinGW\msys\1.0\bin\

Finally, add all three paths to your Path environment variable. Open the Control Panel, go to System, click “Advanced system settings,” then “Environment Variables,” select Path, and click “Edit.” Append the three paths, separated by semicolons (on Windows 10, add each one as a new entry). Then open a new command prompt so that it picks up the updated path.

#### 5) Test the compiler

I wrote a sample C++ program that tests OpenMP parallelization (32 threads). If you can compile and run it, everything (including make) is working! 🙂
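The downloadable test program is not reproduced in this post. To give a rough idea of what such a test involves, here is a minimal sketch of a comparable program (not the original `my_test.cpp`). It allocates a large block of memory, then repeatedly sweeps over it inside an OpenMP parallel loop, printing messages in the same style as the expected output further down. The function `run_openmp_test` and its parameters are hypothetical. With the toolchain above, once you add a `main`, something like `g++ -O3 -m64 -fopenmp my_test.cpp -o my_test` should build it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the downloadable test: allocate `megabytes` MB of
// doubles, then sweep over them `n_sweeps` times in an OpenMP parallel loop.
// Returns a checksum so the compiler cannot optimize the work away.
double run_openmp_test(std::size_t megabytes, int n_threads, int n_sweeps)
{
    assert(megabytes > 0 && n_threads > 0 && n_sweeps > 0);
#ifdef _OPENMP
    omp_set_num_threads(n_threads);  // request a fixed number of threads
#endif
    const long long n = static_cast<long long>(megabytes * 1024 * 1024 / sizeof(double));

    std::cout << "Allocating " << megabytes << " MB of memory ..." << std::endl;
    std::vector<double> data(static_cast<std::size_t>(n), 1.0);
    std::cout << "Done!" << std::endl << std::endl;

    std::cout << "Entering main loop ..." << std::endl;
    for (int sweep = 0; sweep < n_sweeps; ++sweep)
    {
        // Iterations are independent, so OpenMP can split them across threads.
        // Each value moves from 1.0 toward the fixed point 0.75.
        #pragma omp parallel for
        for (long long i = 0; i < n; ++i)
            data[i] = std::sqrt(data[i] * data[i] + 1.0) - 0.5;
    }
    std::cout << "Done!" << std::endl;

    double checksum = 0.0;
    #pragma omp parallel for reduction(+ : checksum)
    for (long long i = 0; i < n; ++i)
        checksum += data[i];
    return checksum;
}
```

For example, `int main() { run_openmp_test(4096, 32, 50); return 0; }` requests 4096 MB and 32 threads, similar to the setup described here. Lower `megabytes` if your machine has less free memory. The `#ifdef _OPENMP` guards let the same file compile (and run serially) even without `-fopenmp`.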

##### Make a new directory, and enter it

Open a command prompt ([Windows]+R, then cmd). You should be in your user profile’s root directory. Make a new subdirectory called GCC_test, and enter it:

```
mkdir GCC_test
cd GCC_test
```

##### Grab a sample parallelized program

Download a Makefile and C++ source file, and save them to the GCC_test directory. Here are the links:

##### Compile and run the test

Go back to your (still open) command prompt. Compile and run the program:

```
make
my_test
```


The output should look something like this:

```
Allocating 4096 MB of memory ...
Done!

Entering main loop ...
Done!
```


Open the Windows Task Manager ([Windows]+R, then taskmgr) while the code is running. Look at the Performance tab, particularly the graphs of CPU usage history. While your program is running, you should see all of your logical (virtual) processors at 100% utilization, unless you have more than 32 of them. (This is a good indication that your code is running with OpenMP parallelization as expected.)
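Task Manager gives a visual check. If you also want a programmatic one, the minimal sketch below records which OpenMP thread IDs actually execute a parallel region, using the standard `omp_get_thread_num()` call. The function name `count_active_threads` is hypothetical. A result of 1 when you requested more threads usually means the code was compiled without `-fopenmp`.

```cpp
#include <cassert>
#include <set>

#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the number of distinct threads that ran a parallel region when
// `requested` threads were asked for. Without OpenMP, the pragmas are ignored
// and the region runs once on a single thread, so the result is 1.
int count_active_threads(int requested)
{
    assert(requested > 0);
    std::set<int> seen;
    #pragma omp parallel num_threads(requested)
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        // std::set is not thread-safe, so serialize the insertions.
        #pragma omp critical
        seen.insert(id);
    }
    return static_cast<int>(seen.size());
}
```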

Note: If the make command gives errors like `*** missing separator`, then you need to replace the white space (e.g., one or more spaces) at the start of the `$(COMPILE_COMMAND)` and `rm -f` lines with a single tab character.

### What’s next?

Download a copy of PhysiCell and try out the included examples! Visit BioFVM at MathCancer.org.

1. PhysiCell Method Paper at bioRxiv: https://doi.org/10.1101/088773
2. PhysiCell on MathCancer: http://PhysiCell.MathCancer.org
3. PhysiCell on SourceForge: http://PhysiCell.sf.net
4. PhysiCell on GitHub: http://github.com/MathCancer/PhysiCell
5. BioFVM on MathCancer.org: http://BioFVM.MathCancer.org
6. BioFVM on SourceForge: http://BioFVM.sf.net
7. BioFVM Method Paper in Bioinformatics: http://dx.doi.org/10.1093/bioinformatics/btv730

## BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations

I’m very excited to announce that our 3-D diffusion solver has been accepted for publication and is now online at Bioinformatics. Click here to check out the open access preprint!

A. Ghaffarizadeh, S.H. Friedman, and P. Macklin. BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations. Bioinformatics, 2015.
DOI: 10.1093/bioinformatics/btv730 (free; open access)

BioFVM (which stands for “Finite Volume Method for biological problems”) is an open source package to solve for 3-D diffusion of several substrates on desktop workstations, single supercomputer nodes, or even laptops (for smaller problems). We built it from the ground up for biological problems, with optimizations in C++ and OpenMP to take advantage of all those cores on your CPU. The code is available at SourceForge and BioFVM.MathCancer.org.

The main idea here is to make it easier to simulate big, cool problems in 3-D multicellular biology. We’ll take care of secretion, diffusion, and uptake of things like oxygen, glucose, metabolic waste products, signaling factors, and drugs, so you can focus on the rest of your model.

### Design philosophy and main capabilities

Solving diffusion equations efficiently and accurately is hard, especially in 3-D. Almost all biological simulations deal with this, many by using explicit finite differences (easy to code and accurate, but very slow!) or implicit methods like ADI (accurate and relatively fast, but difficult to code, often requiring complex linking to libraries). While real biological systems often depend upon many diffusing things (lots of signaling factors for cell-cell communication, growth substrates, drugs, etc.), most solvers only scale well to simulating two or three. We solve a system of PDEs of the following form:

$\frac{\partial \vec{\rho}}{\partial t} = \overbrace{ \vec{D} \nabla^2 \vec{\rho} }^{\textrm{diffusion}} - \overbrace{ \vec{\lambda} \vec{\rho} }^{\textrm{decay}} + \overbrace{ \vec{S} \left( \vec{\rho}^* - \vec{\rho} \right) }^{\textrm{bulk source}} - \overbrace{ \vec{U} \vec{\rho} }^{\textrm{bulk uptake}} + \overbrace{ \sum_{\textrm{cells } k} 1_k(\vec{x}) \left[ \vec{S}_k \left( \vec{\rho}^*_k - \vec{\rho} \right) - \vec{U}_k \vec{\rho} \right] }^{\textrm{sources and sinks by cells}}$
Above, all vector-vector products are term-by-term.

#### Solving for many diffusing substrates

We set out to write a package that could simulate many diffusing substrates using algorithms that were fast but simple enough to optimize. To do this, we wrote the entire solver to work on vectors of substrates, rather than on individual PDEs. In performance testing, we found that simulating 10 diffusing things only takes about 2.6 times longer than simulating one. (In traditional codes, simulating ten things takes ten times as long as simulating one.) We tried our hardest to break the code in our testing, but we failed. We simulated all the way from 1 diffusing substrate up to 128 without any problems. Adding new substrates increases the computational cost linearly.
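The following minimal sketch illustrates the idea of working on vectors of substrates. It is not BioFVM's code. It assumes each voxel stores all of its substrate densities in one `std::vector<double>`, and it applies a backward Euler decay step, term-by-term, to every substrate in a single pass over the voxels. The names `DensityField` and `decay_all_substrates` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One density vector per voxel; entry s is the density of substrate s.
using DensityField = std::vector<std::vector<double>>;

// Backward Euler for d(rho)/dt = -lambda * rho, applied term-by-term:
//   rho_new[s] = rho_old[s] / (1 + dt * lambda[s]).
// The per-substrate factors are computed once, then reused for every voxel.
void decay_all_substrates(DensityField& field, const std::vector<double>& lambda, double dt)
{
    std::vector<double> factor(lambda.size());
    for (std::size_t s = 0; s < lambda.size(); ++s)
        factor[s] = 1.0 / (1.0 + dt * lambda[s]);

    #pragma omp parallel for
    for (long long v = 0; v < static_cast<long long>(field.size()); ++v)
    {
        std::vector<double>& rho = field[v];
        assert(rho.size() == factor.size());
        for (std::size_t s = 0; s < rho.size(); ++s)
            rho[s] *= factor[s];  // all substrates handled in the same sweep
    }
}
```

In this layout, adding a substrate lengthens only the short inner loop. The outer loop over voxels, and its OpenMP parallelization, is shared by all substrates.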

#### Combining simple but tailored solvers

We used an approach called operator splitting: breaking a complicated PDE into a series of simpler PDEs and ODEs, which can be solved one at a time with implicit methods. This allowed us to write a very fast diffusion/decay solver, a bulk supply/uptake solver, and a cell-based secretion/uptake solver, each optimized individually. Theory tells us that if each individual solver is first-order accurate in time and stable, then the overall approach is first-order accurate in time and stable.
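As a toy illustration of first-order (Lie) operator splitting, consider one voxel and one substrate with $\frac{d\rho}{dt} = -\lambda\rho + S(\rho^* - \rho) - U\rho$. These are the decay and bulk source/uptake terms of the PDE above, without diffusion or cells. The hypothetical sketch below splits this into a decay sub-step and a bulk source/uptake sub-step, solves each with backward Euler, and applies them one after the other. It is a scalar example under these assumptions, not BioFVM's solver.

```cpp
#include <cassert>

// Sub-step 1: backward Euler for d(rho)/dt = -lambda * rho.
double decay_substep(double rho, double lambda, double dt)
{
    return rho / (1.0 + dt * lambda);
}

// Sub-step 2: backward Euler for d(rho)/dt = S (rho_star - rho) - U rho.
// Solving rho_new - rho = dt [S (rho_star - rho_new) - U rho_new] for rho_new:
double source_uptake_substep(double rho, double S, double rho_star, double U, double dt)
{
    return (rho + dt * S * rho_star) / (1.0 + dt * (S + U));
}

// One split time step: solve each sub-problem in turn.
double split_step(double rho, double lambda, double S, double rho_star, double U, double dt)
{
    assert(dt > 0.0);
    rho = decay_substep(rho, lambda, dt);
    return source_uptake_substep(rho, S, rho_star, U, dt);
}
```

Each sub-step is stable for any Δt when the coefficients are nonnegative. Iterating `split_step` converges to $S\rho^*/(S + U + \lambda/(1 + \Delta t\,\lambda))$, which differs from the exact steady state $S\rho^*/(\lambda + S + U)$ by an $O(\Delta t)$ splitting error.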

The beauty of the approach is that each solver can individually be improved over time. For example, in BioFVM 1.0.2, we doubled the performance of the cell-based secretion/uptake solver. The operator splitting approach also lets us add new terms to the “main” PDE by writing new solvers, rather than rewriting a large, monolithic solver. We will take advantage of this to add advective terms (critical for interstitial flow) in future releases.

#### Optimizing the diffusion solver for large 3-D domains

For the first main release of BioFVM, we restricted ourselves to Cartesian meshes, which allowed us to write very tailored mesh data structures and diffusion solvers. (Note: the finite volume method reduces to finite differences on Cartesian meshes with trivial Neumann boundary conditions.) We intend to work on more general Voronoi meshes in a future release. (This will be particularly helpful for sources/sinks along blood vessels.)

By using constant diffusion and decay coefficients, we were able to write very fast solvers for Cartesian meshes. We use the locally one-dimensional (LOD) method, a specialized form of operator splitting, to break the 3-D diffusion problem into a series of 1-D diffusion problems. For each (y,z) in our mesh, we have a 1-D diffusion problem along x. This yields a tridiagonal linear system, which we can solve efficiently with the Thomas algorithm. Moreover, because the coefficient part of the forward sweep depends only upon the coefficient matrix (which does not change over time), we can pre-compute and store it in memory for all the x-diffusion problems. In fact, the structure of the matrix allows us to pre-compute part of the back-substitution steps as well. The same holds for y- and z-diffusion. This gives a big speedup.
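Here is a minimal sketch of the pre-computation idea for one 1-D strip, under stated assumptions. It uses backward Euler for $\partial\rho/\partial t = D\,\partial^2\rho/\partial x^2 - \lambda\rho$ with constant D and λ, uniform spacing Δx, and zero-flux (Neumann) ends. Everything that depends only on the matrix (the modified super-diagonal and the reciprocal pivots) is computed once in the constructor. Each solve then only updates the right-hand side in a forward pass and performs the back substitution. The class name `PrecomputedThomas` is hypothetical. BioFVM's own implementation may differ in its details, for example in how decay is divided among the x-, y-, and z-sweeps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D implicit diffusion-decay solver for one strip of n voxels.
// Backward Euler with zero-flux ends gives a tridiagonal system
//   -k rho[i-1] + (1 + dt*lambda + k*neighbors(i)) rho[i] - k rho[i+1] = old[i],
// where k = D dt / dx^2. The matrix never changes, so its Thomas factors are
// computed once and can be reused for every strip and every time step.
class PrecomputedThomas
{
public:
    PrecomputedThomas(std::size_t n, double D, double lambda, double dx, double dt)
        : k_(D * dt / (dx * dx)), c_prime_(n), inv_denom_(n)
    {
        assert(n > 0);
        for (std::size_t i = 0; i < n; ++i)
        {
            const double neighbors = (n == 1) ? 0.0 : ((i == 0 || i == n - 1) ? 1.0 : 2.0);
            const double diag = 1.0 + dt * lambda + k_ * neighbors;
            // Pivot after eliminating the sub-diagonal entry (-k) of row i.
            const double denom = (i == 0) ? diag : diag + k_ * c_prime_[i - 1];
            inv_denom_[i] = 1.0 / denom;
            c_prime_[i] = -k_ * inv_denom_[i];  // modified super-diagonal
        }
    }

    // Overwrites rho (values at time t) with the values at time t + dt.
    void solve(std::vector<double>& rho) const
    {
        const std::size_t n = inv_denom_.size();
        assert(rho.size() == n);
        // Forward pass: only the right-hand side still needs updating.
        rho[0] *= inv_denom_[0];
        for (std::size_t i = 1; i < n; ++i)
            rho[i] = (rho[i] + k_ * rho[i - 1]) * inv_denom_[i];
        // Back substitution (inherently serial along the strip).
        for (std::size_t i = n - 1; i > 0; --i)
            rho[i - 1] -= c_prime_[i - 1] * rho[i];
    }

private:
    double k_;
    std::vector<double> c_prime_;
    std::vector<double> inv_denom_;
};
```

In an LOD step, one such solver per direction could be built once and reused for every strip and every time step, since none of its stored coefficients change. With λ = 0 and zero-flux ends, each solve conserves the total mass along the strip.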

Next, we can use all those CPU cores to speed up our work. While the back-substitution steps of the Thomas algorithm can’t be easily parallelized (they are inherently serial), we can solve many x-diffusion problems at the same time, using independent copies (instances) of the Thomas solver. So, we split all the x-diffusion problems across a big OpenMP loop, and repeat for y- and z-diffusion.
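The strip-level parallelism can be sketched as follows. The sketch assumes, hypothetically, a single substrate stored with x varying fastest and a generic `solve_strip` callable. The name `solve_all_x_strips` is illustrative. Compile with `-fopenmp` to enable the parallel loop; without it, the code runs serially and gives the same result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: one substrate on an nx * ny * nz Cartesian mesh, stored
// with x varying fastest, i.e. index = i + nx * (j + ny * k).
// Applies `solve_strip` (any thread-safe callable that updates one strip in
// place, such as a precomputed Thomas solver) to every x-strip.
template <typename StripSolver>
void solve_all_x_strips(std::vector<double>& field, std::size_t nx, std::size_t ny,
                        std::size_t nz, const StripSolver& solve_strip)
{
    assert(field.size() == nx * ny * nz);
    // Each (j, k) pair owns a disjoint strip, so different strips can be solved
    // on different threads; the work inside each strip stays serial.
    #pragma omp parallel for collapse(2)
    for (long long k = 0; k < static_cast<long long>(nz); ++k)
    {
        for (long long j = 0; j < static_cast<long long>(ny); ++j)
        {
            const std::size_t first =
                nx * (static_cast<std::size_t>(j) + ny * static_cast<std::size_t>(k));
            const auto begin = field.begin() + static_cast<std::ptrdiff_t>(first);
            // Thread-private scratch copy of the strip (for y- or z-strips, this
            // copy would instead gather values with a stride).
            std::vector<double> strip(begin, begin + static_cast<std::ptrdiff_t>(nx));
            solve_strip(strip);
            std::copy(strip.begin(), strip.end(), begin);
        }
    }
}
```

Because the strip solver is shared read-only across threads, any coefficients it precomputes are built once and reused by every thread.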

Lastly, we used overloaded +=, axpy and similar operations on the vector of substrates, to avoid unnecessary (and very expensive) memory allocation and copy operations wherever we could. This was a really fun code to write!
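To illustrate why in-place operators help, here is a minimal sketch of `+=` and axpy-style helpers on `std::vector<double>`. The names and signatures are illustrative, not necessarily BioFVM's. Writing `y = y + x` with a value-returning `operator+` would create a temporary vector on every call. The in-place versions below update `y` directly, without allocating.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place y += x for substrate vectors: no temporary vector is created.
std::vector<double>& operator+=(std::vector<double>& y, const std::vector<double>& x)
{
    assert(y.size() == x.size());
    for (std::size_t s = 0; s < y.size(); ++s)
        y[s] += x[s];
    return y;
}

// In-place y += a * x with a scalar a (the classic BLAS-style "axpy").
void axpy(std::vector<double>& y, double a, const std::vector<double>& x)
{
    assert(y.size() == x.size());
    for (std::size_t s = 0; s < y.size(); ++s)
        y[s] += a * x[s];
}

// Term-by-term variant y += a .* x, with one coefficient per substrate.
void axpy(std::vector<double>& y, const std::vector<double>& a, const std::vector<double>& x)
{
    assert(y.size() == x.size() && y.size() == a.size());
    for (std::size_t s = 0; s < y.size(); ++s)
        y[s] += a[s] * x[s];
}
```

The term-by-term overload matches the element-wise vector products in the PDE above.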

The work seems to have paid off: we have found that solving on 1-million-voxel meshes (about 8 mm³ at 20 μm resolution) is easy even for laptops.
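As a quick check of those numbers, assume a cubic domain with 100 voxels per side (the post does not state the shape):

$$
N = 100^3 = 10^6 \textrm{ voxels}, \qquad L = 100 \times 20\,\mu\textrm{m} = 2\,\textrm{mm}, \qquad L^3 = (2\,\textrm{mm})^3 = 8\,\textrm{mm}^3.
$$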

#### Simulating many cells

We tailored the solver to allow both lattice- and off-lattice cell sources and sinks. Desktop workstations should have no trouble with 1,000,000 cells secreting and uptaking a few substrates.

#### Simplifying the non-science

We worked to minimize external dependencies, because few things are more frustrating than tracking down a bunch of libraries that may not work together on your platform. The first release of BioFVM has only one external dependency: pugixml (an XML parser). We didn’t link an entire linear algebra library just to get axpy and a Thomas solver; it wouldn’t have been optimized for our system anyway. We implemented what we needed of the freely available .mat file specification, rather than requiring a separate library for that. (We have used these MATLAB read/write routines in house for several years.)

Similarly, we stuck to a very simple mesh data structure so we wouldn’t have to maintain compatibility with general mesh libraries (which can tend to favor feature sets and generality over performance and simplicity).  Rather than use general-purpose ODE solvers (with yet more library dependencies, and more work for maintaining compatibility), we wrote simple solvers tailored specifically to our equations.

The upshot of this is that you don’t have to do anything fancy to replicate results with BioFVM. Just grab a copy of the source, drop it into your project directory, include it in your project (e.g., your makefile), and you’re good to go.

### All the juicy details

The Bioinformatics paper is just 2 pages long, using the standard “Applications Note” format. It’s a fantastic format for announcing and disseminating a piece of code, and we’re grateful to be published there. But you should pop open the supplementary materials, because all the fun mathematics are there:

• The full details of the numerical algorithm, including information on our optimizations.
• Convergence tests: For several examples, we showed:
  ◦ First-order convergence in time (with respect to Δt), and stability.
  ◦ Second-order convergence in space (with respect to Δx).
• Accuracy tests: For each convergence test, we looked at how small Δt has to be to ensure 5% relative accuracy at Δx = 20 μm resolution. For oxygen-like problems with cell-based sources and sinks, Δt = 0.01 min will do the trick. This is about 15 times larger than the stability-restricted time step for explicit methods.
• Performance tests:
  ◦ Computational cost (wall time to simulate a fixed problem on a fixed domain size with fixed time/spatial resolution) increases linearly with the number of substrates. 5-10 substrates are very feasible on desktop workstations.
  ◦ Computational cost increases linearly with the number of voxels.
  ◦ Computational cost increases linearly with the number of cell-based sources/sinks.
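To see where the factor of about 15 comes from, here is a rough check under two assumptions. First, it assumes an oxygen-like diffusion coefficient $D = 10^5\,\mu\textrm{m}^2/\textrm{min}$, a value commonly used for oxygen in such models but not stated in this post. Second, it uses the standard stability limit of the explicit forward-Euler scheme with the 7-point Laplacian in 3-D, ignoring the slightly stricter limit that decay and uptake would add:

$$
\Delta t_{\textrm{explicit}} \le \frac{\Delta x^2}{6D} = \frac{(20\,\mu\textrm{m})^2}{6 \times 10^5\,\mu\textrm{m}^2/\textrm{min}} \approx 6.7 \times 10^{-4}\,\textrm{min}, \qquad \frac{0.01\,\textrm{min}}{6.7 \times 10^{-4}\,\textrm{min}} \approx 15.
$$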

And of course because this code is open sourced, you can dig through the implementation details all you like! (And improvements are welcome!)

### What’s next?

• As MultiCellDS (multicellular data standard) matures, we will implement read/write support for `<microenvironment>` data in digital snapshots.
• We have a few ideas to improve the speed of the cell-based sources and sinks. In particular, switching to a higher-order accurate solver may allow larger time step sizes, so long as the method is still stable. For the specific form of the sources/sinks, the trapezoid rule could work well here.
• I’d like to allow a spatially-varying diffusion coefficient. We could probably do this (at very great memory cost) by writing separate Thomas solvers for each strip in x, y, and z, or by giving up the pre-computation part of the optimization. I’m still mulling this one over.
• I’d also like to implement non-Cartesian meshes. The data structure isn’t a big deal, but we lose the LOD optimization and Thomas solvers. In this case, we’d either use explicit methods (very slow!), use an iterative matrix solver (trickier to parallelize nicely, except in matrix-vector multiplication operations), or start with quasi-steady problems that let us use Gauss-Seidel iterative type methods, like this old paper.
• Since advective flow (particularly interstitial flow) is so important for many problems, I’d like to add an advective solver. This will require some sort of upwinding to maintain stability.
• At some point, we’d like to port this to GPUs. However, I don’t currently have time / resources to maintain a separate CUDA or OpenCL branch. (Perhaps this will be an excuse to learn Julia on GPUs.)

Well, we hope you find BioFVM useful. If you give it a shot, I’d love to hear back from you!

Very best — Paul

## Paul Macklin profiled in New Scientist article

Paul Macklin was recently featured in a New Scientist article on multidisciplinary jobs in cancer. It profiled the non-linear paths he and others took to reach multidisciplinary careers blending biology, mathematics, and computing.

Read the article: http://jobs.newscientist.com/article/knocking-cancer-out/ (Apr. 16, 2015)

## Banner and Logo Contest : MultiCellDS Project

As the MultiCellDS (multicellular data standards) project continues to ramp up, we could use some artistic skill.

Right now, we don’t have a banner (aside from a fairly barebones placeholder using a lovely LCARS font) or a logo. While I could whip up a fancier banner and logo, I have a feeling that there is much better talent out there. So, let’s have a contest!

Here are the guidelines and suggestions:

1. The banner should use the text MultiCellDS Project. It’s up to the artist (and the intended use) whether the “multicellular data standards” part gets written out more fully (e.g., below the main part of the banner).
2. The logo should be shorter and easy to use on other websites. I’d suggest MCDS, stylized similarly to the main banner.
3. Think of MultiCell as a prefix: MultiCellDS, MultiCellXML, MultiCellHDF, MultiCellDB. So, the “banner” version should be extensible to new directions on the project.
4. The banner and logo should be submitted in a vector graphics format, with all source.
5. It goes without saying that you can’t use clip art that you don’t have rights to. (i.e., use your own artwork or photos, or properly-attributed creative commons-licensed art.)
6. The banner and logo need to belong to the MultiCellDS project once done.
7. We may do some final tweaks and finalization on the winning design for space or other constraints. But this will be done in full consultation with the winner.

So, what are the perks for winning?

1. Permanent link to your personal research / profession page crediting you as the winner.
2. A blog post detailing how awesome you and your banner and logo are.
3. Beer / coffee is on me next time I see you. SMB 2015 in Atlanta might be a good time to do it!
4. If we ever make t-shirts, I’ll buy yours for you. 🙂
5. You get to feel good for being awesome and helping out the project!

So, please post here, on the @MultiCellDS twitter feed, or contact me if you’re interested.  Once I get a sense of interest, I’ll set a deadline for submissions and “voting” procedures.

Thanks!!

## 2015 Speaking Schedule

Here is my current speaking schedule for 2015. Please join me if you can!

Feb. 13, 2015: Seminar at the Institute for Scientific Computing Research, Lawrence Livermore National Laboratory (LLNL)
Title: Scalable 3-D Agent-Based Simulations of Cells and Tissues in Biology and Cancer [abstract]

## Paul Macklin calls for common standards in cancer modeling

At a recent NCI-organized mini-symposium on big data in cancer, Paul Macklin called for new standards for multicellular data in simulations, experiments, and clinical science. USC featured the talk (abstract here) and the work at news.usc.edu.

Read the article: http://news.usc.edu/59091/usc-researcher-calls-for-common-standards-in-cancer-modeling/ (Feb. 21, 2014)

## 2014 Speaking Schedule

Here is my current speaking schedule for 2014. Please join me if you can!

Feb. 16, 2014: American Association for the Advancement of Science (AAAS) Annual Meeting, Chicago
Title: Integrating Next-Generation Computational Models of Cancer Progression and Outcome [abstract]
invited by the National Cancer Institute

May 9, 2014: European Society for Medical Oncology (ESMO) 2014 IMPAKT Breast Cancer Conference, Brussels, Belgium
Title: Calibrating breast cancer simulations with patient pathology: Progress and future steps [programme]
Plenary talk

May 13, 2014: Wolfson Centre for Mathematical Biology at the University of Oxford, Oxford, UK
Title: Advances in parallelized 3-D agent-based cancer modeling and digital cell lines [abstract]

June 19, 2014: Biostatistics Seminar, University of Southern California, Los Angeles
Title: Simulating 3-D systems of 500k cells with an agent-based model, and digital cell lines [link]

Aug. 18, 2014: COMBINE (Computational Modeling in Biology Network) 2014 Symposium, University of Southern California, Los Angeles
Title: Digital cell lines and MultiCellDS: Standardizing cell phenotype data for data-driven cancer simulations [Program]

## 2013 public speaking schedule

I’m in the process of rolling out some updates to my website. The first thing you’ll see is a new talk / tutorial on computational modeling of biological processes, based upon my recent talk at the USC PS-OC Short Course in October 2013. I’ll make another post here when it’s ready. It will include MATLAB source code to run through the models.

In the medium term, I hope to update my list of projects to better reflect current efforts by my lab, particularly in (1) integrative modeling of cancer metastases using high-throughput in vitro experiments and sophisticated bioengineered tissues for calibration and validation, and (2) development of standardizations for cell- and tissue-scale models and experiments.

In the longer term, I hope to switch my website layout a bit to be more like the USC PSOC website. I wrote that site about a year ago, and I like the CSS and structure a lot better. 🙂

## Interview at the 2013 NCI PS-OC Annual Meeting

I recently had the opportunity to be interviewed by Pauline Davies at the 2013 Physical Sciences-Oncology Annual Meeting. (At the meeting, I gave one of the addresses, “Exploring Possibilities for Next-Generation Computational Cancer Models that Work Together”; the agenda is available here.) The interview (largely in layman’s terms) discusses mathematical and computational modeling of cancer, the potential role for computational modeling in understanding cancer and making predictions that could help patients and their doctors make treatment choices, and the need for model and data standardizations to enable better predictions in the future. The interview draws parallels to hurricane predictions, where multiple models can read/write standardized data and be combined to improve their accuracy.

My interview can be found here, as can the entire set of selected 2013 interviews. You can find more information on my lab and work at MathCancer.org.

I really want to thank Pauline Davies, Jonathan Franca-Koh and the NCI Office of Physical Sciences in Oncology for the opportunity for this discussion!

## Paul Macklin interviewed at 2013 PSOC Annual Meeting

Paul Macklin gave a plenary talk at the 2013 NIH Physical Sciences in Oncology Annual Meeting. After the talk, he gave an interview to Pauline Davies at the NIH on the need for data standards and model compatibility in computational and mathematical modeling of cancer. Of particular interest:

Pauline Davies: How would you ever get this standardization? Who would be responsible for saying we want it all reported in this particular way?

Paul Macklin: That’s a good question. It’s a bit of the chicken and the egg problem. Who’s going to come and give you data in your standard if you don’t have a standard? How do you plan a standard without any data? And so it’s a bit interesting. I just think someone needs to step forward and show leadership and try to get a small working group together, and at the end of the day, perfect is the enemy of the good. I think you start small and give it a go, and you add more to your standard as you need it. So maybe version one is, let’s say, how quickly the cells divide, how often they do it, how quickly they die, and what their oxygen level is, and maybe their positions. And that can be version one of this standard and a few of us try it out and see what we can do. I think it really comes down to a starting group of people and a simple starting point, and you grow it as you need it.

Shortly after, the MultiCellDS project was born (using just this strategy above!), with the generous assistance of the Breast Cancer Research Foundation.

Read / Listen to the interview: http://physics.cancer.gov/report/2013report/PaulMacklin.aspx (2013)