Category: data standards
Saving MultiCellDS data from BioFVM
Note: This is part of a series of “how-to” blog posts to help new users and developers of BioFVM.
Introduction
A major initiative for my lab has been MultiCellDS: a standard for multicellular data. The project aims to create model-neutral representations of simulation data (for both discrete and continuum models), which can also work for segmented experimental and clinical data. A single-time output is called a digital snapshot. An interdisciplinary, multi-institutional review panel has been hard at work to nail down the draft standard.
A BioFVM MultiCellDS digital snapshot includes program and user metadata (more information to be included in a forthcoming publication), an output of the microenvironment, and any cells that are secreting or uptaking substrates.
As of Version 1.1.0, BioFVM supports output saved to MultiCellDS XML files. Each download also includes a matlab function for importing MultiCellDS snapshots saved by BioFVM programs. This tutorial will get you going.
BioFVM (finite volume method for biological problems) is an open source code for solving 3-D diffusion of 1 or more substrates. It was recently published as open access in Bioinformatics here:
http://dx.doi.org/10.1093/bioinformatics/btv730
The project website is at http://BioFVM.MathCancer.org, and downloads are at http://BioFVM.sf.net.
Working with MultiCellDS in BioFVM programs
We include a MultiCellDS_test.cpp file in the examples directory of every BioFVM download (Version 1.1.0 or later). Create a new project directory, copy the following files to it:
- BioFVM*.cpp and BioFVM*.h (from the main BioFVM directory)
- pugixml.* (from the main BioFVM directory)
- Makefile and MultiCellDS_test.cpp (from the examples directory)
Open the MultiCellDS_test.cpp file to see the syntax as you read the rest of this post.
See earlier tutorials (below) if you have troubles with this.
Setting metadata values
There are few key bits of metadata. First, the program used for the simulation (all these fields are optional):
// the program name, version, and project website: BioFVM_metadata.program.program_name = "BioFVM MultiCellDS Test"; BioFVM_metadata.program.program_version = "1.0"; BioFVM_metadata.program.program_URL = "http://BioFVM.MathCancer.org"; // who created the program (if known) BioFVM_metadata.program.creator.surname = "Macklin"; BioFVM_metadata.program.creator.given_names = "Paul"; BioFVM_metadata.program.creator.email = "Paul.Macklin@usc.edu"; BioFVM_metadata.program.creator.URL = "http://BioFVM.MathCancer.org"; BioFVM_metadata.program.creator.organization = "University of Southern California"; BioFVM_metadata.program.creator.department = "Center for Applied Molecular Medicine"; BioFVM_metadata.program.creator.ORCID = "0000-0002-9925-0151"; // (generally peer-reviewed) citation information for the program BioFVM_metadata.program.citation.DOI = "10.1093/bioinformatics/btv730"; BioFVM_metadata.program.citation.PMID = "26656933"; BioFVM_metadata.program.citation.PMCID = "PMC1234567"; BioFVM_metadata.program.citation.text = "A. Ghaffarizadeh, S.H. Friedman, and P. Macklin, BioFVM: an efficient parallelized diffusive transport solver for 3-D biological simulations, Bioinformatics, 2015. DOI: 10.1093/bioinformatics/btv730."; BioFVM_metadata.program.citation.notes = "notes here"; BioFVM_metadata.program.citation.URL = "http://dx.doi.org/10.1093/bioinformatics/btv730"; // user information: who ran the program BioFVM_metadata.program.user.surname = "Kirk"; BioFVM_metadata.program.user.given_names = "James T."; BioFVM_metadata.program.user.email = "Jimmy.Kirk@starfleet.mil"; BioFVM_metadata.program.user.organization = "Starfleet"; BioFVM_metadata.program.user.department = "U.S.S. Enterprise (NCC 1701)"; BioFVM_metadata.program.user.ORCID = "0000-0000-0000-0000"; // And finally, data citation information (the publication where this simulation snapshot appeared) BioFVM_metadata.data_citation.DOI = "10.1093/bioinformatics/btv730"; BioFVM_metadata.data_citation.PMID = "12345678"; BioFVM_metadata.data_citation.PMCID = "PMC1234567"; BioFVM_metadata.data_citation.text = "A. Ghaffarizadeh, S.H. Friedman, and P. Macklin, BioFVM: an efficient parallelized diffusive transport solver for 3-D biological simulations, Bioinformatics, 2015. DOI: 10.1093/bioinformatics/btv730."; BioFVM_metadata.data_citation.notes = "notes here"; BioFVM_metadata.data_citation.URL = "http://dx.doi.org/10.1093/bioinformatics/btv730";
You can sync the metadata current time, program runtime (wall time), and dimensional units using the following command. (This command is automatically run whenever you use the save command below.)
BioFVM_metadata.sync_to_microenvironment( M );
You can display a basic summary of the metadata via:
BioFVM_metadata.display_information( std::cout );
Setting options
By default (to save time and disk space), BioFVM saves the mesh as a Level 3 matlab file, whose location is embedded into the MultiCellDS XML file. You can disable this feature and revert to full XML (e.g., for human-readable cross-model reporting) via:
set_save_biofvm_mesh_as_matlab( false );
Similarly, BioFVM defaults to saving the values of the substrates in a compact Level 3 matlab file. You can override this with:
set_save_biofvm_data_as_matlab( false );
BioFVM by default saves the cell-centered sources and sinks. These take a lot of time to parse because they require very hierarchical data structures. You can disable saving the cells (basic_agents) via:
set_save_biofvm_cell_data( false );
Lastly, when you do save the cells, we default to a customized, minimal matlab format. You can revert to a more standard (but much larger) XML format with:
set_save_biofvm_cell_data_as_custom_matlab( false )
Saving a file
Saving the data is very straightforward:
save_BioFVM_to_MultiCellDS_xml_pugi( "sample" , M , current_simulation_time );
Your data will be saved in sample.xml. (Depending upon your options, it may generate several .mat files beginning with “sample”.)
If you’d like the filename to depend upon the simulation time, use something more like this:
double current_simulation_time = 10.347; char filename_base [1024]; sprintf( &filename_base , "sample_%f", current_simulation_time ); save_BioFVM_to_MultiCellDS_xml_pugi( filename_base , M, current_simulation_time );
Your data will be saved in sample_10.347000.xml. (Depending upon your options, it may generate several .mat files beginning with “sample_10.347000”.)
Compiling and running the program:
Edit the Makefile as below:
PROGRAM_NAME := MCDS_test all: $(BioFVM_OBJECTS) $(pugixml_OBJECTS) MultiCellDS_test.cpp $(COMPILE_COMMAND) -o $(PROGRAM_NAME) $(BioFVM_OBJECTS) $(pugixml_OBJECTS) MultiCellDS_test.cpp
If you’re running OSX, you’ll probably need to update the compiler from “g++”. See these tutorials.
Then, at the command prompt:
make ./MCDS_test
On Windows, you’ll need to run without the ./:
make MCDS_test
Working with MultiCellDS data in Matlab
Reading data in Matlab
Copy the read_MultiCellDS_xml.m file from the matlab directory (included in every MultiCellDS download). To read the data, just do this:
MCDS = read_MultiCellDS_xml( 'sample.xml' );
This should take around 30 seconds for larger data files (500,000 to 1,000,000 voxels with a few substrates, and around 250,000 cells). The long execution time is primarily because Matlab is ghastly inefficient at loops over hierarchical data structures. Increasing to 1,000,000 cells requires around 80-90 seconds to parse in matlab.
Plotting data in Matlab
Plotting the 3-D substrate data
First, let’s do some basic contour and surface plotting:
mid_index = round( length(MCDS.mesh.Z_coordinates)/2 ); contourf( MCDS.mesh.X(:,:,mid_index), ... MCDS.mesh.Y(:,:,mid_index), ... MCDS.continuum_variables(2).data(:,:,mid_index) , 20 ) ; axis image colorbar xlabel( sprintf( 'x (%s)' , MCDS.metadata.spatial_units) ); ylabel( sprintf( 'y (%s)' , MCDS.metadata.spatial_units) ); title( sprintf('%s (%s) at t = %f %s, z = %f %s', MCDS.continuum_variables(2).name , ... MCDS.continuum_variables(2).units , ... MCDS.metadata.current_time , ... MCDS.metadata.time_units, ... MCDS.mesh.Z_coordinates(mid_index), ... MCDS.metadata.spatial_units ) );
OR
contourf( MCDS.mesh.X_coordinates , MCDS.mesh.Y_coordinates, ... MCDS.continuum_variables(2).data(:,:,mid_index) , 20 ) ; axis image colorbar xlabel( sprintf( 'x (%s)' , MCDS.metadata.spatial_units) ); ylabel( sprintf( 'y (%s)' , MCDS.metadata.spatial_units) ); title( sprintf('%s (%s) at t = %f %s, z = %f %s', ... MCDS.continuum_variables(2).name , ... MCDS.continuum_variables(2).units , ... MCDS.metadata.current_time , ... MCDS.metadata.time_units, ... MCDS.mesh.Z_coordinates(mid_index), ... MCDS.metadata.spatial_units ) );
Here’s a surface plot:
surf( MCDS.mesh.X_coordinates , MCDS.mesh.Y_coordinates, ... MCDS.continuum_variables(1).data(:,:,mid_index) ) ; colorbar axis tight xlabel( sprintf( 'x (%s)' , MCDS.metadata.spatial_units) ); ylabel( sprintf( 'y (%s)' , MCDS.metadata.spatial_units) ); zlabel( sprintf( '%s (%s)', MCDS.continuum_variables(1).name, ... MCDS.continuum_variables(1).units ) ); title( sprintf('%s (%s) at t = %f %s, z = %f %s', MCDS.continuum_variables(1).name , ... MCDS.continuum_variables(1).units , ... MCDS.metadata.current_time , ... MCDS.metadata.time_units, ... MCDS.mesh.Z_coordinates(mid_index), ... MCDS.metadata.spatial_units ) );
Finally, here are some more advanced plots. The first is an “exploded” stack of contour plots:
clf contourslice( MCDS.mesh.X , MCDS.mesh.Y, MCDS.mesh.Z , ... MCDS.continuum_variables(2).data , [],[], ... MCDS.mesh.Z_coordinates(1:15:length(MCDS.mesh.Z_coordinates)),20); view([-45 10]); axis tight; xlabel( sprintf( 'x (%s)' , MCDS.metadata.spatial_units) ); ylabel( sprintf( 'y (%s)' , MCDS.metadata.spatial_units) ); zlabel( sprintf( 'z (%s)' , MCDS.metadata.spatial_units) ); title( sprintf('%s (%s) at t = %f %s', ... MCDS.continuum_variables(2).name , ... MCDS.continuum_variables(2).units , ... MCDS.metadata.current_time, ... MCDS.metadata.time_units ) );
Next, we show how to use isosurfaces with transparency
clf patch( isosurface( MCDS.mesh.X , MCDS.mesh.Y, MCDS.mesh.Z, ... MCDS.continuum_variables(1).data, 1000 ), 'edgecolor', ... 'none', 'facecolor', 'r' , 'facealpha' , 1 ); hold on patch( isosurface( MCDS.mesh.X , MCDS.mesh.Y, MCDS.mesh.Z, ... MCDS.continuum_variables(1).data, 5000 ), 'edgecolor', ... 'none', 'facecolor', 'b' , 'facealpha' , 0.7 ); patch( isosurface( MCDS.mesh.X , MCDS.mesh.Y, MCDS.mesh.Z, ... MCDS.continuum_variables(1).data, 10000 ), 'edgecolor', ... 'none', 'facecolor', 'g' , 'facealpha' , 0.5 ); hold off % shading interp camlight view(3) axis image axis tightcamlight lighting gouraud xlabel( sprintf( 'x (%s)' , MCDS.metadata.spatial_units) ); ylabel( sprintf( 'y (%s)' , MCDS.metadata.spatial_units) ); zlabel( sprintf( 'z (%s)' , MCDS.metadata.spatial_units) ); title( sprintf('%s (%s) at t = %f %s', ... MCDS.continuum_variables(1).name , ... MCDS.continuum_variables(1).units , ... MCDS.metadata.current_time, ... MCDS.metadata.time_units ) );
You can get more 3-D volumetric visualization ideas at Matlab’s website. This visualization post at MIT also has some great tips.
Plotting the cells
Here is a basic 3-D plot for the cells:
plot3( MCDS.discrete_cells.state.position(:,1) , ... MCDS.discrete_cells.state.position(:,2) , ... MCDS.discrete_cells.state.position(:,3) , 'bo' ); view(3) axis tight xlabel( sprintf( 'x (%s)' , MCDS.metadata.spatial_units) ); ylabel( sprintf( 'y (%s)' , MCDS.metadata.spatial_units) ); zlabel( sprintf( 'z (%s)' , MCDS.metadata.spatial_units) ); title( sprintf('Cells at t = %f %s', MCDS.metadata.current_time, ... MCDS.metadata.time_units ) );
plot3 is more efficient than scatter3, but scatter3 will give more coloring options. Here is the syntax:
scatter3( MCDS.discrete_cells.state.position(:,1), ... MCDS.discrete_cells.state.position(:,2), ... MCDS.discrete_cells.state.position(:,3) , 'bo' ); view(3) axis tight xlabel( sprintf( 'x (%s)' , MCDS.metadata.spatial_units) ); ylabel( sprintf( 'y (%s)' , MCDS.metadata.spatial_units) ); zlabel( sprintf( 'z (%s)' , MCDS.metadata.spatial_units) ); title( sprintf('Cells at t = %f %s', MCDS.metadata.current_time, ... MCDS.metadata.time_units ) );
Jan Poleszczuk gives some great insights on plotting many cells in 3D at his blog. I’d recommend checking out his post on visualizing a cellular automaton model. At some point, I’ll update this post with prettier plotting based on his methods.
What’s next
Future releases of BioFVM will support reading MultiCellDS snapshots (for model initialization).
Matlab is pretty slow at parsing and visualizing large amounts of data. We also plan to include resources for accessing MultiCellDS data in VTK / Paraview and Python.
Return to News • Return to MathCancer • Follow @MathCancer
Paul Macklin calls for common standards in cancer modeling
At a recent NCI-organized mini-symposium on big data in cancer, Paul Macklin called for new data standards in Multicellular data in simulations, experiments, and clinical science. USC featured the talk (abstract here) and the work at news.usc.edu.
Read the article: http://news.usc.edu/59091/usc-researcher-calls-for-common-standards-in-cancer-modeling/ (Feb. 21, 2014)
Paul Macklin interviewed at 2013 PSOC Annual Meeting
Paul Macklin gave a plenary talk at the 2013 NIH Physical Sciences in Oncology Annual Meeting. After the talk, he gave an interview to the Pauline Davies at the NIH on the need for data standards and model compatibility in computational and mathematical modeling of cancer. Of particular interest:
Pauline Davies: How would you ever get this standardization? Who would be responsible for saying we want it all reported in this particular way?
Paul Macklin: That’s a good question. It’s a bit of the chicken and the egg problem. Who’s going to come and give you data in your standard if you don’t have a standard? How do you plan a standard without any data? And so it’s a bit interesting. I just think someone needs to step forward and show leadership and try to get a small working group together, and at the end of the day, perfect is the enemy of the good. I think you start small and give it a go, and you add more to your standard as you need it. So maybe version one is, let’s say, how quickly the cells divide, how often they do it, how quickly they die, and what their oxygen level is, and maybe their positions. And that can be version one of this standard and a few of us try it out and see what we can do. I think it really comes down to a starting group of people and a simple starting point, and you grow it as you need it.
Shortly after, the MultiCellDS project was born (using just this strategy above!), with the generous assistance of the Breast Cancer Research Foundation.
Read / Listen to the interview: http://physics.cancer.gov/report/2013report/PaulMacklin.aspx (2013)