LBM-CaImAn-Python Documentation#
For the MATLAB implementation of this pipeline, see LBM-CaImAn-MATLAB.
For current installation instructions and requirements, see the project README.
How to#
The recommended way to use this pipeline is to work through the example notebooks with the user guide close at hand. The notebooks act as a walkthrough tutorial that you can adapt to fit your dataset as you go. Fully rendered versions of the notebooks are available in the tutorial section of this documentation.
Documentation Contents#
- User Guide
- Tutorial
- API
  - calculate_centers()
  - calculate_num_patches()
  - clean_batch()
  - combine_z_planes()
  - concat_param_diffs()
  - default_params()
  - delete_batch_rows()
  - fix_scan_phase()
  - generate_patch_view()
  - get_all_batch_items()
  - get_batch_from_path()
  - get_files_ext()
  - get_metadata()
  - get_noise_fft()
  - get_single_patch_coords()
  - get_summary_batch()
  - get_summary_cnmf()
  - greedyROI()
  - load_batch()
  - params_from_metadata()
  - read_scan()
  - remove_batch_duplicates()
  - return_scan_offset()
  - save_as()
  - unvectorize()
  - vectorize()
- Glossary
Pipeline Overview#
LBM-CaImAn-Python uses mesmerize-core to interface with CaImAn algorithms for calcium imaging data processing.
There are 4 steps in this pipeline (a minimal mesmerize-core sketch follows the list):

1. Assembly
   - De-interleave planes
   - (optional) Scan-phase correction
2. Motion Correction
   - Rigid/non-rigid registration (single z-plane)
   - Evaluation metrics
   - (optional) Parameter grid-search
   - Register remaining z-planes
3. Segmentation
   - CNMF (single z-plane)
   - (optional) Deconvolution / spike extraction
   - Refine neuron selection
   - Segment / deconvolve remaining z-planes
4. Collation
   - Collate images and metadata into a single volume
   - Lateral offset correction (between z-planes, COMING SOON)
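Below is a minimal sketch of steps 2 and 3 using the mesmerize-core batch API. The paths, item names, and parameter values are illustrative placeholders, not recommendations; see the tutorial notebooks for parameters suited to your data.

```python
import mesmerize_core as mc

# Illustrative paths only
mc.set_parent_raw_data_path("/data/raw")
df = mc.create_batch("/data/batch/batch.pickle")

# Step 2: motion correction on a single z-plane
df.caiman.add_item(
    algo="mcorr",
    item_name="plane_1",
    input_movie_path="plane_1.tiff",          # relative to the raw data path
    params={"main": {"pw_rigid": True}},      # placeholder params
)
df.iloc[-1].caiman.run()

# Step 3: CNMF segmentation, chaining the registered movie as input
df.caiman.add_item(
    algo="cnmf",
    item_name="plane_1",
    input_movie_path=df.iloc[-1],             # mcorr output feeds cnmf
    params={"main": {"p": 2, "K": 100}},      # placeholder params
)
df.iloc[-1].caiman.run()
```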
HPC#
Slurm utilities are available in the utilities repository.
Installation#
git clone https://github.com/MillerBrainObservatory/utilities.git
Transferring data to the HPC#
rsync -avPh /path/to/local/data username@dtn02-hpc.rockefeller.edu:/lustre/fs4/mbo/scratch/<username>/data/
Transferring data to your local machine#
# you will need the ssh config for rbo in your .ssh/config file
rsync -avPh -e "ssh" ./path/to/data rbo:/path/to/destination
Files#
multifile_batch.sbatch
- Submit a job to the HPC
tunnel.sbatch
- Create a tunnel to the HPC
multifile_batch.sbatch#
This script requires:
- a user folder in the MBO account's scratch/
- SSH keys for local->HPC and HPC->local transfers
- (Windows only) rsync: download it directly, or use Git Bash and this utility
Log in to the HPC on the rocky-9 login node:
ssh username@login-05-hpc.rockefeller.edu
Create a backup of your .bashrc file:
cp ~/.bashrc ~/.bashrc.bak
Copy the .bashrc file from the utilities folder you cloned over the one you just backed up:
cp utilities/.bashrc ~/.bashrc
This file exports some environment variables that are necessary for the pipeline to run.
You will need your files in the mbo_data directory.
After submitting a job via multifile_batch.sbatch, you can monitor the job with squeue -u $USER and cancel it with scancel $JOB_ID.
The resulting batch files will be synced back to your local machine.
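Once the batch is synced, you can inspect it locally. A minimal sketch, assuming load_batch (listed in the API above) takes the path to the batch file; the path and the columns printed are illustrative:

```python
import lbm_caiman_python as lcp

# Hypothetical local path to the synced batch file
df = lcp.load_batch("/path/to/synced/batch/batch.pickle")

# mesmerize-core batch items are rows of a DataFrame
print(df[["algo", "item_name", "outputs"]])
```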
Transferring POSIX->Windows#
You can move your batch contents anywhere. However, if moving them to a different operating system, you may encounter the following error:
NotImplementedError: cannot instantiate 'PosixPath' on your system
You can use lbm_caiman_python.load_batch_cross_platform to load this batch item.
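A minimal sketch, assuming load_batch_cross_platform takes the batch path as its only required argument (the Windows path below is a placeholder):

```python
import lbm_caiman_python as lcp

# Batch files created on Linux store PosixPath entries; this loader
# opens them on another operating system.
df = lcp.load_batch_cross_platform(r"C:\data\batch\batch.pickle")
```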
Comparison with LBM-CaImAn-MATLAB#
Usage#
Beyond the obvious difference in programming language (MATLAB vs. Python), there are a few differences in how these pipelines were constructed.
The MATLAB implementation was essentially 4 functions spread across 4 .m files. These functions would be called from a user-made script (for example, demo_LBM_pipeline.m).
Performance#
The primary pitfall of LBM-CaImAn-MATLAB is its memory constraints. Though MATLAB is extremely efficient with threaded internal functions, the lack of third-party library support means that reading and writing well-established file formats (e.g. .tiff, .hdf5) lacks modern features like lazy loading. As a result, the memory footprint required to process an N-GB dataset is N GB of memory.
This pipeline uses the well-tested and optimized tifffile library to load data only when it is needed. That is why processing a 35 GB file consumes only ~5 GB of memory.
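To illustrate the lazy-loading pattern (the filename is a placeholder and the frame counts are arbitrary):

```python
import tifffile

# Read only the first 100 pages of a large multi-page TIFF;
# the remaining pages are never loaded into memory.
frames = tifffile.imread("session.tiff", key=range(100))

# Alternatively, memory-map the file (requires contiguously stored
# image data) and let the OS page frames in as they are sliced.
movie = tifffile.memmap("session.tiff")
chunk = movie[:100]  # only these frames are materialized
```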