1. I/O#
Functions to help with loading and saving data.
- mbo_utilities.imwrite(lazy_array, outpath: str | Path, ext: str = '.tiff', planes: list | tuple | None = None, num_frames: int | None = None, register_z: bool = False, roi: int | Sequence[int] | None = None, metadata: dict | None = None, overwrite: bool = False, order: list | tuple = None, target_chunk_mb: int = 100, progress_callback: Callable | None = None, debug: bool = False, shift_vectors: ndarray | None = None, output_name: str | None = None, output_suffix: str | None = None, **kwargs)[source]#
Write a supported lazy imaging array to disk.
This function handles writing multi-dimensional imaging data to various formats, with support for ROI selection, z-plane registration, chunked streaming, and format conversion. Use with imread() to load and convert imaging data.
- Parameters:
- lazy_array : object
One of the supported lazy array readers providing .shape, .metadata, and _imwrite() methods:
MboRawArray : Raw ScanImage/ScanMultiROI TIFF files with phase correction
Suite2pArray : Memory-mapped binary (data.bin or data_raw.bin) + ops.npy
MBOTiffArray : Multi-file TIFF reader using Dask backend
TiffArray : Single or multi-TIFF reader
H5Array : HDF5 dataset wrapper (h5py.File[dataset])
ZarrArray : Collection of z-plane .zarr stores
NumpyArray : Single .npy memory-mapped NumPy file
NWBArray : NWB file with "TwoPhotonSeries" acquisition dataset
- outpath : str or Path
Target directory to write output files. Will be created if it doesn’t exist. Files are named automatically based on plane/ROI (e.g., plane01_roi1.tiff).
- ext : str, default=".tiff"
Output format extension. Supported formats:
- .tiff, .tif : Multi-page TIFF (BigTIFF for >4 GB)
- .bin : Suite2p-compatible binary format with ops.npy metadata
- .zarr : Zarr v3 array store
- .h5, .hdf5 : HDF5 format
- planes : list | tuple | int | None, optional
Z-planes to export (1-based indexing). Options:
- None (default) : Export all planes
- int : Single plane, e.g. planes=7 exports only plane 7
- list/tuple : Specific planes, e.g. planes=[1, 7, 14]
- roi : int | Sequence[int] | None, optional
ROI selection for multi-ROI data. Options:
- None (default) : Stitch/fuse all ROIs horizontally into a single FOV
- 0 : Split all ROIs into separate files (one file per ROI per plane)
- int > 0 : Export a specific ROI, e.g. roi=1 exports only ROI 1
- list/tuple : Export specific ROIs, e.g. roi=[1, 3]
- num_frames : int, optional
Number of frames to export. If None (default), exports all frames.
- register_z : bool, default=False
Perform z-plane registration using Suite3D before writing.
- shift_vectors : np.ndarray, optional
Pre-computed z-shift vectors with shape (n_planes, 2) for [dy, dx] shifts.
- metadata : dict, optional
Additional metadata to merge into output file headers/attributes.
- overwrite : bool, default=False
Whether to overwrite existing output files.
- order : list | tuple, optional
Reorder planes before writing. Must have the same length as planes.
- target_chunk_mb : int, optional
Target chunk size in MB for streaming writes. Default is 100 MB.
- progress_callback : Callable, optional
Callback function for progress updates: callback(progress, current_plane).
- debug : bool, default=False
Enable verbose logging for troubleshooting.
- output_name : str, optional
Filename for binary output when ext=".bin".
- output_suffix : str, optional
Custom suffix to append to output filenames. If None (default), files are named with "_stitched" for multi-ROI data when roi is None, or "_roiN" for specific ROIs. Examples: "_stitched", "_processed", "_session1". The suffix is automatically sanitized (illegal characters removed, double extensions prevented, underscore prefix added if missing).
- **kwargs
Additional format-specific options passed to writer backends.
- Returns:
- Path
Path to the output directory containing written files.
Examples
>>> from mbo_utilities import imread, imwrite
>>> data = imread("path/to/raw/*.tiff")
>>> imwrite(data, "output/session1", roi=None)  # Stitch all ROIs
>>> # Save specific planes
>>> imwrite(data, "output/session1", planes=[1, 7, 14])
>>> # Split ROIs
>>> imwrite(data, "output/session1", roi=0)
>>> # Z-plane registration
>>> imwrite(data, "output/registered", register_z=True)
>>> # Convert to Suite2p binary
>>> imwrite(data, "output/suite2p", ext=".bin", roi=0)
>>> # Save to Zarr
>>> imwrite(data, "output/zarr_store", ext=".zarr")
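To make the role of target_chunk_mb concrete, the sketch below estimates how many frames fit in one streamed chunk for a given target size. This is an illustrative calculation only, not mbo_utilities' actual chunking code; the function name and exact rounding are assumptions.

```python
import numpy as np

def frames_per_chunk(shape, dtype, target_chunk_mb=100):
    """Estimate frames per streamed write chunk from a target chunk size.

    Illustrative sketch of the idea behind ``target_chunk_mb``; the
    library's real chunking logic may differ.
    """
    n_frames, height, width = shape[0], shape[-2], shape[-1]
    bytes_per_frame = height * width * np.dtype(dtype).itemsize
    frames = max(1, (target_chunk_mb * 1024 * 1024) // bytes_per_frame)
    return min(int(frames), n_frames)

# 1000 frames of 512x512 int16: 0.5 MiB per frame, so 100 MiB holds 200 frames
print(frames_per_chunk((1000, 512, 512), np.int16, target_chunk_mb=100))  # 200
```

Larger chunks mean fewer, bigger writes; the per-frame byte count is what ties the MB budget back to a frame count.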
- mbo_utilities.imread(inputs: str | Path | ndarray | Sequence[str | Path], **kwargs)[source]#
Lazy load imaging data from supported file types.
Currently supported file types:
- .bin : Suite2p binary files (.bin + ops.npy)
- .tif/.tiff : TIFF files (BigTIFF, OME-TIFF, and raw ScanImage TIFFs)
- .h5 : HDF5 files
- .zarr : Zarr v3
- .npy : NumPy arrays
- np.ndarray : In-memory NumPy arrays (wrapped as NumpyArray)
- Parameters:
- inputs : str, Path, ndarray, or sequence of str/Path
Input source. Can be:
- Path to a file or directory
- List/tuple of file paths
- A numpy array (wrapped as NumpyArray for full imwrite support)
- An existing lazy array (passed through unchanged)
- **kwargs
Extra keyword arguments passed to specific array readers.
- Returns:
- array_like
One of Suite2pArray, TiffArray, MboRawArray, MBOTiffArray, H5Array, ZarrArray, NumpyArray, or IsoviewArray.
Examples
>>> import numpy as np
>>> from mbo_utilities import imread, imwrite
>>> arr = imread("/data/raw")  # directory with supported files
>>> arr = imread("data.tiff")  # single file
>>> arr = imread(["file1.tiff", "file2.tiff"])  # multiple files
>>> # Wrap a numpy array for imwrite compatibility
>>> data = np.random.randn(100, 512, 512)
>>> arr = imread(data)  # Returns NumpyArray
>>> imwrite(arr, "output", ext=".zarr")  # Full write support
- mbo_utilities.get_files(base_dir, str_contains='', max_depth=1, sort_ascending=True, exclude_dirs=None) list | Path[source]#
Recursively search for files in a specified directory whose names contain a given substring, limiting the search to a maximum subdirectory depth. Optionally, the resulting list of file paths is sorted in ascending order using numeric parts of the filenames when available.
This function intelligently handles zarr stores: it stops recursing into leaf .zarr directories (those that don’t contain nested .zarr subdirs) to avoid traversing thousands of internal chunk directories.
- Parameters:
- base_dir : str or Path
The base directory where the search begins. This path is expanded (e.g., '~' is resolved) and converted to an absolute path.
- str_contains : str, optional
A substring that must be present in a file's name for it to be included in the result. If empty, all files are matched.
- max_depth : int, optional
The maximum number of subdirectory levels (relative to the base directory) to search. Defaults to 1. If set to 0, it is automatically reset to 1.
- sort_ascending : bool, optional
If True (default), the matched file paths are sorted in ascending alphanumeric order. The sort key extracts numeric parts from filenames so that, for example, "file2" comes before "file10".
- exclude_dirs : iterable of str or Path, optional
Directories to exclude from the resulting list of file paths. By default, excludes ".venv/", "__pycache__/", ".git", and ".github".
- Returns:
- list of str
A list of full file paths (as strings) for files within the base directory (and its subdirectories up to the specified depth) that contain the provided substring.
- Raises:
- FileNotFoundError
If the base directory does not exist.
- NotADirectoryError
If the specified base_dir is not a directory.
Examples
>>> import mbo_utilities as mbo
>>> # Get all files containing "ops.npy" in their names, searching up to 3 levels deep:
>>> ops_files = mbo.get_files("path/to/files", "ops.npy", max_depth=3)
>>> # Get only files containing "tif" in the current directory (max_depth=1):
>>> tif_files = mbo.get_files("path/to/files", "tif")
- mbo_utilities.files_to_dask(files: list[str | Path], astype=None, chunk_t=250)[source]#
Lazily build a Dask array or list of arrays depending on filename tags.
- "plane", "z", or "chan" in the filename → stacked along Z (TZYX)
- "roi" → list of 3D (T, Y, X) arrays, one per ROI
- otherwise → concatenate all files in time (T)
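The tag-based dispatch above can be sketched as a simple filename check. This is a simplified illustration of the decision, not the library's actual matching logic (which may use stricter patterns than bare substrings), and the function name is hypothetical.

```python
from pathlib import Path

def group_mode(files):
    """Decide how a set of files would be combined, based on filename tags.

    Simplified sketch of the files_to_dask dispatch described above;
    real tag matching in mbo_utilities may be stricter than substring
    checks (e.g. "z" here would also match "zarr").
    """
    names = [Path(f).stem.lower() for f in files]
    if any(("plane" in n) or ("z" in n) or ("chan" in n) for n in names):
        return "stack_z"   # planes stacked along Z -> one TZYX array
    if any("roi" in n for n in names):
        return "per_roi"   # list of (T, Y, X) arrays, one per ROI
    return "concat_t"      # concatenate all files along time

print(group_mode(["plane01.tif", "plane02.tif"]))      # stack_z
print(group_mode(["scan_roi1.tif", "scan_roi2.tif"]))  # per_roi
print(group_mode(["scan_00001.tif", "scan_00002.tif"]))  # concat_t
```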
- mbo_utilities.get_metadata(file, dx: float | None = None, dy: float | None = None, dz: float | None = None, verbose: bool = False, z_step: float | None = None)[source]#
Extract metadata from a TIFF file or directory of TIFF files produced by ScanImage.
This function handles single files, lists of files, or directories containing TIFF files. When given a directory, it automatically finds and processes all TIFF files in natural sort order. For multiple files, it calculates frames per file accounting for z-planes.
- Parameters:
- file : os.PathLike, str, or list
Single file path: processes that file.
Directory path: processes all TIFF files in the directory.
List of file paths: processes all files in the list.
- dx : float, optional
X pixel resolution in micrometers. Overrides extracted value.
- dy : float, optional
Y pixel resolution in micrometers. Overrides extracted value.
- dz : float, optional
Z step size in micrometers. Overrides extracted value. Also available as z_step for backward compatibility.
- verbose : bool, optional
If True, returns extended metadata including all ScanImage attributes. Default is False.
- z_step : float, optional
Alias for dz (backward compatibility).
- Returns:
- dict
A dictionary containing extracted metadata with normalized resolution aliases:
- dx, dy, dz : canonical resolution values in micrometers
- pixel_resolution : (dx, dy) tuple
- voxel_size : (dx, dy, dz) tuple
- umPerPixX, umPerPixY, umPerPixZ : Suite2p format
- PhysicalSizeX, PhysicalSizeY, PhysicalSizeZ : OME format
For multiple files, also includes:
- frames_per_file : list of frame counts per file (accounting for z-planes)
- total_frames : total frames across all files
- file_paths : list of processed file paths
- tiff_pages_per_file : raw TIFF page counts per file
- Raises:
- ValueError
If no recognizable metadata is found or no TIFF files found in directory.
Examples
>>> # Single file with z-resolution
>>> meta = get_metadata("path/to/rawscan_00001.tif", dz=5.0)
>>> print(f"Voxel size: {meta['voxel_size']}")
>>> # Directory of files
>>> meta = get_metadata("path/to/scan_directory/")
>>> print(f"Files processed: {len(meta['file_paths'])}")
>>> print(f"Frames per file: {meta['frames_per_file']}")
>>> # List of specific files
>>> files = ["scan_00001.tif", "scan_00002.tif", "scan_00003.tif"]
>>> meta = get_metadata(files, dz=5.0)
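The "frames per file accounting for z-planes" bookkeeping reduces to simple arithmetic: in a volumetric ScanImage acquisition each time frame occupies one TIFF page per z-plane, so time frames per file is pages divided by plane count. The sketch below illustrates that relationship; the function name is hypothetical and this is not get_metadata's internal code.

```python
def frames_per_file(tiff_pages_per_file, num_planes):
    """Convert raw TIFF page counts into time-frame counts.

    Assumes one page per z-plane per time frame, as described for
    get_metadata's 'frames_per_file' output; illustrative only.
    """
    return [pages // num_planes for pages in tiff_pages_per_file]

pages = [1400, 1400, 700]  # raw TIFF page counts per file
print(frames_per_file(pages, num_planes=14))   # [100, 100, 50]
print(sum(frames_per_file(pages, 14)))         # 250 -> 'total_frames'
```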
- mbo_utilities.expand_paths(paths: str | Path | Sequence[str | Path]) list[Path][source]#
Expand a path, list of paths, or wildcard pattern into a sorted list of actual files.
This is a handy wrapper for loading images or data files when you’ve got a folder, some wildcards, or a mix of both.
- Parameters:
- paths : str, Path, or list of (str or Path)
Can be a single path, a wildcard pattern like "*.tif", a folder, or a list of those.
- Returns:
- list of Path
Sorted list of full paths to matching files.
Examples
>>> expand_paths("data/*.tif")
[Path("data/img_000.tif"), Path("data/img_001.tif"), ...]
>>> expand_paths(Path("data"))
[Path("data/img_000.tif"), Path("data/img_001.tif"), ...]
>>> expand_paths(["data/*.tif", Path("more_data")])
[Path("data/img_000.tif"), Path("more_data/img_050.tif"), ...]
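The behavior described above (accepting a single path, a wildcard pattern, a folder, or a mixed list, and returning a sorted file list) can be approximated in a few lines of standard-library code. This is a minimal re-implementation sketch, not the library's actual function, which may filter extensions or order results differently.

```python
from pathlib import Path

def expand_paths_sketch(paths):
    """Approximate the expand_paths behavior described above: expand
    wildcards, list directories, pass files through, return sorted Paths.

    Illustrative only; mbo_utilities.expand_paths may differ in details.
    """
    if isinstance(paths, (str, Path)):
        paths = [paths]
    out = []
    for p in paths:
        p = Path(p).expanduser()
        if p.is_dir():
            # directory: take every regular file inside it
            out.extend(q for q in p.iterdir() if q.is_file())
        elif "*" in p.name:
            # wildcard: glob within the parent directory
            out.extend(p.parent.glob(p.name))
        else:
            out.append(p)
    return sorted(set(out))
```

Deduplicating with set() before sorting keeps the result stable when a wildcard and a directory in the same call match the same file.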