1. I/O

Functions to help with loading and saving data.

mbo_utilities.imwrite(lazy_array, outpath: str | Path, ext: str = '.tiff', planes: list | tuple | None = None, num_frames: int | None = None, register_z: bool = False, roi: int | Sequence[int] | None = None, metadata: dict | None = None, overwrite: bool = False, order: list | tuple = None, target_chunk_mb: int = 100, progress_callback: Callable | None = None, debug: bool = False, shift_vectors: ndarray | None = None, output_name: str | None = None, output_suffix: str | None = None, **kwargs)[source]

Write a supported lazy imaging array to disk.

This function handles writing multi-dimensional imaging data to various formats, with support for ROI selection, z-plane registration, chunked streaming, and format conversion. Use with imread() to load and convert imaging data.

Parameters:
lazy_array : object

One of the supported lazy array readers providing .shape, .metadata, and _imwrite() methods:

  • MboRawArray : Raw ScanImage/ScanMultiROI TIFF files with phase correction

  • Suite2pArray : Memory-mapped binary (data.bin or data_raw.bin) + ops.npy

  • MBOTiffArray : Multi-file TIFF reader using Dask backend

  • TiffArray : Single or multi-TIFF reader

  • H5Array : HDF5 dataset wrapper (h5py.File[dataset])

  • ZarrArray : Collection of z-plane .zarr stores

  • NumpyArray : Single .npy memory-mapped NumPy file

  • NWBArray : NWB file with “TwoPhotonSeries” acquisition dataset

outpath : str or Path

Target directory to write output files. Will be created if it doesn’t exist. Files are named automatically based on plane/ROI (e.g., plane01_roi1.tiff).

ext : str, default=”.tiff”

Output format extension. Supported formats:

  • .tiff, .tif : Multi-page TIFF (BigTIFF for >4GB)

  • .bin : Suite2p-compatible binary format with ops.npy metadata

  • .zarr : Zarr v3 array store

  • .h5, .hdf5 : HDF5 format

planes : list | tuple | int | None, optional

Z-planes to export (1-based indexing). Options:

  • None (default) : Export all planes

  • int : Single plane, e.g. planes=7 exports only plane 7

  • list/tuple : Specific planes, e.g. planes=[1, 7, 14]

roi : int | Sequence[int] | None, optional

ROI selection for multi-ROI data. Options:

  • None (default) : Stitch/fuse all ROIs horizontally into a single FOV

  • 0 : Split all ROIs into separate files (one file per ROI per plane)

  • int > 0 : Export a specific ROI, e.g. roi=1 exports only ROI 1

  • list/tuple : Export specific ROIs, e.g. roi=[1, 3]

num_frames : int, optional

Number of frames to export. If None (default), exports all frames.

register_z : bool, default=False

Perform z-plane registration using Suite3D before writing.

shift_vectors : np.ndarray, optional

Pre-computed z-shift vectors with shape (n_planes, 2) for [dy, dx] shifts.

metadata : dict, optional

Additional metadata to merge into output file headers/attributes.

overwrite : bool, default=False

Whether to overwrite existing output files.

order : list | tuple, optional

Reorder planes before writing. Must have same length as planes.

target_chunk_mb : int, optional

Target chunk size in MB for streaming writes. Default is 100 MB.
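To illustrate how a chunk-size budget translates into frames per streamed write, here is a minimal sketch (the helper name and the int16 frame assumption are illustrative, not part of the API):

```python
def frames_per_chunk(frame_shape, dtype_bytes, target_mb=100):
    # Number of whole frames that fit in roughly target_mb of memory.
    frame_bytes = frame_shape[0] * frame_shape[1] * dtype_bytes
    return max(1, (target_mb * 1024 ** 2) // frame_bytes)

# A 512x512 int16 frame is 0.5 MB, so ~200 frames fit in a 100 MB chunk.
print(frames_per_chunk((512, 512), 2))  # 200
```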

progress_callback : Callable, optional

Callback function for progress updates: callback(progress, current_plane).
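A callback matching the documented callback(progress, current_plane) signature might look like the sketch below (the assumption that progress is a fraction in [0, 1] and the formatting are illustrative):

```python
def format_progress(progress: float, current_plane: int) -> str:
    # Assumes progress is a fraction in [0, 1]; the format is illustrative.
    return f"plane {current_plane}: {progress:.0%}"

def report(progress, current_plane):
    print(format_progress(progress, current_plane))

# Hypothetical usage: imwrite(data, "out", progress_callback=report)
```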

debug : bool, default=False

Enable verbose logging for troubleshooting.

output_name : str, optional

Filename for binary output when ext=”.bin”.

output_suffix : str, optional

Custom suffix to append to output filenames. If None (default), files are named with “_stitched” for multi-ROI data when roi is None, or “_roiN” for specific ROIs. Examples: “_stitched”, “_processed”, “_session1”. The suffix is automatically sanitized (illegal characters removed, double extensions prevented, underscore prefix added if missing).
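The sanitization described above could be sketched roughly as follows; sanitize_suffix and the exact illegal-character set are hypothetical, and the real implementation may differ:

```python
import re

def sanitize_suffix(suffix: str) -> str:
    # Hypothetical sketch: drop characters illegal in filenames,
    # strip a trailing format extension, and ensure a leading underscore.
    suffix = re.sub(r'[<>:"/\\|?*]', "", suffix)
    suffix = re.sub(r"\.(tiff?|bin|zarr|h5|hdf5)$", "", suffix,
                    flags=re.IGNORECASE)
    if not suffix.startswith("_"):
        suffix = "_" + suffix
    return suffix

print(sanitize_suffix("session1.tiff"))  # "_session1"
```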

**kwargs

Additional format-specific options passed to writer backends.

Returns:
Path

Path to the output directory containing written files.

Examples

>>> from mbo_utilities import imread, imwrite
>>> data = imread("path/to/raw/*.tiff")
>>> imwrite(data, "output/session1", roi=None)  # Stitch all ROIs
>>> # Save specific planes
>>> imwrite(data, "output/session1", planes=[1, 7, 14])
>>> # Split ROIs
>>> imwrite(data, "output/session1", roi=0)
>>> # Z-plane registration
>>> imwrite(data, "output/registered", register_z=True)
>>> # Convert to Suite2p binary
>>> imwrite(data, "output/suite2p", ext=".bin", roi=0)
>>> # Save to Zarr
>>> imwrite(data, "output/zarr_store", ext=".zarr")
mbo_utilities.imread(inputs: str | Path | ndarray | Sequence[str | Path], **kwargs)[source]

Lazy load imaging data from supported file types.

Currently supported file types:

  • .bin : Suite2p binary files (.bin + ops.npy)

  • .tif/.tiff : TIFF files (BigTIFF, OME-TIFF, and raw ScanImage TIFFs)

  • .h5 : HDF5 files

  • .zarr : Zarr v3

  • .npy : NumPy arrays

  • np.ndarray : In-memory numpy arrays (wrapped as NumpyArray)

Parameters:
inputs : str, Path, ndarray, or sequence of str/Path

Input source. Can be:

  • Path to a file or directory

  • List/tuple of file paths

  • A numpy array (wrapped as NumpyArray for full imwrite support)

  • An existing lazy array (passed through unchanged)

**kwargs

Extra keyword arguments passed to specific array readers.

Returns:
array_like

One of Suite2pArray, TiffArray, MboRawArray, MBOTiffArray, H5Array, ZarrArray, NumpyArray, or IsoviewArray.

Examples

>>> from mbo_utilities import imread, imwrite
>>> arr = imread("/data/raw")  # directory with supported files
>>> arr = imread("data.tiff")  # single file
>>> arr = imread(["file1.tiff", "file2.tiff"])  # multiple files
>>> # Wrap numpy array for imwrite compatibility
>>> data = np.random.randn(100, 512, 512)
>>> arr = imread(data)  # Returns NumpyArray
>>> imwrite(arr, "output", ext=".zarr")  # Full write support
mbo_utilities.get_files(base_dir, str_contains='', max_depth=1, sort_ascending=True, exclude_dirs=None) → list | Path[source]

Recursively search for files in a specified directory whose names contain a given substring, limiting the search to a maximum subdirectory depth. Optionally, the resulting list of file paths is sorted in ascending order using numeric parts of the filenames when available.

This function intelligently handles zarr stores: it stops recursing into leaf .zarr directories (those that don’t contain nested .zarr subdirs) to avoid traversing thousands of internal chunk directories.

Parameters:
base_dir : str or Path

The base directory where the search begins. This path is expanded (e.g., ‘~’ is resolved) and converted to an absolute path.

str_contains : str, optional

A substring that must be present in a file’s name for it to be included in the result. If empty, all files are matched.

max_depth : int, optional

The maximum number of subdirectory levels (relative to the base directory) to search. Defaults to 1. If set to 0, it is automatically reset to 1.

sort_ascending : bool, optional

If True (default), the matched file paths are sorted in ascending alphanumeric order. The sort key extracts numeric parts from filenames so that, for example, “file2” comes before “file10”.
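The numeric-aware sort key described here can be sketched as follows (natural_key is an illustrative name, not part of the package):

```python
import re

def natural_key(name: str):
    # Split the name into text and integer runs so that numeric parts
    # compare as numbers: "file2" sorts before "file10".
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]

files = ["file10.tif", "file2.tif", "file1.tif"]
print(sorted(files, key=natural_key))
# ['file1.tif', 'file2.tif', 'file10.tif']
```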

exclude_dirs : iterable of str or Path, optional

Directories to exclude from the search. By default, “.venv/”, “__pycache__/”, “.git”, and “.github” are excluded.

Returns:
list of str

A list of full file paths (as strings) for files within the base directory (and its subdirectories up to the specified depth) that contain the provided substring.

Raises:
FileNotFoundError

If the base directory does not exist.

NotADirectoryError

If the specified base_dir is not a directory.

Examples

>>> import mbo_utilities as mbo
>>> # Get all files that contain "ops.npy" in their names by searching up to 3 levels deep:
>>> ops_files = mbo.get_files("path/to/files", "ops.npy", max_depth=3)
>>> # Get only files containing "tif" in the current directory (max_depth=1):
>>> tif_files = mbo.get_files("path/to/files", "tif")
mbo_utilities.files_to_dask(files: list[str | Path], astype=None, chunk_t=250)[source]

Lazily build a Dask array or list of arrays depending on filename tags.

  • “plane”, “z”, or “chan” → stacked along Z (TZYX)

  • “roi” → list of 3D (T,Y,X) arrays, one per ROI

  • otherwise → concatenate all files in time (T)
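The tag-based dispatch above can be mirrored by a small classifier; this is an illustrative sketch of the documented rules, not the package's actual logic:

```python
from pathlib import Path

def classify(files):
    # Decide the array layout from filename tags, per the documented rules.
    names = [Path(f).stem.lower() for f in files]
    if any(tag in n for n in names for tag in ("plane", "z", "chan")):
        return "stack-z"       # stacked along Z -> TZYX
    if any("roi" in n for n in names):
        return "per-roi-list"  # one (T, Y, X) array per ROI
    return "concat-t"          # concatenate all files in time

print(classify(["plane01.tif", "plane02.tif"]))  # "stack-z"
```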

mbo_utilities.get_metadata(file, dx: float | None = None, dy: float | None = None, dz: float | None = None, verbose: bool = False, z_step: float | None = None)[source]

Extract metadata from a TIFF file or directory of TIFF files produced by ScanImage.

This function handles single files, lists of files, or directories containing TIFF files. When given a directory, it automatically finds and processes all TIFF files in natural sort order. For multiple files, it calculates frames per file accounting for z-planes.

Parameters:
file : os.PathLike, str, or list
  • Single file path: processes that file

  • Directory path: processes all TIFF files in the directory

  • List of file paths: processes all files in the list

dx : float, optional

X pixel resolution in micrometers. Overrides extracted value.

dy : float, optional

Y pixel resolution in micrometers. Overrides extracted value.

dz : float, optional

Z step size in micrometers. Overrides extracted value. Also available as z_step for backward compatibility.

verbose : bool, optional

If True, returns extended metadata including all ScanImage attributes. Default is False.

z_step : float, optional

Alias for dz (backward compatibility).

Returns:
dict

A dictionary containing extracted metadata with normalized resolution aliases:

  • dx, dy, dz : canonical resolution values in micrometers

  • pixel_resolution : (dx, dy) tuple

  • voxel_size : (dx, dy, dz) tuple

  • umPerPixX, umPerPixY, umPerPixZ : Suite2p format

  • PhysicalSizeX, PhysicalSizeY, PhysicalSizeZ : OME format

For multiple files, the dictionary also includes:

  • frames_per_file : list of frame counts per file (accounting for z-planes)

  • total_frames : total frames across all files

  • file_paths : list of processed file paths

  • tiff_pages_per_file : raw TIFF page counts per file

Raises:
ValueError

If no recognizable metadata is found or no TIFF files found in directory.

Examples

>>> # Single file with z-resolution
>>> meta = get_metadata("path/to/rawscan_00001.tif", dz=5.0)
>>> print(f"Voxel size: {meta['voxel_size']}")
>>> # Directory of files
>>> meta = get_metadata("path/to/scan_directory/")
>>> print(f"Files processed: {len(meta['file_paths'])}")
>>> print(f"Frames per file: {meta['frames_per_file']}")
>>> # List of specific files
>>> files = ["scan_00001.tif", "scan_00002.tif", "scan_00003.tif"]
>>> meta = get_metadata(files, dz=5.0)
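The per-file frame accounting can be illustrated as follows, assuming z-planes are interleaved so that each volumetric frame spans num_planes TIFF pages (the interleaving assumption and the helper name are hypothetical):

```python
def frames_per_file(tiff_pages_per_file, num_planes):
    # With interleaved z-planes, one volumetric frame = num_planes pages,
    # so frame counts are page counts divided by the number of planes.
    return [pages // num_planes for pages in tiff_pages_per_file]

# Three files of 3000, 3000, and 1500 pages at 15 planes each:
print(frames_per_file([3000, 3000, 1500], 15))  # [200, 200, 100]
```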
mbo_utilities.expand_paths(paths: str | Path | Sequence[str | Path]) → list[Path][source]

Expand a path, list of paths, or wildcard pattern into a sorted list of actual files.

This is a handy wrapper for loading images or data files when you’ve got a folder, some wildcards, or a mix of both.

Parameters:
paths : str, Path, or list of (str or Path)

Can be a single path, a wildcard pattern like “*.tif”, a folder, or a list of those.

Returns:
list of Path

Sorted list of full paths to matching files.

Examples

>>> expand_paths("data/*.tif")
[Path("data/img_000.tif"), Path("data/img_001.tif"), ...]
>>> expand_paths(Path("data"))
[Path("data/img_000.tif"), Path("data/img_001.tif"), ...]
>>> expand_paths(["data/*.tif", Path("more_data")])
[Path("data/img_000.tif"), Path("more_data/img_050.tif"), ...]
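A rough sketch of the expansion logic (illustrative only; expand is not the package function, and the real implementation may handle more cases):

```python
import glob
from pathlib import Path

def expand(paths):
    # Normalize to a list, then expand each entry: wildcard patterns are
    # globbed, directories are listed, and plain files pass through.
    if isinstance(paths, (str, Path)):
        paths = [paths]
    out = []
    for p in paths:
        p = str(p)
        if any(ch in p for ch in "*?["):
            out.extend(Path(m) for m in glob.glob(p))
        elif Path(p).is_dir():
            out.extend(q for q in Path(p).iterdir() if q.is_file())
        else:
            out.append(Path(p))
    return sorted(out)
```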
mbo_utilities.get_mbo_dirs() → dict[source]

Ensure ~/mbo and its subdirectories exist.

Returns a dict with paths to the root, settings, and cache directories.
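The behavior can be sketched as below; get_dirs and its root parameter are illustrative (the real function is fixed to ~/mbo), while the root/settings/cache keys follow the docstring:

```python
from pathlib import Path

def get_dirs(root: Path) -> dict:
    # Ensure the root and its settings/cache subdirectories exist,
    # then return their paths keyed by role.
    dirs = {
        "root": root,
        "settings": root / "settings",
        "cache": root / "cache",
    }
    for p in dirs.values():
        p.mkdir(parents=True, exist_ok=True)
    return dirs

# Hypothetical usage: dirs = get_dirs(Path.home() / "mbo")
```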