1. I/O

Functions to help with loading and saving data.

mbo_utilities.imwrite(lazy_array, outpath: str | Path, ext: str = '.tiff', planes: list | tuple | None = None, num_frames: int | None = None, register_z: bool = False, roi_mode: RoiMode | str = RoiMode.concat_y, roi: int | Sequence[int] | None = None, metadata: dict | None = None, overwrite: bool = False, order: list | tuple | None = None, target_chunk_mb: int = 100, progress_callback: Callable | None = None, debug: bool = False, show_progress: bool = True, shift_vectors: np.ndarray | None = None, output_name: str | None = None, output_suffix: str | None = None, **kwargs)

Write a supported lazy imaging array to disk.

This function handles writing multi-dimensional imaging data to various formats, with support for ROI selection, z-plane registration, chunked streaming, and format conversion. Use with imread() to load and convert imaging data.

Parameters:
lazy_array : object

A lazy array from imread() or a numpy array. Any object with .shape, .dtype, and an _imwrite() method is supported. Use the mbo formats CLI command to list all supported input formats.

outpath : str or Path

Target directory to write output files. Will be created if it doesn’t exist. Files are named automatically based on plane/ROI (e.g., plane01_roi1.tiff).

ext : str, default=".tiff"

Output format extension. Supported formats:

- .tiff, .tif : multi-page TIFF (BigTIFF for >4 GB)
- .bin : Suite2p-compatible binary format with ops.npy metadata
- .zarr : Zarr v3 array store
- .h5, .hdf5 : HDF5 format

planes : list | tuple | int | None, optional

Z-planes to export (1-based indexing). Options:

- None (default) : export all planes
- int : a single plane, e.g. planes=7 exports only plane 7
- list/tuple : specific planes, e.g. planes=[1, 7, 14]

roi_mode : RoiMode | str, default=RoiMode.concat_y

Mode for handling multi-ROI data. Options:

- RoiMode.concat_y : horizontally concatenate ROIs into a single FOV (default)
- RoiMode.separate : write each ROI to separate files

String values are accepted (case-insensitive): "concat_y", "separate".

roi : int | Sequence[int] | None, optional

Specific ROI(s) to export when roi_mode=RoiMode.separate. Options:

- None (default) : export all ROIs
- int > 0 : export a specific ROI, e.g. roi=1 exports only ROI 1
- list/tuple : export specific ROIs, e.g. roi=[1, 3]

Note: when roi_mode=RoiMode.concat_y, this parameter is ignored.

num_frames : int, optional

Number of frames to export. If None (default), exports all frames.

register_z : bool, default=False

Perform z-plane registration using Suite3D before writing.

shift_vectors : np.ndarray, optional

Pre-computed z-shift vectors with shape (n_planes, 2) for [dy, dx] shifts.

metadata : dict, optional

Additional metadata to merge into output file headers/attributes.

overwrite : bool, default=False

Whether to overwrite existing output files.

order : list | tuple, optional

Reorder planes before writing. Must have same length as planes.

target_chunk_mb : int, optional

Target chunk size in MB for streaming writes. Default is 100 MB.

progress_callback : Callable, optional

Callback function for progress updates: callback(progress, current_plane).

debug : bool, default=False

Enable verbose logging for troubleshooting.

show_progress : bool, default=True

Show tqdm progress bar during writing. Set to False in notebooks when you don’t want progress output cluttering the display.

output_name : str, optional

Filename for binary output when ext=”.bin”.

output_suffix : str, optional

Custom suffix to append to output filenames. If None (default), files are named with “_stitched” for multi-ROI data when roi is None, or “_roiN” for specific ROIs. Examples: “_stitched”, “_processed”, “_session1”. The suffix is automatically sanitized (illegal characters removed, double extensions prevented, underscore prefix added if missing).

**kwargs

Additional format-specific options passed to writer backends.

Returns:
Path

Path to the output directory containing written files.

Examples

>>> from mbo_utilities import imread, imwrite
>>> data = imread("path/to/raw/*.tiff")
>>> imwrite(data, "output/session1", roi=None)  # Stitch all ROIs
>>> # Save specific planes
>>> imwrite(data, "output/session1", planes=[1, 7, 14])
>>> # Split ROIs
>>> imwrite(data, "output/session1", roi=0)
>>> # Z-plane registration
>>> imwrite(data, "output/registered", register_z=True)
>>> # Convert to Suite2p binary
>>> imwrite(data, "output/suite2p", ext=".bin", roi=0)
>>> # Save to Zarr
>>> imwrite(data, "output/zarr_store", ext=".zarr")
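The progress_callback parameter above is documented to be invoked as callback(progress, current_plane). A minimal sketch of a callback that collects updates (the collector class and the simulated update loop are illustrative, not part of the library):

```python
# Sketch of a progress callback matching the documented
# callback(progress, current_plane) signature.
class ProgressCollector:
    def __init__(self):
        self.updates = []

    def __call__(self, progress, current_plane):
        # Record each update; progress is assumed to be a fraction in [0, 1]
        self.updates.append((progress, current_plane))

collector = ProgressCollector()

# Simulate the updates a writer might emit while streaming two planes
for plane in (1, 2):
    for frac in (0.5, 1.0):
        collector(frac, plane)
```

An instance like this could be passed as progress_callback=collector to drive a custom progress display (e.g. a GUI widget) instead of the built-in tqdm bar.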
mbo_utilities.imread(inputs: str | Path | np.ndarray | Sequence[str | Path], **kwargs)

Lazy load imaging data from supported file types.

Currently supported file types:

- .bin : Suite2p binary files (.bin + ops.npy)
- .tif/.tiff : TIFF files (BigTIFF, OME-TIFF, and raw ScanImage TIFFs)
- .h5 : HDF5 files
- .zarr : Zarr v3
- .npy : NumPy arrays
- np.ndarray : in-memory numpy arrays (wrapped as NumpyArray)

Parameters:
inputs : str, Path, ndarray, or sequence of str/Path

Input source. Can be:

- a path to a file or directory
- a list/tuple of file paths
- a numpy array (will be wrapped as NumpyArray for full imwrite support)
- an existing lazy array (passed through unchanged)

**kwargs

Extra keyword arguments passed to specific array readers.

Returns:
array_like

A lazy array appropriate for the input format. Use the mbo formats CLI command to list all supported formats and their array types.

Examples

>>> from mbo_utilities import imread, imwrite
>>> arr = imread("/data/raw")  # directory with supported files
>>> arr = imread("data.tiff")  # single file
>>> arr = imread(["file1.tiff", "file2.tiff"])  # multiple files
>>> # Wrap numpy array for imwrite compatibility
>>> data = np.random.randn(100, 512, 512)
>>> arr = imread(data)  # Returns NumpyArray
>>> imwrite(arr, "output", ext=".zarr")  # Full write support
mbo_utilities.get_files(base_dir, str_contains='', max_depth=1, sort_ascending=True, exclude_dirs=None) -> list | Path

Recursively search for files in a specified directory whose names contain a given substring, limiting the search to a maximum subdirectory depth. Optionally, the resulting list of file paths is sorted in ascending order using numeric parts of the filenames when available.

This function intelligently handles zarr stores: it stops recursing into leaf .zarr directories (those that don’t contain nested .zarr subdirs) to avoid traversing thousands of internal chunk directories.

Parameters:
base_dir : str or Path

The base directory where the search begins. This path is expanded (e.g., ‘~’ is resolved) and converted to an absolute path.

str_contains : str, optional

A substring that must be present in a file’s name for it to be included in the result. If empty, all files are matched.

max_depth : int, optional

The maximum number of subdirectory levels (relative to the base directory) to search. Defaults to 1. If set to 0, it is automatically reset to 1.

sort_ascending : bool, optional

If True (default), the matched file paths are sorted in ascending alphanumeric order. The sort key extracts numeric parts from filenames so that, for example, “file2” comes before “file10”.

exclude_dirs : iterable of str or Path, optional

An iterable of directories to exclude from the resulting list of file paths. By default, ".venv", "__pycache__", ".git", and ".github" are excluded.

Returns:
list of str

A list of full file paths (as strings) for files within the base directory (and its subdirectories up to the specified depth) that contain the provided substring.

Raises:
FileNotFoundError

If the base directory does not exist.

NotADirectoryError

If the specified base_dir is not a directory.

Examples

>>> import mbo_utilities as mbo
>>> # Get all files that contain "ops.npy" in their names by searching up to 3 levels deep:
>>> ops_files = mbo.get_files("path/to/files", "ops.npy", max_depth=3)
>>> # Get only files containing "tif" in the current directory (max_depth=1):
>>> tif_files = mbo.get_files("path/to/files", "tif")
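The sort_ascending option described above uses numeric parts of filenames so that "file2" precedes "file10". A minimal sketch of such a natural-sort key (an illustration of the behavior, not the library's internal implementation):

```python
import re

def natural_key(name: str):
    # Split the name into digit and non-digit runs;
    # digit runs compare numerically, the rest case-insensitively.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]

files = ["file10.tif", "file2.tif", "file1.tif"]
print(sorted(files, key=natural_key))
# ['file1.tif', 'file2.tif', 'file10.tif']
```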
mbo_utilities.files_to_dask(files: list[str | Path], astype=None, chunk_t=250)

Lazily build a Dask array or list of arrays depending on filename tags.

  • “plane”, “z”, or “chan” → stacked along Z (TZYX)

  • “roi” → list of 3D (T,Y,X) arrays, one per ROI

  • otherwise → concatenate all files in time (T)
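The tag rules above can be sketched as a small classifier. This helper is hypothetical (including its "_z" heuristic for the "z" tag) and only mirrors the documented dispatch, not the Dask assembly itself:

```python
def classify_files(filenames):
    # Mirror the documented rules: "plane"/"z"/"chan" tags -> stack along Z (TZYX),
    # "roi" tags -> one (T, Y, X) array per ROI, otherwise -> concatenate in time.
    names = [str(f).lower() for f in filenames]
    if any(tag in n for n in names for tag in ("plane", "chan", "_z")):
        return "stack_z"
    if any("roi" in n for n in names):
        return "per_roi"
    return "concat_t"
```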

mbo_utilities.get_metadata(file, dx: float | None = None, dy: float | None = None, dz: float | None = None, verbose: bool = False, z_step: float | None = None)

Extract metadata from a TIFF file or directory of TIFF files produced by ScanImage.

This function handles single files, lists of files, or directories containing TIFF files. When given a directory, it automatically finds and processes all TIFF files in natural sort order. For multiple files, it calculates frames per file accounting for z-planes.

Parameters:
file : os.PathLike, str, or list
  • Single file path: processes that file

  • Directory path: processes all TIFF files in the directory

  • List of file paths: processes all files in the list

dx : float, optional

X pixel resolution in micrometers. Overrides extracted value.

dy : float, optional

Y pixel resolution in micrometers. Overrides extracted value.

dz : float, optional

Z step size in micrometers. Overrides extracted value. Also available as z_step for backward compatibility.

verbose : bool, optional

If True, returns extended metadata including all ScanImage attributes. Default is False.

z_step : float, optional

Alias for dz (backward compatibility).

Returns:
dict

A dictionary containing extracted metadata with normalized resolution aliases:

- dx, dy, dz : canonical resolution values in micrometers
- pixel_resolution : (dx, dy) tuple
- voxel_size : (dx, dy, dz) tuple
- umPerPixX, umPerPixY, umPerPixZ : legacy format
- PhysicalSizeX, PhysicalSizeY, PhysicalSizeZ : OME format

For multiple files, the dictionary also includes:

- frames_per_file : list of frame counts per file (accounting for z-planes)
- total_frames : total frames across all files
- file_paths : list of processed file paths
- tiff_pages_per_file : raw TIFF page counts per file

Raises:
ValueError

If no recognizable metadata is found or no TIFF files found in directory.

Examples

>>> # Single file with z-resolution
>>> meta = get_metadata("path/to/rawscan_00001.tif", dz=5.0)
>>> print(f"Voxel size: {meta['voxel_size']}")
>>> # Directory of files
>>> meta = get_metadata("path/to/scan_directory/")
>>> print(f"Files processed: {len(meta['file_paths'])}")
>>> print(f"Frames per file: {meta['frames_per_file']}")
>>> # List of specific files
>>> files = ["scan_00001.tif", "scan_00002.tif", "scan_00003.tif"]
>>> meta = get_metadata(files, dz=5.0)
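The normalized resolution aliases listed under Returns could be assembled from dx, dy, dz roughly as follows. This is an illustrative sketch of the documented output keys only, not the library's extraction logic:

```python
def resolution_aliases(dx: float, dy: float, dz: float) -> dict:
    # Build the alias keys that get_metadata is documented to return
    return {
        "dx": dx, "dy": dy, "dz": dz,                       # canonical values (um)
        "pixel_resolution": (dx, dy),                       # 2D pixel size
        "voxel_size": (dx, dy, dz),                         # 3D voxel size
        "umPerPixX": dx, "umPerPixY": dy, "umPerPixZ": dz,  # legacy format
        "PhysicalSizeX": dx, "PhysicalSizeY": dy, "PhysicalSizeZ": dz,  # OME format
    }

meta = resolution_aliases(1.0, 1.0, 5.0)
```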
mbo_utilities.expand_paths(paths: str | Path | Sequence[str | Path]) -> list[Path]

Expand a path, list of paths, or wildcard pattern into a sorted list of actual files.

This is a handy wrapper for loading images or data files when you’ve got a folder, some wildcards, or a mix of both.

Parameters:
paths : str, Path, or list of (str or Path)

Can be a single path, a wildcard pattern like "*.tif", a folder, or a list of those.

Returns:
list of Path

Sorted list of full paths to matching files.

Examples

>>> expand_paths("data/*.tif")
[Path("data/img_000.tif"), Path("data/img_001.tif"), ...]
>>> expand_paths(Path("data"))
[Path("data/img_000.tif"), Path("data/img_001.tif"), ...]
>>> expand_paths(["data/*.tif", Path("more_data")])
[Path("data/img_000.tif"), Path("more_data/img_050.tif"), ...]
mbo_utilities.get_mbo_dirs() -> dict

Ensure ~/mbo and its subdirectories exist.

Returns a dict with paths to the root, settings, and cache directories.

mbo_utilities.load_ops(ops_input: str | Path | list[str | Path])

Simple utility to load a Suite2p ops .npy file.

mbo_utilities.write_ops(metadata, raw_filename, **kwargs)

Write metadata to an ops file alongside the given filename.

This creates a Suite2p-compatible ops.npy file from the provided metadata. The ops file is used by Suite2p for processing configuration.

Parameters:
metadata : dict

Must contain ‘shape’ key with (T, Y, X) dimensions. Optional keys: ‘pixel_resolution’, ‘frame_rate’, ‘fs’, ‘dx’, ‘dy’, ‘dz’.

raw_filename : str or Path

Path to the data file (e.g., data_raw.bin). The ops.npy will be written to the same directory.

**kwargs

Additional arguments. ‘structural=True’ indicates channel 2 data.
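A minimal sketch of the kind of ops dictionary write_ops is described as producing, derived from the documented metadata keys. The helper name and the 30 Hz fallback are assumptions for illustration; Ly, Lx, nframes, and fs are standard Suite2p ops keys, but the actual file may contain more:

```python
def ops_from_metadata(metadata: dict) -> dict:
    # 'shape' is documented as required, in (T, Y, X) order
    t, y, x = metadata["shape"]
    return {
        "nframes": t,   # number of frames
        "Ly": y,        # frame height (Suite2p convention)
        "Lx": x,        # frame width (Suite2p convention)
        # sampling rate: prefer 'fs', fall back to 'frame_rate',
        # then a 30 Hz default (assumed here, not documented)
        "fs": metadata.get("fs", metadata.get("frame_rate", 30.0)),
    }

ops = ops_from_metadata({"shape": (1000, 512, 512), "fs": 17.0})
```

write_ops itself would additionally save this dictionary as ops.npy next to raw_filename.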