1. I/O#
Functions to help with loading and saving data.
- mbo_utilities.imwrite(lazy_array, outpath: str | Path, ext: str = '.tiff', planes: list | tuple | None = None, num_frames: int | None = None, register_z: bool = False, roi: int | Sequence[int] | None = None, metadata: dict | None = None, overwrite: bool = False, order: list | tuple = None, target_chunk_mb: int = 100, progress_callback: Callable | None = None, debug: bool = False, shift_vectors: ndarray | None = None, output_name: str | None = None, output_suffix: str | None = None, **kwargs)[source]#
Write a supported lazy imaging array to disk.
This function handles writing multi-dimensional imaging data to various formats, with support for ROI selection, z-plane registration, chunked streaming, and format conversion. Use with imread() to load and convert imaging data.
- Parameters:
- lazy_array : object
One of the supported lazy array readers providing .shape, .metadata, and _imwrite() methods:
MboRawArray : Raw ScanImage/ScanMultiROI TIFF files with phase correction
Suite2pArray : Memory-mapped binary (data.bin or data_raw.bin) + ops.npy
MBOTiffArray : Multi-file TIFF reader using Dask backend
TiffArray : Single or multi-TIFF reader
H5Array : HDF5 dataset wrapper (h5py.File[dataset])
ZarrArray : Collection of z-plane .zarr stores
NumpyArray : Single .npy memory-mapped NumPy file
NWBArray : NWB file with "TwoPhotonSeries" acquisition dataset
- outpath : str or Path
Target directory to write output files. Will be created if it doesn’t exist. Files are named automatically based on plane/ROI (e.g., plane01_roi1.tiff).
- ext : str, default=".tiff"
Output format extension. Supported formats:
- .tiff, .tif : Multi-page TIFF (BigTIFF for >4 GB)
- .bin : Suite2p-compatible binary format with ops.npy metadata
- .zarr : Zarr v3 array store
- .h5, .hdf5 : HDF5 format
- planes : list | tuple | int | None, optional
Z-planes to export (1-based indexing). Options:
- None (default) : Export all planes
- int : Single plane, e.g. planes=7 exports only plane 7
- list/tuple : Specific planes, e.g. planes=[1, 7, 14]
- roi : int | Sequence[int] | None, optional
ROI selection for multi-ROI data. Options:
- None (default) : Stitch/fuse all ROIs horizontally into a single FOV
- 0 : Split all ROIs into separate files (one file per ROI per plane)
- int > 0 : Export a specific ROI, e.g. roi=1 exports only ROI 1
- list/tuple : Export specific ROIs, e.g. roi=[1, 3]
- num_frames : int, optional
Number of frames to export. If None (default), exports all frames.
- register_z : bool, default=False
Perform z-plane registration using Suite3D before writing.
- shift_vectors : np.ndarray, optional
Pre-computed z-shift vectors with shape (n_planes, 2) for [dy, dx] shifts.
- metadata : dict, optional
Additional metadata to merge into output file headers/attributes.
- overwrite : bool, default=False
Whether to overwrite existing output files.
- order : list | tuple, optional
Reorder planes before writing. Must have the same length as planes.
- target_chunk_mb : int, optional
Target chunk size in MB for streaming writes. Default is 100 MB.
- progress_callback : Callable, optional
Callback function for progress updates: callback(progress, current_plane).
- debug : bool, default=False
Enable verbose logging for troubleshooting.
- output_name : str, optional
Filename for binary output when ext=".bin".
- output_suffix : str, optional
Custom suffix to append to output filenames. If None (default), files are named with "_stitched" for multi-ROI data when roi is None, or "_roiN" for specific ROIs. Examples: "_stitched", "_processed", "_session1". The suffix is automatically sanitized (illegal characters removed, double extensions prevented, underscore prefix added if missing).
- **kwargs
Additional format-specific options passed to writer backends.
- Returns:
- Path
Path to the output directory containing written files.
Examples
>>> from mbo_utilities import imread, imwrite
>>> data = imread("path/to/raw/*.tiff")
>>> imwrite(data, "output/session1", roi=None)  # Stitch all ROIs
>>> # Save specific planes
>>> imwrite(data, "output/session1", planes=[1, 7, 14])
>>> # Split ROIs
>>> imwrite(data, "output/session1", roi=0)
>>> # Z-plane registration
>>> imwrite(data, "output/registered", register_z=True)
>>> # Convert to Suite2p binary
>>> imwrite(data, "output/suite2p", ext=".bin", roi=0)
>>> # Save to Zarr
>>> imwrite(data, "output/zarr_store", ext=".zarr")
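To make the role of target_chunk_mb concrete, the sketch below estimates how many frames fit in one streamed chunk for a given target size. This is an illustrative calculation only, not mbo_utilities' actual chunking code; the function name and exact rounding are assumptions.

```python
import numpy as np

def frames_per_chunk(shape, dtype, target_chunk_mb=100):
    """Estimate frames per streamed write chunk from a target chunk size.

    Illustrative sketch of the idea behind ``target_chunk_mb``; the
    library's real chunking logic may differ.
    """
    n_frames, height, width = shape[0], shape[-2], shape[-1]
    bytes_per_frame = height * width * np.dtype(dtype).itemsize
    frames = max(1, (target_chunk_mb * 1024 * 1024) // bytes_per_frame)
    return min(int(frames), n_frames)

# 1000 frames of 512x512 int16: 0.5 MiB per frame, so 100 MiB holds 200 frames
print(frames_per_chunk((1000, 512, 512), np.int16, target_chunk_mb=100))  # 200
```

Larger chunks mean fewer, bigger writes; the per-frame byte count is what ties the MB budget back to a frame count.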
- mbo_utilities.imread(inputs: str | Path | ndarray | Sequence[str | Path], **kwargs)[source]#
Lazy load imaging data from supported file types.
Currently supported file types:
- .bin : Suite2p binary files (.bin + ops.npy)
- .tif/.tiff : TIFF files (BigTIFF, OME-TIFF, and raw ScanImage TIFFs)
- .h5 : HDF5 files
- .zarr : Zarr v3
- .npy : NumPy arrays
- np.ndarray : In-memory NumPy arrays (wrapped as NumpyArray)
- Parameters:
- inputs : str, Path, ndarray, or sequence of str/Path
Input source. Can be:
- Path to a file or directory
- List/tuple of file paths
- A numpy array (wrapped as NumpyArray for full imwrite support)
- An existing lazy array (passed through unchanged)
- **kwargs
Extra keyword arguments passed to specific array readers.
- Returns:
- array_like
One of Suite2pArray, TiffArray, MboRawArray, MBOTiffArray, H5Array, ZarrArray, NumpyArray, or IsoviewArray.
Examples
>>> import numpy as np
>>> from mbo_utilities import imread, imwrite
>>> arr = imread("/data/raw")  # directory with supported files
>>> arr = imread("data.tiff")  # single file
>>> arr = imread(["file1.tiff", "file2.tiff"])  # multiple files
>>> # Wrap a numpy array for imwrite compatibility
>>> data = np.random.randn(100, 512, 512)
>>> arr = imread(data)  # Returns NumpyArray
>>> imwrite(arr, "output", ext=".zarr")  # Full write support
- mbo_utilities.get_files(base_dir, str_contains='', max_depth=1, sort_ascending=True, exclude_dirs=None) list | Path[source]#
Recursively search for files in a specified directory whose names contain a given substring, limiting the search to a maximum subdirectory depth. Optionally, the resulting list of file paths is sorted in ascending order using numeric parts of the filenames when available.
This function intelligently handles zarr stores: it stops recursing into leaf .zarr directories (those that don’t contain nested .zarr subdirs) to avoid traversing thousands of internal chunk directories.
- Parameters:
- base_dir : str or Path
The base directory where the search begins. This path is expanded (e.g., '~' is resolved) and converted to an absolute path.
- str_contains : str, optional
A substring that must be present in a file's name for it to be included in the result. If empty, all files are matched.
- max_depth : int, optional
The maximum number of subdirectory levels (relative to the base directory) to search. Defaults to 1. If set to 0, it is automatically reset to 1.
- sort_ascending : bool, optional
If True (default), the matched file paths are sorted in ascending alphanumeric order. The sort key extracts numeric parts from filenames so that, for example, "file2" comes before "file10".
- exclude_dirs : iterable of str or Path, optional
Directories to exclude from the resulting list of file paths. By default, excludes ".venv/", "__pycache__/", ".git", and ".github".
- Returns:
- list of str
A list of full file paths (as strings) for files within the base directory (and its subdirectories up to the specified depth) that contain the provided substring.
- Raises:
- FileNotFoundError
If the base directory does not exist.
- NotADirectoryError
If the specified base_dir is not a directory.
Examples
>>> import mbo_utilities as mbo
>>> # Get all files containing "ops.npy" in their names, searching up to 3 levels deep:
>>> ops_files = mbo.get_files("path/to/files", "ops.npy", max_depth=3)
>>> # Get only files containing "tif" in the current directory (max_depth=1):
>>> tif_files = mbo.get_files("path/to/files", "tif")
- mbo_utilities.files_to_dask(files: list[str | Path], astype=None, chunk_t=250)[source]#
Lazily build a Dask array or list of arrays depending on filename tags.
- "plane", "z", or "chan" in the filename → stacked along Z (TZYX)
- "roi" → list of 3D (T, Y, X) arrays, one per ROI
- otherwise → concatenate all files in time (T)
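The tag-based dispatch above can be sketched as a simple filename check. This is a simplified illustration of the decision, not the library's actual matching logic (which may use stricter patterns than bare substrings), and the function name is hypothetical.

```python
from pathlib import Path

def group_mode(files):
    """Decide how a set of files would be combined, based on filename tags.

    Simplified sketch of the files_to_dask dispatch described above;
    real tag matching in mbo_utilities may be stricter than substring
    checks (e.g. "z" here would also match "zarr").
    """
    names = [Path(f).stem.lower() for f in files]
    if any(("plane" in n) or ("z" in n) or ("chan" in n) for n in names):
        return "stack_z"   # planes stacked along Z -> one TZYX array
    if any("roi" in n for n in names):
        return "per_roi"   # list of (T, Y, X) arrays, one per ROI
    return "concat_t"      # concatenate all files along time

print(group_mode(["plane01.tif", "plane02.tif"]))      # stack_z
print(group_mode(["scan_roi1.tif", "scan_roi2.tif"]))  # per_roi
print(group_mode(["scan_00001.tif", "scan_00002.tif"]))  # concat_t
```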
- mbo_utilities.get_metadata(file, dx: float | None = None, dy: float | None = None, dz: float | None = None, verbose: bool = False, z_step: float | None = None)[source]#
Extract metadata from a TIFF file or directory of TIFF files produced by ScanImage.
This function handles single files, lists of files, or directories containing TIFF files. When given a directory, it automatically finds and processes all TIFF files in natural sort order. For multiple files, it calculates frames per file accounting for z-planes.
- Parameters:
- file : os.PathLike, str, or list
Single file path: processes that file.
Directory path: processes all TIFF files in the directory.
List of file paths: processes all files in the list.
- dx : float, optional
X pixel resolution in micrometers. Overrides extracted value.
- dy : float, optional
Y pixel resolution in micrometers. Overrides extracted value.
- dz : float, optional
Z step size in micrometers. Overrides extracted value. Also available as z_step for backward compatibility.
- verbose : bool, optional
If True, returns extended metadata including all ScanImage attributes. Default is False.
- z_step : float, optional
Alias for dz (backward compatibility).
- Returns:
- dict
A dictionary containing extracted metadata with normalized resolution aliases:
- dx, dy, dz : canonical resolution values in micrometers
- pixel_resolution : (dx, dy) tuple
- voxel_size : (dx, dy, dz) tuple
- umPerPixX, umPerPixY, umPerPixZ : Suite2p format
- PhysicalSizeX, PhysicalSizeY, PhysicalSizeZ : OME format
For multiple files, also includes:
- frames_per_file : list of frame counts per file (accounting for z-planes)
- total_frames : total frames across all files
- file_paths : list of processed file paths
- tiff_pages_per_file : raw TIFF page counts per file
- Raises:
- ValueError
If no recognizable metadata is found or no TIFF files found in directory.
Examples
>>> # Single file with z-resolution
>>> meta = get_metadata("path/to/rawscan_00001.tif", dz=5.0)
>>> print(f"Voxel size: {meta['voxel_size']}")
>>> # Directory of files
>>> meta = get_metadata("path/to/scan_directory/")
>>> print(f"Files processed: {len(meta['file_paths'])}")
>>> print(f"Frames per file: {meta['frames_per_file']}")
>>> # List of specific files
>>> files = ["scan_00001.tif", "scan_00002.tif", "scan_00003.tif"]
>>> meta = get_metadata(files, dz=5.0)
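The "frames per file accounting for z-planes" bookkeeping reduces to simple arithmetic: in a volumetric ScanImage acquisition each time frame occupies one TIFF page per z-plane, so time frames per file is pages divided by plane count. The sketch below illustrates that relationship; the function name is hypothetical and this is not get_metadata's internal code.

```python
def frames_per_file(tiff_pages_per_file, num_planes):
    """Convert raw TIFF page counts into time-frame counts.

    Assumes one page per z-plane per time frame, as described for
    get_metadata's 'frames_per_file' output; illustrative only.
    """
    return [pages // num_planes for pages in tiff_pages_per_file]

pages = [1400, 1400, 700]  # raw TIFF page counts per file
print(frames_per_file(pages, num_planes=14))   # [100, 100, 50]
print(sum(frames_per_file(pages, 14)))         # 250 -> 'total_frames'
```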
- mbo_utilities.expand_paths(paths: str | Path | Sequence[str | Path]) list[Path][source]#
Expand a path, list of paths, or wildcard pattern into a sorted list of actual files.
This is a handy wrapper for loading images or data files when you’ve got a folder, some wildcards, or a mix of both.
- Parameters:
- paths : str, Path, or list of (str or Path)
Can be a single path, a wildcard pattern like "*.tif", a folder, or a list of those.
- Returns:
- list of Path
Sorted list of full paths to matching files.
Examples
>>> expand_paths("data/*.tif")
[Path("data/img_000.tif"), Path("data/img_001.tif"), ...]
>>> expand_paths(Path("data"))
[Path("data/img_000.tif"), Path("data/img_001.tif"), ...]
>>> expand_paths(["data/*.tif", Path("more_data")])
[Path("data/img_000.tif"), Path("more_data/img_050.tif"), ...]
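The behavior described above (accepting a single path, a wildcard pattern, a folder, or a mixed list, and returning a sorted file list) can be approximated in a few lines of standard-library code. This is a minimal re-implementation sketch, not the library's actual function, which may filter extensions or order results differently.

```python
from pathlib import Path

def expand_paths_sketch(paths):
    """Approximate the expand_paths behavior described above: expand
    wildcards, list directories, pass files through, return sorted Paths.

    Illustrative only; mbo_utilities.expand_paths may differ in details.
    """
    if isinstance(paths, (str, Path)):
        paths = [paths]
    out = []
    for p in paths:
        p = Path(p).expanduser()
        if p.is_dir():
            # directory: take every regular file inside it
            out.extend(q for q in p.iterdir() if q.is_file())
        elif "*" in p.name:
            # wildcard: glob within the parent directory
            out.extend(p.parent.glob(p.name))
        else:
            out.append(p)
    return sorted(set(out))
```

Deduplicating with set() before sorting keeps the result stable when a wildcard and a directory in the same call match the same file.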