hsimars package

HSI Mars: Hyperspectral Image Analysis for Mars Reconnaissance Orbiter Data

This package provides tools for loading, processing, and visualising hyperspectral imaging (HSI) data from the CRISM instrument aboard NASA’s Mars Reconnaissance Orbiter (MRO). It works with ENVI format spectral data and ground truth annotations.

Main Features

  • Load and process CRISM hyperspectral images in ENVI format

  • Handle ground truth annotations for supervised learning

  • Visualise false-colour images and spectral signatures

  • Plot spectral profiles with optional convex hull removal

  • Generate histograms for spectral band analysis

  • Memory-efficient lazy loading for large datasets

Quick Start

>>> from hsimars import HSIMars
>>> # Load hyperspectral image
>>> hsi = HSIMars(hdr_path="path/to/image.hdr")
>>> img_data = hsi.get_img()
>>> print(f"Image shape: {img_data.shape}")
>>>
>>> # Load with annotations
>>> hsi = HSIMars(
...     hdr_path="path/to/image.hdr", annotations="path/to/labels.mat"
... )
>>> img_data, ann_data = hsi.data()
>>>
>>> # Visualise
>>> hsi.display()  # Interactive display
>>> hsi.plot_spectra(px=[100, 200], convex_hull=True, bands=True)

Classes

HSIMars

Main class for loading and manipulating hyperspectral images and annotations.

Modules

hsi

Core module with the HSIMars class and processing functions.

Package Information

class hsimars.HSIMars(hdr_path: str | Path, annotations: str | Path | None = None, label_names_path: str | Path | None = None)

Bases: object

Load and manipulate hyperspectral images from Mars Reconnaissance Orbiter.

This class provides functionality for working with CRISM hyperspectral data: loading ENVI format images, processing spectral data, handling ground truth annotations, and creating visualisations.

The class uses lazy loading for memory efficiency: data is read from disk only when first accessed and is then cached for later use.
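A minimal sketch of this lazy-loading pattern, using the standard library's `functools.cached_property`. `LazyImage` and `_load_from_disk` are hypothetical names chosen for illustration; hsimars may implement its caching differently.

```python
from functools import cached_property


class LazyImage:
    """Hypothetical stand-in for a lazily loaded HSI container."""

    def __init__(self, hdr_path: str) -> None:
        # Only the path is stored; nothing is read from disk yet.
        self.hdr_path = hdr_path
        self.load_count = 0

    def _load_from_disk(self) -> list[float]:
        # Placeholder for the expensive read; counts how often it runs.
        self.load_count += 1
        return [0.1, 0.2, 0.3]

    @cached_property
    def data(self) -> list[float]:
        # First access triggers the load; later accesses reuse the cached value.
        return self._load_from_disk()


lazy = LazyImage("data/sample.hdr")
first = lazy.data
second = lazy.data
print(lazy.load_count)  # 1
```

Because `cached_property` stores the value on the instance, each object loads its data at most once, while separate instances stay independent.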

hdr_path

Path to the ENVI header file (.hdr extension).

Type:

Path

annotations

Path to the ground truth annotations file (.mat format), if provided.

Type:

Path or None

Examples

>>> # Load HSI data without annotations
>>> hsi = HSIMars(hdr_path="data/sample.hdr")
>>> img_data = hsi.get_img()
>>> print(
...     f"Image dimensions: {img_data.height}x{img_data.width}, {img_data.channels} channels"
... )
>>> # Load HSI data with annotations
>>> hsi = HSIMars(
...     hdr_path="data/sample.hdr", annotations="data/labels.mat"
... )
>>> img_data, ann_data = hsi.data()
>>> hsi.display()  # Show image with overlaid annotations
>>> # Plot spectrum for a specific pixel
>>> hsi.plot_spectra(px=[100, 200], convex_hull=True, bands=True)

Notes

The ENVI .img file containing the hyperspectral data must be in the same directory as the .hdr header file.

Initialise the HSIMars object with paths to data files.

Parameters:
  • hdr_path (str | Path) – Path to the ENVI header file (.hdr extension) with metadata about the hyperspectral image. The corresponding .img file with spectral data must be in the same directory.

  • annotations (str | Path, optional) – Path to the ground truth annotations file (.mat format). If None, annotation methods return None. Default is None.

  • label_names_path (str | Path, optional) – Path to the Excel file with label name mappings. If None, looks for data_description.xlsx in the same directory as the HDR file. Default is None.

Raises:
  • FileNotFoundError – If the header file does not exist.

  • FileNotFoundError – If annotations path is provided but the file does not exist.

  • FileNotFoundError – If label_names_path is provided but the file does not exist.
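The checks listed above can be sketched as a standalone function. `resolve_paths` is hypothetical, and returning None when the default data_description.xlsx is absent is an assumption, consistent with label_names being an empty dictionary when the mapping file is not found.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_paths(
    hdr_path: str | Path,
    annotations: str | Path | None = None,
    label_names_path: str | Path | None = None,
) -> tuple[Path, Path | None, Path | None]:
    """Hypothetical sketch of the constructor's path checks (no data is read)."""
    hdr = Path(hdr_path)
    if not hdr.is_file():
        raise FileNotFoundError(f"Header file not found: {hdr}")

    ann = Path(annotations) if annotations is not None else None
    if ann is not None and not ann.is_file():
        raise FileNotFoundError(f"Annotations file not found: {ann}")

    if label_names_path is not None:
        labels = Path(label_names_path)
        if not labels.is_file():
            raise FileNotFoundError(f"Label names file not found: {labels}")
    else:
        # Assumed fallback: data_description.xlsx next to the header, if present.
        candidate = hdr.parent / "data_description.xlsx"
        labels = candidate if candidate.is_file() else None

    return hdr, ann, labels


with tempfile.TemporaryDirectory() as tmp:
    hdr_file = Path(tmp) / "image.hdr"
    hdr_file.write_text("ENVI\n")
    resolved = resolve_paths(hdr_file)
    (Path(tmp) / "data_description.xlsx").write_bytes(b"")
    resolved_default = resolve_paths(hdr_file)
    try:
        resolve_paths(hdr_file, annotations=Path(tmp) / "missing.mat")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```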

Examples

>>> hsi = HSIMars(hdr_path="path/to/image.hdr")
>>> hsi_with_labels = HSIMars(
...     hdr_path="path/to/image.hdr", annotations="path/to/labels.mat"
... )
>>> hsi_custom_labels = HSIMars(
...     hdr_path="path/to/image.hdr",
...     annotations="path/to/labels.mat",
...     label_names_path="path/to/custom_labels.xlsx",
... )

Notes

The constructor validates file paths only - no data loads until a get_* method is called. This lazy loading reduces memory usage with large datasets.

data() → tuple[NamedTuple, NamedTuple | None]

Load both hyperspectral image and annotation data.

This convenience method loads both the HSI data and annotations (if available) in a single call, ensuring both are cached for subsequent operations.

Returns:

A tuple containing two elements:

  1. HSIMarsImageData (NamedTuple):

    • hsi : ndarray

      The HSI data array of shape (height, width, channels).

    • wavelength : ndarray

      Array of wavelength values in nm.

    • shape : tuple

      Dimensions (height, width, channels).

    • height : int

      Number of pixel rows.

    • width : int

      Number of pixel columns.

    • channels : int

      Number of spectral bands.

    • dtype : str

      Data type (‘float32’).

  2. HSIMarsAnnotationData (NamedTuple or None):

    If annotations are available:

    • labels : ndarray

      Label array of shape (height, width).

    • shape : tuple

      Dimensions (height, width).

    • height : int

      Number of pixel rows.

    • width : int

      Number of pixel columns.

    • dtype : str

      Data type (‘uint8’).

    • label_names : dict[int, str]

      Dictionary mapping numerical class labels to human-readable class names.

    Returns None if no annotations were provided.
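The two records can be pictured as the `typing.NamedTuple` classes below. This is a sketch: the field names follow the lists above, but the field order and the use of `typing.NamedTuple` (rather than, say, `collections.namedtuple`) are assumptions, and `Any` stands in for NumPy arrays so the snippet runs without NumPy.

```python
from __future__ import annotations

from typing import Any, NamedTuple


class HSIMarsImageData(NamedTuple):
    """Sketch of the image record; field names follow the docs."""
    hsi: Any                       # array of shape (height, width, channels), float32
    wavelength: Any                # array of shape (channels,), in nm
    shape: tuple[int, int, int]    # (height, width, channels)
    height: int
    width: int
    channels: int
    dtype: str                     # 'float32'


class HSIMarsAnnotationData(NamedTuple):
    """Sketch of the annotation record."""
    labels: Any                    # array of shape (height, width), uint8
    shape: tuple[int, int]         # (height, width)
    height: int
    width: int
    dtype: str                     # 'uint8'
    label_names: dict[int, str]    # e.g. {1: 'Analcime'}


img = HSIMarsImageData(
    hsi=None, wavelength=[1000.0, 1010.0], shape=(2, 3, 2),
    height=2, width=3, channels=2, dtype="float32",
)
# Named tuples support both attribute access and unpacking.
height, width, channels = img.shape
```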

Return type:

tuple[NamedTuple, NamedTuple or None]

Examples

>>> hsi = HSIMars(
...     hdr_path="data/sample.hdr", annotations="data/labels.mat"
... )
>>> img_data, ann_data = hsi.data()
>>> print(
...     f"Image: {img_data.shape}, Annotations: {ann_data.shape if ann_data else 'None'}"
... )

Notes

This method is equivalent to calling get_img() and get_annotations() separately, but provides a more convenient interface when both datasets are needed.

display() → None

Display the HSI, its annotations, and their overlay side by side.

Opens an interactive OpenCV window showing:

  • Left panel: false-color HSI visualization

  • Middle panel: color-coded annotations (if available)

  • Right panel: semi-transparent overlay of the annotations on the HSI (if available)

If no annotations are provided, displays only the HSI.

Examples

>>> # With annotations
>>> hsi = HSIMars(
...     hdr_path="data/sample.hdr", annotations="data/labels.mat"
... )
>>> hsi.display()  # Shows three-panel view
>>> # Without annotations
>>> hsi = HSIMars(hdr_path="data/sample.hdr")
>>> hsi.display()  # Shows only HSI

Notes

The overlay uses 75% weight for the HSI and 25% weight for the annotations, providing a good balance between seeing spectral features and class boundaries.
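Per pixel and channel, the overlay is out = 0.75 · hsi + 0.25 · annotation. OpenCV's `cv2.addWeighted(src1, alpha, src2, beta, gamma)` computes this kind of saturated weighted sum, but whether hsimars calls it is an assumption. The pure-Python sketch below shows the arithmetic for one 8-bit RGB pixel; `blend_pixel` is a hypothetical helper.

```python
def blend_pixel(hsi_rgb, ann_rgb, alpha=0.75, beta=0.25):
    """Blend two 8-bit RGB pixels with the documented 75/25 weighting."""
    # Weighted sum per channel, rounded and clipped to 0..255 like an 8-bit image.
    return tuple(
        max(0, min(255, round(alpha * h + beta * a)))
        for h, a in zip(hsi_rgb, ann_rgb)
    )


blended = blend_pixel((200, 100, 40), (0, 0, 255))
```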

display_annotations() → None

Display the ground truth annotations with color-coded labels.

Opens an interactive OpenCV window showing the annotation labels with a colormap applied for visual distinction between classes. Background pixels (label 0) are displayed in black.

Raises:

AttributeError – If no annotations were provided during initialization.

Examples

>>> hsi = HSIMars(
...     hdr_path="data/sample.hdr", annotations="data/labels.mat"
... )
>>> hsi.display_annotations()  # Opens window, press any key to close

Notes

The TURBO colormap is applied so that the different material classes in the annotations are easy to tell apart visually.
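With OpenCV, the colormap step would typically be `cv2.applyColorMap(levels, cv2.COLORMAP_TURBO)` on an 8-bit image. The pure-Python sketch below replaces that call with a caller-supplied function so the background masking can run on its own. Spreading labels linearly over 0..255 is an assumption, and `colorize_labels` and `gray` are hypothetical names.

```python
def colorize_labels(labels, colormap):
    """Map a 2D grid of integer labels to RGB tuples, keeping label 0 black.

    `colormap` is any callable that takes an 8-bit level (0..255) and returns
    an RGB tuple.
    """
    max_label = max((v for row in labels for v in row), default=0)
    colored = []
    for row in labels:
        out_row = []
        for v in row:
            if v == 0:
                out_row.append((0, 0, 0))  # background stays black
            else:
                # Assumed scaling: spread class labels over the 0..255 range.
                out_row.append(colormap(round(255 * v / max_label)))
        colored.append(out_row)
    return colored


def gray(level):
    # Stand-in colormap for the demo: repeats the level in all three channels.
    return (level, level, level)


demo = colorize_labels([[0, 1], [2, 0]], gray)
```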

display_hsi() → None

Display the hyperspectral image as a false-color RGB visualization.

Opens an interactive OpenCV window showing the HSI data rendered using three representative spectral bands. The window can be resized; press any key to close it.

Examples

>>> hsi = HSIMars(hdr_path="data/sample.hdr")
>>> hsi.display_hsi()  # Opens window, press any key to close

Notes

The false-color bands are automatically selected from the ENVI header metadata, typically representing visible and near-infrared wavelengths for optimal visual interpretation.
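A false-color composite can be sketched as "take three band indices and stretch each one linearly to 0..255". The indices are assumed to come from a header field such as ENVI's `default bands`, and the per-band min/max stretch is an assumption rather than the documented method. Note that OpenCV windows interpret channels in BGR order, so a real implementation may need to reverse each tuple. `false_color` is hypothetical.

```python
def false_color(cube, rgb_bands):
    """Build 8-bit RGB tuples from three bands of a cube[row][col][band].

    Each selected band is linearly stretched to 0..255 using its own min/max.
    """
    rows, cols = len(cube), len(cube[0])
    channels = []
    for b in rgb_bands:
        values = [cube[r][c][b] for r in range(rows) for c in range(cols)]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # flat band: avoid dividing by zero
        channels.append(
            [[round(255 * (cube[r][c][b] - lo) / span) for c in range(cols)]
             for r in range(rows)]
        )
    # Zip the three stretched channels back into one (R, G, B) tuple per pixel.
    return [[tuple(ch[r][c] for ch in channels) for c in range(cols)]
            for r in range(rows)]


cube = [[[0.1, 0.5, 0.9], [0.3, 0.5, 0.1]]]  # 1 row, 2 columns, 3 bands
rgb = false_color(cube, rgb_bands=(2, 1, 0))
```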

get_annotations() → NamedTuple | None

Load and process ground truth annotation data.

Loads annotation labels from a MATLAB .mat file and aligns them with the processed HSI data dimensions. The result is cached for efficient subsequent access.

Returns:

If annotations were provided during initialization, returns a named tuple (HSIMarsAnnotationData) with the following attributes:

  • labels : ndarray of shape (height, width)

    2D array containing class labels for each pixel. Values are unsigned integers representing different material classes. A value of 0 typically indicates unlabeled/background pixels.

  • shape : tuple of (height, width)

    Dimensions of the annotation matrix.

  • height : int

    Number of pixel rows (matches HSI height).

  • width : int

    Number of pixel columns (matches HSI width).

  • dtype : str

    Data type of the labels array (‘uint8’).

  • label_names : dict[int, str]

    Dictionary mapping numerical class labels to human-readable class names, for example {1: ‘Analcime’, 2: ‘Plagioclase’}. It is an empty dictionary if the label mapping file is not found or cannot be parsed.

Returns None if no annotation file was provided during initialization.

Return type:

NamedTuple or None

Examples

>>> import numpy as np
>>> hsi = HSIMars(
...     hdr_path="data/sample.hdr", annotations="data/labels.mat"
... )
>>> ann_data = hsi.get_annotations()
>>> if ann_data is not None:
...     print(f"Annotation shape: {ann_data.shape}")
...     unique_labels = np.unique(ann_data.labels)
...     print(f"Number of classes: {len(unique_labels)}")

Notes

The annotation matrix is automatically padded to match the dimensions of the processed HSI data. This ensures pixel-level alignment between spectral data and labels.
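A minimal sketch of the padding step on plain nested lists. It assumes the labels only need zero-padding (background label 0) on the bottom and right edges; the actual alignment in hsimars may differ. `pad_labels` is a hypothetical helper.

```python
def pad_labels(labels, height, width, fill=0):
    """Pad a 2D label grid with `fill` on the bottom/right up to (height, width)."""
    if len(labels) > height or any(len(row) > width for row in labels):
        raise ValueError("labels are larger than the target shape")
    # Extend each existing row to the target width.
    padded = [list(row) + [fill] * (width - len(row)) for row in labels]
    # Append background rows until the target height is reached.
    padded += [[fill] * width for _ in range(height - len(labels))]
    return padded


print(pad_labels([[1, 2], [3, 4]], 3, 3))
# [[1, 2, 0], [3, 4, 0], [0, 0, 0]]
```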

The method implements lazy evaluation - annotations are only loaded on the first call. Subsequent calls return the cached result.

get_img() → NamedTuple

Load and process the hyperspectral image data.

This method loads the raw HSI data, performs preprocessing steps including bad band removal, cropping of invalid pixels, and normalization. The result is cached for efficient subsequent access.

The preprocessing pipeline includes:

  1. Loading wavelength information and identifying bad bands

  2. Removing pixels with the ignore value (65535)

  3. Cropping the image to remove rows/columns with no valid data

  4. Removing any remaining bad channels

  5. Converting to float32 format for numerical processing
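The five steps can be sketched in plain Python on a nested-list cube. Several details are assumptions, not documented behavior: the bad band list follows the ENVI convention (1 = good, 0 = bad), ignored samples become NaN, cropping keeps the bounding box of valid pixels, and a "remaining bad channel" is one with no valid sample after cropping. Python floats are 64-bit, so step 5 only marks where a NumPy version would cast to float32. `preprocess` is hypothetical.

```python
import math

IGNORE_VALUE = 65535  # ignore value quoted in the pipeline


def preprocess(cube, wavelengths, bbl):
    """Sketch of the five steps for a nested-list cube[row][col][band]."""
    # Step 1: keep only the bands flagged as good in the bad band list.
    good = [b for b, flag in enumerate(bbl) if flag == 1]
    wavelengths = [wavelengths[b] for b in good]

    # Step 2: replace the ignore value with NaN so it counts as missing.
    cube = [[[math.nan if px[b] == IGNORE_VALUE else px[b] for b in good]
             for px in row] for row in cube]

    # Step 3: crop to the bounding box of pixels with at least one valid sample.
    valid = [(r, c) for r, row in enumerate(cube) for c, px in enumerate(row)
             if any(not math.isnan(v) for v in px)]
    if not valid:
        raise ValueError("image contains no valid pixels")
    rows = [r for r, _ in valid]
    cols = [c for _, c in valid]
    cube = [row[min(cols):max(cols) + 1] for row in cube[min(rows):max(rows) + 1]]

    # Step 4: drop channels that have no valid sample left after cropping.
    keep = [b for b in range(len(wavelengths))
            if any(not math.isnan(px[b]) for row in cube for px in row)]
    cube = [[[px[b] for b in keep] for px in row] for row in cube]
    wavelengths = [wavelengths[b] for b in keep]

    # Step 5: cast to float (a NumPy version would use .astype(np.float32)).
    cube = [[[float(v) for v in px] for px in row] for row in cube]
    return cube, wavelengths


N = IGNORE_VALUE
raw = [
    [[N, N, N, N], [N, N, N, N], [N, N, N, N]],
    [[N, N, N, N], [10, 99, 20, N], [30, 99, 40, N]],
    [[N, N, N, N], [50, 99, 60, N], [70, 99, 80, N]],
]
hsi, wl = preprocess(raw, [1000.0, 1200.0, 1500.0, 2000.0], bbl=[1, 0, 1, 1])
```

In this example the border of ignored pixels is cropped away, band 1 is removed by the bad band list, and band 3 is removed in step 4 because it holds only ignore values.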

Returns:

A named tuple (HSIMarsImageData) containing the following attributes:

  • hsi : ndarray of shape (height, width, channels)

    The processed hyperspectral image data as a 3D numpy array. Data type is float32.

  • wavelength : ndarray of shape (channels,)

    Array of wavelength values in nanometers corresponding to each spectral channel.

  • shape : tuple of (height, width, channels)

    Dimensions of the hyperspectral image.

  • height : int

    Number of pixel rows in the image.

  • width : int

    Number of pixel columns in the image.

  • channels : int

    Number of spectral bands/channels in the image.

  • dtype : str

    Data type of the HSI array (‘float32’).

Return type:

NamedTuple

Examples

>>> hsi = HSIMars(hdr_path="data/sample.hdr")
>>> img_data = hsi.get_img()
>>> print(f"Image shape: {img_data.shape}")
>>> print(
...     f"Wavelength range: {img_data.wavelength.min():.1f} - {img_data.wavelength.max():.1f} nm"
... )
>>> print(f"Data type: {img_data.dtype}")
>>> # Access specific pixel spectrum
>>> spectrum = img_data.hsi[100, 200, :]
>>> print(f"Spectrum at (100, 200): {spectrum.shape[0]} bands")

Notes

The method implements lazy evaluation - the image is only loaded and processed on the first call. Subsequent calls return the cached result.

The bad band list (bbl) from the ENVI metadata is used to filter out unreliable spectral channels before further processing.

get_raw() → BsqFile

Load the raw ENVI hyperspectral data file.

This method opens the ENVI format file and returns a file object for accessing the raw spectral data. The file object is cached after the first call to avoid redundant disk I/O.

Returns:

ENVI file object for accessing the hyperspectral data. This object supports memory-mapped access for large files.

Return type:

envi.BsqFile

Examples

>>> hsi = HSIMars(hdr_path="data/sample.hdr")
>>> raw = hsi.get_raw()
>>> print(raw.metadata["wavelength"])  # Access wavelength information

Notes

The spectral data is in a .img file that must be in the same directory as the .hdr file. The ENVI library automatically finds and opens the .img file.
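The `BsqFile` return type suggests the Spectral Python (SPy) package, whose `envi.open(file, image=None)` locates the data file when `image` is omitted. A simplified sketch of that lookup is below. It only checks `.img`/`.IMG` next to the header, whereas SPy also tries other extensions, and `find_image_file` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def find_image_file(hdr_path):
    """Return the .img data file next to an ENVI header (simplified lookup)."""
    hdr = Path(hdr_path)
    for suffix in (".img", ".IMG"):
        candidate = hdr.with_suffix(suffix)
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No .img file found next to {hdr}")


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "scene.hdr").write_text("ENVI\n")
    (Path(tmp) / "scene.img").write_bytes(b"\x00" * 8)
    found_name = find_image_file(Path(tmp) / "scene.hdr").name
    (Path(tmp) / "other.hdr").write_text("ENVI\n")
    try:
        find_image_file(Path(tmp) / "other.hdr")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```

If the data file has a non-standard name, SPy lets you pass it explicitly as the second argument of `envi.open`.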

plot_histogram(band: int | float, output: str | Path | None = None) → None

Plot the intensity distribution histogram for a specific spectral band.

Generates a probability density histogram showing the distribution of pixel intensity values across the image for the selected wavelength band. Useful for analyzing the statistical properties of spectral features.

Parameters:
  • band (int | float) –

    Spectral band selector. Can be:

    • int: Direct band index (0-based) in the spectral dimension.

    • float: Wavelength in nanometers. The closest available wavelength will be automatically selected.

  • output (str | Path, optional) – Path to save the histogram plot as an image file. If None (default), displays the plot interactively using matplotlib’s show(). The directory will be created if it doesn’t exist.
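The int/float rule for `band` can be sketched as follows. `select_band` is a hypothetical helper, and resolving ties toward the lower index is an assumption of this sketch.

```python
def select_band(band, wavelengths):
    """Resolve a band selector to a 0-based channel index.

    int   -> used directly as the channel index.
    float -> index of the wavelength (nm) closest to the requested value;
             on a tie, the lower index wins because min() keeps the first.
    """
    if isinstance(band, int):
        return band
    if isinstance(band, float):
        return min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - band))
    raise TypeError("band must be an int (index) or a float (wavelength in nm)")


wl = [1000.0, 1498.0, 1503.0, 2000.0]
idx_direct = select_band(2, wl)
idx_nearest = select_band(1500.0, wl)
```

Under these semantics the type matters. For example, `band=1500` (an int) would be read as channel index 1500, not 1500 nm.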

Examples

>>> hsi = HSIMars(hdr_path="data/sample.hdr")
>>> # Plot histogram for band at index 100
>>> hsi.plot_histogram(band=100)
>>> # Plot histogram for band nearest to 1500 nm wavelength
>>> hsi.plot_histogram(band=1500.0)
>>> # Save histogram to file
>>> hsi.plot_histogram(
...     band=1500.0, output="plots/histogram_1500nm.png"
... )

Notes

The histogram uses 100 bins and is normalized to show probability density rather than raw counts. This normalization facilitates comparison between different bands or images.

The histogram includes all valid pixels in the image for the selected band, providing a global view of intensity distribution.
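Probability density normalization divides each bin count by (number of samples × bin width), so the bar areas sum to 1. This is what matplotlib's `hist(..., density=True)` and `numpy.histogram(..., density=True)` compute. The pure-Python sketch below shows the arithmetic. `density_histogram` is hypothetical, and its handling of a constant band (widening the range by 1.0) is an assumption that differs from NumPy's.

```python
def density_histogram(values, bins=100):
    """Return (densities, edges) for equal-width bins, normalized to unit area."""
    lo, hi = min(values), max(values)
    if hi == lo:
        hi = lo + 1.0  # widen a degenerate range so bins have non-zero width
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The last bin is closed on the right so the maximum is counted.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    densities = [c / (n * width) for c in counts]
    edges = [lo + i * width for i in range(bins + 1)]
    return densities, edges


densities, edges = density_histogram([0.0, 0.25, 0.5, 0.75, 1.0], bins=4)
```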

plot_spectra(px: list[int, int] | list[list[int, int]] | ndarray[tuple[Any, ...], dtype[_ScalarT]], convex_hull: bool = False, bands: bool = False, output: str | Path | None = None) → None

Plot spectral signature(s) for specified pixel location(s).

Generates a line plot showing reflectance/intensity as a function of wavelength. For multiple pixels, plots the mean spectrum with standard deviation shading. Optionally applies convex hull removal to normalize the continuum and highlights spectral band regions.

Parameters:
  • px (list[int, int] | list[list[int, int]] | NDArray) –

    Pixel coordinates to extract spectra from. Can be:

    • Single pixel: [row, col] or [[row, col]]

    • Multiple pixels: [[row1, col1], [row2, col2], …] or 2D array

    Coordinates are in (row, column) format, 0-indexed.

  • convex_hull (bool, optional) – If True, applies convex hull removal to normalize the spectrum continuum. This technique is useful for analyzing absorption features by removing the overall spectral shape. Default is False.

  • bands (bool, optional) – If True, overlays colored regions indicating spectral band types:

    • VIS (Visible): < 750 nm (green)

    • NIR (Near-Infrared): 750-1400 nm (red)

    • SWIR (Short-Wave Infrared): 1400-3000 nm (blue)

    • MWIR (Mid-Wave Infrared): > 3000 nm (magenta)

    Default is False.

  • output (str | Path, optional) – Path to save the plot as an image file. If None (default), displays the plot interactively using matplotlib’s show(). The directory will be created if it doesn’t exist.
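The band ranges can be expressed as a small classifier, plus a helper that groups sorted wavelengths into contiguous spans for shading. The docs do not say which side of the 1400 nm and 3000 nm boundaries each region owns, so the choices below are assumptions. `spectral_region`, `region_spans`, and `REGION_COLORS` are hypothetical names.

```python
REGION_COLORS = {"VIS": "green", "NIR": "red", "SWIR": "blue", "MWIR": "magenta"}


def spectral_region(wavelength_nm):
    """Classify a wavelength in nm using the documented band ranges."""
    if wavelength_nm < 750:
        return "VIS"
    if wavelength_nm < 1400:   # assumed: 1400 nm itself counts as SWIR
        return "NIR"
    if wavelength_nm <= 3000:  # assumed: 3000 nm itself counts as SWIR
        return "SWIR"
    return "MWIR"


def region_spans(wavelengths):
    """Group sorted wavelengths into contiguous (region, start, end) spans."""
    spans = []
    for w in wavelengths:
        region = spectral_region(w)
        if spans and spans[-1][0] == region:
            spans[-1] = (region, spans[-1][1], w)  # extend the current span
        else:
            spans.append((region, w, w))           # start a new span
    return spans


spans = region_spans([436.0, 700.0, 1000.0, 1500.0, 2500.0, 3900.0])
```

Each span could then be shaded with matplotlib's `ax.axvspan(start, end, color=REGION_COLORS[region], alpha=0.2)`. The alpha value is only an illustrative choice.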

Examples

>>> hsi = HSIMars(hdr_path="data/sample.hdr")
>>> # Plot single pixel spectrum
>>> hsi.plot_spectra(px=[100, 200])
>>> # Plot average spectrum of multiple pixels with standard deviation
>>> pixels = [[100, 200], [101, 200], [100, 201], [101, 201]]
>>> hsi.plot_spectra(px=pixels, convex_hull=True, bands=True)
>>> # Save plot to file
>>> hsi.plot_spectra(px=[100, 200], output="plots/spectrum.png")

Notes

Convex hull removal is performed using the pysptools library. This technique divides the spectrum by its convex hull envelope, effectively normalizing the continuum and emphasizing absorption features.
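The docs state that pysptools performs this step. To make the arithmetic concrete, the sketch below reimplements the idea in plain Python rather than calling pysptools. It builds the upper convex hull of the (wavelength, value) points with Andrew's monotone chain, interpolates the hull linearly at every wavelength, and divides the spectrum by it. It assumes strictly increasing wavelengths and positive values. `upper_hull` and `continuum_removed` are hypothetical names.

```python
def upper_hull(xs, ys):
    """Upper convex hull (monotone chain) of points sorted by increasing x."""
    hull = []
    for p in zip(xs, ys):
        # Pop the last vertex while it lies on or below the segment hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def continuum_removed(wavelengths, spectrum):
    """Divide the spectrum by its upper hull, interpolated linearly."""
    hull = upper_hull(wavelengths, spectrum)
    result = []
    seg = 0
    for x, y in zip(wavelengths, spectrum):
        # Advance to the hull segment whose x-range contains x.
        while seg < len(hull) - 2 and x > hull[seg + 1][0]:
            seg += 1
        (x1, y1), (x2, y2) = hull[seg], hull[seg + 1]
        continuum = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
        result.append(y / continuum)
    return result


wl = [1000.0, 1200.0, 1400.0, 1600.0, 1800.0]
spec = [0.30, 0.35, 0.25, 0.40, 0.38]
cr = continuum_removed(wl, spec)
```

Points on the hull map to 1.0, and absorption features appear as dips below 1.0. In this example the 1400 nm sample drops to about 0.67.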

For multiple pixels, the standard deviation is shown as a shaded region around the mean spectrum, providing visual indication of spectral variability within the selected region.
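The mean curve and its shaded band can be computed per wavelength as shown below. The docs do not specify which standard deviation is used, so this sketch assumes the population form (`statistics.pstdev`, matching NumPy's default `ddof=0`). `mean_and_std` is hypothetical.

```python
from statistics import fmean, pstdev


def mean_and_std(spectra):
    """Per-band mean and population standard deviation of equal-length spectra."""
    per_band = list(zip(*spectra))  # transpose: one tuple of samples per band
    means = [fmean(values) for values in per_band]
    stds = [pstdev(values) for values in per_band]
    return means, stds


spectra = [[0.20, 0.40, 0.35], [0.40, 0.60, 0.45]]
means, stds = mean_and_std(spectra)
# Lower/upper edges of the shaded band, e.g. for matplotlib's fill_between.
lower = [m - s for m, s in zip(means, stds)]
upper = [m + s for m, s in zip(means, stds)]
```

With matplotlib, the band could be drawn with `ax.fill_between(wavelengths, lower, upper, alpha=0.3)`.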