User Guide#

Installation#

hyperspy-ml-algorithms is available on PyPI:

pip install hyperspy-ml-algorithms

To install with GPU support (optional):

pip install "hyperspy-ml-algorithms[gpu]"

To install with scikit-learn support (optional, enables randomized SVD):

pip install "hyperspy-ml-algorithms[sklearn]"
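
After installing, it can be useful to confirm which optional dependencies are importable. The sketch below uses only the standard library. The helper name `optional_backends` is hypothetical, and the module names checked (`sklearn`, `cupy`, `torch`) are assumptions based on the backends named in this guide, not a statement of what each extra installs.

```python
from importlib.util import find_spec


def optional_backends():
    """Report which optional packages can be imported in this environment.

    The module names are assumptions; adjust them to match your setup.
    """
    names = ("sklearn", "cupy", "torch")
    # find_spec returns None when a top-level module cannot be located.
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, available in optional_backends().items():
        print(f"{name}: {'available' if available else 'not installed'}")
```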

Quick Start#

All estimators follow a scikit-learn-compatible API with fit, transform, and fit_transform methods:

import numpy as np
from hyperspy_ml_algorithms import SVDPCA

rng = np.random.RandomState(42)
data = rng.random((77, 13))   # 77 samples, 13 features

est = SVDPCA(n_components=5)
est.fit(data)

print(est.components_.shape)  # (5, 13) — rows are components
scores = est.transform(data)
print(scores.shape)           # (77, 5) — rows are samples
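
To make the shapes above concrete, here is a minimal NumPy-only sketch of what an SVD-based PCA computes. It assumes feature-wise mean centering and is not the SVDPCA implementation, which supports other centering options and backends.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.random((77, 13))  # 77 samples, 13 features

n_components = 5

# Assumption: center each feature (column) on its mean before the SVD.
mean = data.mean(axis=0)
centered = data - mean

# Thin SVD: centered = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# The leading right singular vectors are the principal components.
components = Vt[:n_components]    # shape (5, 13), rows are components

# Projecting the centered data onto the components gives the scores.
scores = centered @ components.T  # shape (77, 5), rows are samples

print(components.shape, scores.shape)
```

The scores computed this way equal `U[:, :n_components] * S[:n_components]`, which is why a single SVD is enough for both fitting and transforming the training data.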

Estimator Overview#

The package provides 8 estimators covering a range of decomposition and transformation techniques:

| Estimator | Type | Key Feature |
| --- | --- | --- |
| SVDPCA | SVD-based PCA | Multi-backend SVD with flexible centering |
| MLPCA | Maximum Likelihood PCA | Handles heteroskedastic (Poisson) noise |
| ORPCA | Online Robust PCA | Streaming decomposition with sparse outlier handling |
| RPCAGoDec | Batch Robust PCA | GoDec algorithm: fast low-rank + sparse decomposition |
| ORNMF | Online Robust NMF | Non-negative decomposition for streaming data |
| IncrementalSVD | Incremental SVD | Streaming SVD without centering |
| Orthomax | Orthomax Rotation | Rotation of components (Varimax when gamma=1.0) |
| Whitening | Whitening Transformation | Decorrelation via PCA or ZCA whitening |
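
As an illustration of the last row, the following NumPy sketch contrasts PCA and ZCA whitening. It uses the textbook eigendecomposition formulas under the assumption of mean-centered data, and it does not use or reproduce the Whitening estimator itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated toy data: independent noise passed through a random mixing matrix.
mixing = rng.normal(size=(4, 4))
data = rng.normal(size=(500, 4)) @ mixing

centered = data - data.mean(axis=0)
cov = centered.T @ centered / (centered.shape[0] - 1)

# Eigendecomposition of the symmetric covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)
scale = 1.0 / np.sqrt(eigvals)

# PCA whitening: rotate onto the eigenvectors, then rescale each axis to unit variance.
pca_white = (centered @ eigvecs) * scale

# ZCA whitening: rotate back, so the result stays as close as possible to the input axes.
zca_white = pca_white @ eigvecs.T

# Both outputs have (approximately) identity covariance.
print(np.round(np.cov(pca_white, rowvar=False), 6))
print(np.round(np.cov(zca_white, rowvar=False), 6))
```

The two results differ only by an orthogonal rotation: PCA whitening aligns the output with the principal axes, while ZCA whitening keeps the output oriented like the original features.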

GPU Support#

The estimators use array_api_compat internally, so passing CuPy or PyTorch arrays enables GPU acceleration without code changes for the estimators that support it:

import numpy as np
import cupy as cp
from hyperspy_ml_algorithms import SVDPCA

# Generate data on the CPU with NumPy, then copy it to the GPU
data_gpu = cp.asarray(np.random.random((77, 13)))

est = SVDPCA(n_components=5)
est.fit(data_gpu)               # Uses CuPy for SVD

scores_gpu = est.transform(data_gpu)
scores = cp.asnumpy(scores_gpu)  # Back to NumPy if needed
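
When the same script must handle results from several backends, a small conversion helper avoids hard-coding `cp.asnumpy`. The sketch below is one possible approach. The name `to_numpy` is hypothetical and not part of the package, and the backend is detected from the array type's module name so that neither CuPy nor PyTorch needs to be installed.

```python
import numpy as np


def to_numpy(array):
    """Return a NumPy copy or view of a NumPy, CuPy, or PyTorch array."""
    backend = type(array).__module__.split(".")[0]
    if backend == "cupy":
        # cupy.ndarray.get() copies device memory to a host NumPy array.
        return array.get()
    if backend == "torch":
        # Detach from autograd and move to the CPU before converting.
        return array.detach().cpu().numpy()
    # NumPy arrays, lists, and other array-likes.
    return np.asarray(array)


scores = to_numpy(np.arange(6.0).reshape(2, 3))
print(type(scores).__name__, scores.shape)
```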

Note

Multi-backend support is not uniform across estimators. The following table lists which array backends each estimator accepts and whether the computation runs on the original device (GPU) or is performed on the CPU:

| Estimator | Supported array backends |
| --- | --- |
| SVDPCA | NumPy, CuPy, PyTorch (GPU accelerated) |
| IncrementalSVD | NumPy, CuPy, PyTorch (GPU accelerated) |
| Whitening | NumPy, CuPy, PyTorch (GPU accelerated) |
| Orthomax | NumPy, CuPy, PyTorch (GPU accelerated) [1] |
| MLPCA | NumPy, CuPy, PyTorch (computed on CPU) |
| RPCAGoDec | NumPy, CuPy, PyTorch (GPU accelerated) |
| ORPCA | NumPy only |
| ORNMF | NumPy only |

MLPCA accepts CuPy/PyTorch arrays but performs its SVD on the CPU, so it should be treated as a CPU estimator unless you specifically need array-namespace compatibility. ORPCA and ORNMF do not support CuPy/PyTorch inputs and should be used with NumPy arrays only.
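
The support rules above can be encoded as a small pre-flight check in user code. The following is a sketch only: the sets are transcribed by hand from the table in this guide rather than read from the library, and `backend_of` and `check_input` are hypothetical helper names.

```python
import numpy as np

# Transcribed from the backend table in this guide; not queried from the library.
GPU_ACCELERATED = {"SVDPCA", "IncrementalSVD", "Whitening", "Orthomax", "RPCAGoDec"}
CPU_FALLBACK = {"MLPCA"}
NUMPY_ONLY = {"ORPCA", "ORNMF"}


def backend_of(array):
    """Return the top-level package that defines the array type, e.g. 'numpy'."""
    return type(array).__module__.split(".")[0]


def check_input(estimator_name, array):
    """Return where the computation is expected to run, or raise if unsupported."""
    backend = backend_of(array)
    if backend not in {"numpy", "cupy", "torch"}:
        raise TypeError(f"Unrecognized array backend: {backend}")
    if estimator_name in NUMPY_ONLY:
        if backend != "numpy":
            raise TypeError(f"{estimator_name} expects NumPy arrays, got {backend}")
        return "cpu"
    if estimator_name in CPU_FALLBACK:
        # Accepts any listed backend, but the SVD is performed on the CPU.
        return "cpu"
    if estimator_name in GPU_ACCELERATED:
        return "cpu" if backend == "numpy" else "original device"
    raise KeyError(f"Unknown estimator: {estimator_name}")


print(check_input("SVDPCA", np.zeros((4, 3))))  # NumPy input runs on the CPU
print(check_input("ORPCA", np.zeros((4, 3))))
```

Calling `check_input` before `fit` turns an unsupported combination, such as a CuPy array passed to ORNMF, into an explicit error at the call site.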