==========
User Guide
==========

.. _installation:

Installation
============

hyperspy-ml-algorithms is available on PyPI::

    pip install hyperspy-ml-algorithms

To install with GPU support (optional)::

    pip install hyperspy-ml-algorithms[gpu]

To install with scikit-learn support (optional, enables randomized SVD)::

    pip install hyperspy-ml-algorithms[sklearn]

Quick Start
===========

All estimators follow a scikit-learn-compatible API with ``fit``,
``transform``, and ``fit_transform`` methods::

    import numpy as np
    from hyperspy_ml_algorithms import SVDPCA

    rng = np.random.RandomState(42)
    data = rng.random((77, 13))   # 77 samples, 13 features

    est = SVDPCA(n_components=5)
    est.fit(data)

    print(est.components_.shape)  # (5, 13) — rows are components
    scores = est.transform(data)
    print(scores.shape)           # (77, 5) — rows are samples

Estimator Overview
==================

The package provides 8 estimators covering a range of decomposition and
transformation techniques:

.. list-table::
   :header-rows: 1
   :widths: 15 40 45

   * - Estimator
     - Type
     - Key Feature
   * - :class:`~hyperspy_ml_algorithms.SVDPCA`
     - SVD-based PCA
     - Multi-backend SVD with flexible centering
   * - :class:`~hyperspy_ml_algorithms.MLPCA`
     - Maximum Likelihood PCA
     - Handles heteroskedastic (Poisson) noise
   * - :class:`~hyperspy_ml_algorithms.ORPCA`
     - Online Robust PCA
     - Streaming decomposition with sparse outlier handling
   * - :class:`~hyperspy_ml_algorithms.RPCAGoDec`
     - Batch Robust PCA
     - GoDec algorithm: fast low-rank + sparse decomposition
   * - :class:`~hyperspy_ml_algorithms.ORNMF`
     - Online Robust NMF
     - Non-negative decomposition for streaming data
   * - :class:`~hyperspy_ml_algorithms.IncrementalSVD`
     - Incremental SVD
     - Streaming SVD without centering
   * - :class:`~hyperspy_ml_algorithms.Orthomax`
     - Orthomax Rotation
     - Rotation of components (Varimax when ``gamma=1.0``)
   * - :class:`~hyperspy_ml_algorithms.Whitening`
     - Whitening Transformation
     - Decorrelation via PCA or ZCA whitening

GPU Support
===========

The estimators use ``array_api_compat`` internally, which enables GPU
acceleration with CuPy or PyTorch without any code changes::

    import numpy as np
    import cupy as cp
    from hyperspy_ml_algorithms import SVDPCA

    # Generate data on GPU
    data_gpu = cp.asarray(np.random.random((77, 13)))

    est = SVDPCA(n_components=5)
    est.fit(data_gpu)               # Uses CuPy for SVD

    scores_gpu = est.transform(data_gpu)
    scores = cp.asnumpy(scores_gpu)  # Back to NumPy if needed

.. note::

   Multi-backend support is not uniform across estimators. The following
   table lists which array backends each estimator accepts and whether the
   computation runs on the original device (GPU) or is performed on the CPU:

   ========================  ================================================
   Estimator                 Supported array backends
   ========================  ================================================
   SVDPCA                    NumPy, CuPy, PyTorch (GPU accelerated)
   IncrementalSVD            NumPy, CuPy, PyTorch (GPU accelerated)
   Whitening                 NumPy, CuPy, PyTorch (GPU accelerated)
   Orthomax                  NumPy, CuPy, PyTorch (GPU accelerated) [#f1]_
   MLPCA                     NumPy, CuPy, PyTorch (computed on CPU)
   RPCAGoDec                 NumPy, CuPy, PyTorch (GPU accelerated)
   ORPCA                     NumPy only
   ORNMF                     NumPy only
   ========================  ================================================

   .. [#f1] The default varimax path (:math:`0 \le \gamma \le 1`) is
      GPU-enabled; other ``gamma`` values fall back to a NumPy-only
      bivariate rotation.

   ``MLPCA`` accepts CuPy/PyTorch arrays but performs its SVD on the CPU,
   so it should be treated as a CPU estimator unless you specifically need
   array-namespace compatibility.  ``ORPCA`` and ``ORNMF`` do not support
   CuPy/PyTorch inputs and should be used with NumPy arrays only.

Estimator Gallery
=================

SVDPCA
------

SVD-based PCA with flexible centering, auto-transposition, and multi-backend
support::

    import numpy as np
    from hyperspy_ml_algorithms import SVDPCA

    rng = np.random.RandomState(42)
    data = rng.random((77, 13))

    est = SVDPCA(n_components=3, centre="features")
    scores = est.fit_transform(data)
    print(f"Components: {est.components_.shape}")  # (3, 13)
    print(f"Scores: {scores.shape}")               # (77, 3)
    print(f"Explained variance ratio: {est.explained_variance_ratio_}")

MLPCA
-----

Maximum Likelihood PCA for data with known per-element variance (e.g., Poisson
noise in electron microscopy)::

    import numpy as np
    from hyperspy_ml_algorithms import MLPCA

    rng = np.random.RandomState(42)
    data = rng.poisson(15, size=(50, 30)).astype(float)
    variance = data.copy()  # Poisson: variance = mean

    est = MLPCA(n_components=4, tol=1e-8)
    est.fit(data, variance)          # variance required!
    print(f"Scores: {est.scores_.shape}")       # (50, 4)
    print(f"Components: {est.components_.shape}")  # (4, 30), like sklearn

.. warning::

   MLPCA's ``fit()`` requires a second argument ``variance`` (not ``y=None``).
   The ``components_`` attribute follows the sklearn convention, with shape
   ``(n_components, n_features)``.

ORPCA
-----

Online Robust PCA for streaming data with sparse outlier handling::

    import numpy as np
    from hyperspy_ml_algorithms import ORPCA

    rng = np.random.RandomState(42)
    data = rng.random((200, 25))

    est = ORPCA(n_components=5, method="SGD",
                subspace_learning_rate=0.5, subspace_momentum=0.9)
    # Feed data in chunks for streaming
    n_batches = 4
    for chunk in np.array_split(data, n_batches):
        est.partial_fit(chunk)

    print(f"Low-rank: {est.low_rank_.shape}")  # (200, 25)
    print(f"Components: {est.components_.shape}")  # (5, 25)

RPCAGoDec
---------

Batch Robust PCA using bilateral random projections for fast decomposition::

    import numpy as np
    from hyperspy_ml_algorithms import RPCAGoDec

    rng = np.random.RandomState(42)
    data = rng.random((150, 30))

    est = RPCAGoDec(rank=6, tol=1e-3, max_iter=50)
    low_rank = est.fit_transform(data)   # returns low_rank_, NOT scores
    print(f"Low-rank: {low_rank.shape}")      # (150, 30)
    print(f"Sparse: {est.sparse_.shape}")     # (150, 30)
    scores = est.transform(data)              # (150, 6)

.. note::

   ``RPCAGoDec.fit_transform()`` returns the *low-rank reconstruction*, not
   scores. Call ``transform()`` separately to get score projections.

ORNMF
-----

Online Robust NMF: non-negative decomposition with sparse outlier rejection::

    import numpy as np
    from hyperspy_ml_algorithms import ORNMF

    rng = np.random.RandomState(42)
    data = np.abs(rng.random((80, 20)))  # non-negative data

    est = ORNMF(n_components=5, lambda1=0.5)
    est.fit(data)
    scores = est.transform(data)
    print(f"Components: {est.components_.shape}")  # (5, 20)
    print(f"Scores: {scores.shape}")               # (80, 5)
    print(f"Components non-negative: {(est.components_ >= 0).all()}")
    print(f"Scores non-negative: {(scores >= 0).all()}")

IncrementalSVD
--------------

Streaming SVD for out-of-core data — no centering applied::

    import numpy as np
    from hyperspy_ml_algorithms import IncrementalSVD

    rng = np.random.RandomState(42)
    data = rng.random((250, 15))

    est = IncrementalSVD(n_components=4)
    # Feed in chunks
    for chunk in np.array_split(data, 5):
        est.partial_fit(chunk)

    scores = est.transform(data)
    print(f"Components: {est.components_.shape}")         # (4, 15)
    print(f"Singular values: {est.singular_values_}")     # (4,)
    print(f"Samples seen: {est.n_samples_seen_}")         # 250
    print(f"Noise variance: {est.noise_variance_}")

Orthomax
--------

Rotation of a pre-computed component matrix. The input is ``(n_features,
n_components)`` — this is a *rotation*, not a decomposition::

    import numpy as np
    from hyperspy_ml_algorithms import SVDPCA, Orthomax

    rng = np.random.RandomState(42)
    data = rng.random((90, 18))

    # First get components from a decomposition
    pca = SVDPCA(n_components=4).fit(data)

    # Rotate components (input is n_features × n_components)
    rotator = Orthomax(gamma=1.0)  # varimax rotation
    rotated = rotator.fit_transform(pca.components_.T)
    print(f"Rotation matrix: {rotator.rotation_matrix_.shape}")  # (4, 4)
    print(f"Rotated components: {rotated.shape}")                 # (18, 4)

.. warning::

   Orthomax expects input of shape ``(n_features, n_components)``, **not**
   ``(n_samples, n_features)``. You must pass the transposed components from
   another decomposition.

Whitening
---------

Decorrelate variables via PCA or ZCA whitening::

    import numpy as np
    from hyperspy_ml_algorithms import Whitening

    rng = np.random.RandomState(42)
    data = rng.random((65, 12))

    est = Whitening(method="ZCA")
    whitened = est.fit_transform(data)
    print(f"Whitening matrix: {est.whitening_matrix_.shape}")  # (12, 12)
    print(f"Whitened data: {whitened.shape}")                  # (65, 12)

    # Verify decorrelation
    cov = np.cov(whitened.T)
    print(f"Diagonal of covariance (should be ~1): {np.diag(cov)}")