hyperspy_ml_algorithms.IncrementalSVD

class hyperspy_ml_algorithms.IncrementalSVD(n_components=None, num_chunks=None)

Bases: object

Incremental (streaming) SVD estimator (no centering).

Computes a plain SVD incrementally by feeding data in batches via partial_fit. After all batches have been processed, the learned components and singular values are available as attributes.

Uses the algorithm of Ross et al. (2008): each new batch is stacked with the previous top-k subspace (scaled by singular values), then a rank-k truncated SVD is computed on the stacked matrix to merge the new data into the existing subspace. No mean centering is ever applied, so the decomposition is an SVD, not PCA.
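The merge step described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name merge_batch and its arguments are hypothetical, and the test data are built to be exactly rank 3 so that truncating to k = 3 loses nothing and the result can be compared with a one-shot SVD.

```python
import numpy as np

def merge_batch(components, singular_values, X_chunk, k):
    """Hypothetical helper: merge one batch into a rank-k subspace (no centering)."""
    if components is None:
        # First batch: nothing to merge with yet.
        stacked = X_chunk
    else:
        # Previous subspace, scaled by its singular values, stacked on the new rows.
        stacked = np.vstack([singular_values[:, None] * components, X_chunk])
    _, S, Vt = np.linalg.svd(stacked, full_matrices=False)
    # Keep only the top-k right singular vectors and singular values.
    return Vt[:k], S[:k]

rng = np.random.default_rng(0)
# Exactly rank-3 data, so a rank-3 truncation is lossless.
X = rng.standard_normal((120, 3)) @ rng.standard_normal((3, 20))

components, singular_values = None, None
for chunk in np.array_split(X, 4):
    components, singular_values = merge_batch(components, singular_values, chunk, k=3)

# Reference: singular values from a single SVD of the full matrix.
S_full = np.linalg.svd(X, compute_uv=False)[:3]
```

For data whose rank exceeds k, each truncation discards some energy, so the streamed result is an approximation of the one-shot SVD rather than an exact match.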

Parameters:
n_components : int or None, default None

Number of singular components to compute. If None, defaults to min(n_samples, n_features) on the first batch.

num_chunks : int or None, default None

Number of chunks to split the data into when calling fit(). If None, a heuristic is used based on the data size. Ignored when using partial_fit directly.
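As a rough picture of what splitting into num_chunks means, the sketch below divides the rows of a matrix with np.array_split. Whether fit() uses this exact function, and what its size heuristic is, are not stated here, so treat both as assumptions; the value 6 is purely illustrative.

```python
import numpy as np

X = np.zeros((200, 50))
num_chunks = 6  # illustrative value, not a library default

# Split along the sample axis; chunk sizes differ by at most one row.
chunks = np.array_split(X, num_chunks, axis=0)
chunk_sizes = [c.shape[0] for c in chunks]
```

Each chunk keeps all 50 features, so every chunk could be passed to partial_fit in turn.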

Attributes:
components_ : array, shape (n_components, n_features)

Right singular vectors (rows are components).

singular_values_ : array, shape (n_components,)

Singular values in descending order.

explained_variance_ : array, shape (n_components,)

Variance explained by each component (S² / N).

explained_variance_ratio_ : array, shape (n_components,)

Fraction of the top-k variance captured by each component (S² / sum(S²)).

mean_ : array, shape (n_features,)

Mean vector (always zeros — no centering is applied).

n_samples_seen_ : int

Total number of samples processed across all partial_fit calls.

noise_variance_ : float

Mean of the squared discarded singular values, divided by n_samples_seen_ (if any singular values were discarded).
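The variance-related attributes above follow from the singular values by simple arithmetic. The sketch below works through them on made-up numbers. It assumes that N is n_samples_seen_, that sum(S²) runs over the retained components only, and that the noise variance is 0.0 when nothing is discarded; none of these is confirmed by the text above.

```python
import numpy as np

# Made-up singular values from a final merge step, in descending order.
S_all = np.array([6.0, 4.0, 2.0, 1.0, 1.0])
k = 3                 # number of retained components
n_samples_seen = 100  # assumed to play the role of N

S = S_all[:k]
explained_variance = S**2 / n_samples_seen        # S² / N
explained_variance_ratio = S**2 / np.sum(S**2)    # S² / sum(S²) over retained components

# Mean of the squared discarded singular values, divided by the sample count.
discarded = S_all[k:]
noise_variance = np.mean(discarded**2) / n_samples_seen if discarded.size else 0.0
```

Under these assumptions the ratios sum to 1 by construction, because the denominator only counts the retained components.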

Examples

>>> import numpy as np
>>> from hyperspy_ml_algorithms import IncrementalSVD
>>> rng = np.random.default_rng(42)
>>> X = rng.standard_normal((200, 50))
>>> est = IncrementalSVD(n_components=3)
>>> for chunk in np.array_split(X, 4):
...     est.partial_fit(chunk)
>>> components = est.components_.T       # shape (n_features, n_components)
>>> scores = est.transform(X)            # shape (n_samples, n_components)

__init__(n_components=None, num_chunks=None)

Methods

__init__([n_components, num_chunks])

fit(X[, y])

Fit the incremental SVD model to X.

fit_transform(X[, y])

Fit the model and transform X.

partial_fit(X_chunk[, y])

Fit one batch without centering (plain incremental SVD).

transform(X)

Project X onto the learned components.
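Because no mean is subtracted, projecting onto the components plausibly reduces to a single matrix product. The sketch below shows that projection with NumPy alone, using a one-shot SVD as a stand-in for a fitted estimator; the formula X @ components.T is an assumption about what transform computes, not something confirmed by this page.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 8))
k = 3

# Stand-in for a fitted estimator: top-k right singular vectors of X.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
components = Vt[:k]        # shape (k, n_features), rows are components

# Projection without centering: one score per sample and component.
scores = X @ components.T  # shape (n_samples, k)
```

For an exact SVD the scores equal the left singular vectors scaled by the singular values, U[:, :k] * S[:k], which gives a quick sanity check for a projection of this kind.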
