hyperspy_ml_algorithms.ORPCA

class hyperspy_ml_algorithms.ORPCA(n_components, store_error=False, lambda1=0.1, lambda2=1.0, method='BCD', init='qr', training_samples=10, subspace_learning_rate=1.0, subspace_momentum=0.5, random_state=None)

Bases: object

Online Robust Principal Component Analysis.

Decomposes a data matrix into low-rank and sparse components using online stochastic optimisation. The model is updated incrementally as new samples arrive, making it suitable for streaming data and datasets that do not fit in memory.

The algorithm is based on [Feng2013] with extensions for stochastic gradient descent (SGD) and momentum-based optimisation [Ruder2016].
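To make the low-rank plus sparse model concrete, the following NumPy sketch builds synthetic data of the kind the description above refers to. It does not use ORPCA itself, and every variable name is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, rank = 200, 30, 3

# Low-rank part: every sample is a combination of `rank` basis vectors.
scores = rng.normal(size=(n_samples, rank))
basis = rng.normal(size=(rank, n_features))
low_rank = scores @ basis

# Sparse part: about 5% of the entries carry large outliers, the rest are zero.
mask = rng.random(size=(n_samples, n_features)) < 0.05
sparse = np.where(mask, rng.normal(scale=10.0, size=mask.shape), 0.0)

# Observed data, laid out as (n_samples, n_features).
X = low_rank + sparse
```

A robust PCA method aims to recover something close to `low_rank` and `sparse` when given only `X`.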

Parameters:
n_components : int

Number of components (rank of the low-rank subspace).

store_error : bool, default False

If True, the sparse error matrix is stored and accessible via the sparse_ attribute after fitting.

lambda1 : float, default 0.1

Nuclear-norm regularisation parameter.

lambda2 : float, default 1.0

Sparse-error regularisation parameter.
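As an illustration of what a sparse-error penalty does, the sketch below assumes an L1 penalty on the sparse term, as in [Feng2013], for which the per-entry solution is soft-thresholding. The helper `soft_threshold` is hypothetical and is not claimed to be how this class computes its sparse component.

```python
import numpy as np

def soft_threshold(x, lambda2):
    """Shrink every entry of x towards zero by lambda2 (L1 proximal operator)."""
    return np.sign(x) * np.maximum(np.abs(x) - lambda2, 0.0)

# Residual entries smaller than lambda2 in magnitude are set to zero;
# larger ones are kept but reduced by lambda2.
residual = np.array([-2.0, -0.5, 0.5, 2.0])
sparse_part = soft_threshold(residual, lambda2=1.0)
```

Under this assumption, a larger lambda2 assigns fewer entries to the sparse component.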

method : {'CF', 'BCD', 'SGD', 'MomentumSGD'}, default 'BCD'

Solver used for the subspace update step:

  • 'CF' — Closed-form solution.

  • 'BCD' — Block-coordinate descent (default).

  • 'SGD' — Stochastic gradient descent.

  • 'MomentumSGD' — SGD with momentum.
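The difference between the closed-form and block-coordinate solvers can be sketched on the quadratic surrogate used in [Feng2013]. The accumulator names `A` and `B` and both helper functions are assumptions made for illustration; the internals of this class may differ.

```python
import numpy as np

def update_closed_form(A, B, lambda1):
    """Minimise the surrogate in one linear solve: L = B (A + lambda1 I)^-1."""
    A_reg = A + lambda1 * np.eye(A.shape[0])
    # A_reg is symmetric, so solving A_reg @ L.T = B.T yields the same L.
    return np.linalg.solve(A_reg, B.T).T

def update_bcd(L, A, B, lambda1, n_sweeps=1):
    """Update one column of L at a time, holding the other columns fixed."""
    L = L.copy()
    A_reg = A + lambda1 * np.eye(A.shape[0])
    for _ in range(n_sweeps):
        for j in range(L.shape[1]):
            L[:, j] += (B[:, j] - L @ A_reg[:, j]) / A_reg[j, j]
    return L

rng = np.random.default_rng(0)
n_features, n_components = 8, 3
R = rng.normal(size=(50, n_components))          # stand-in per-sample coefficients
A = R.T @ R                                      # (n_components, n_components)
B = rng.normal(size=(n_features, n_components))  # (n_features, n_components)

L_cf = update_closed_form(A, B, lambda1=0.1)
L_bcd = update_bcd(np.zeros_like(B), A, B, lambda1=0.1, n_sweeps=200)
```

In this toy setting, repeated block-coordinate sweeps approach the closed-form answer without forming a matrix inverse. A single sweep is cheaper per sample but only approximate.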

init : {'qr', 'rand'} or ndarray, default 'qr'

Subspace initialisation method:

  • 'qr' — QR decomposition of the first training_samples.

  • 'rand' — Random initialisation.

  • ndarray of shape (n_features, n_components), used as the initial subspace.

training_samples : int, default 10

Number of samples used for 'qr' initialisation. Must be >= n_components.
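One plausible reading of the 'qr' initialisation, and of why training_samples must be at least n_components, is sketched below. The helper `qr_init` is hypothetical and is not part of this package.

```python
import numpy as np

def qr_init(X_train, n_components):
    """Return an orthonormal starting basis of shape (n_features, n_components)."""
    training_samples = X_train.shape[0]
    if training_samples < n_components:
        raise ValueError("training_samples must be >= n_components")
    # Columns of X_train.T are samples; the reduced QR returns
    # min(n_features, training_samples) orthonormal columns.
    Q, _ = np.linalg.qr(X_train.T)
    return Q[:, :n_components]

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10, 30))  # 10 training samples, 30 features
L0 = qr_init(X_train, n_components=3)
```

With fewer training samples than components, the factorisation cannot supply enough orthonormal columns, which is consistent with the stated constraint.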

subspace_learning_rate : float, default 1.0

Learning rate for the 'SGD' and 'MomentumSGD' methods. Must be > 0.

subspace_momentum : float, default 0.5

Momentum coefficient for 'MomentumSGD' (between 0 and 1).
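The roles of the learning rate and the momentum coefficient can be illustrated with the generic update rules described in [Ruder2016]. The functions and the toy objective below are illustrative assumptions and are not the update code of this class.

```python
import numpy as np

def sgd_step(L, grad, learning_rate):
    """Plain gradient step."""
    return L - learning_rate * grad

def momentum_sgd_step(L, grad, velocity, learning_rate, momentum):
    """Gradient step that also carries over a fraction of the previous step."""
    velocity = momentum * velocity + learning_rate * grad
    return L - velocity, velocity

# Toy objective 0.5 * ||L - target||^2, whose gradient is (L - target).
target = np.ones((4, 2))
L_sgd = np.zeros((4, 2))
L_mom = np.zeros((4, 2))
velocity = np.zeros((4, 2))
for _ in range(100):
    L_sgd = sgd_step(L_sgd, L_sgd - target, learning_rate=0.1)
    L_mom, velocity = momentum_sgd_step(
        L_mom, L_mom - target, velocity, learning_rate=0.1, momentum=0.5
    )
```

Setting the momentum to 0 reduces the second rule to the first.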

random_state : None, int, or numpy.random.RandomState, default None

Random seed or RandomState for reproducible results.

Attributes:
components_ : ndarray of shape (n_components, n_features)

Learned subspace. Rows are components, following the scikit-learn convention.

low_rank_ : ndarray of shape (n_samples, n_features)

Low-rank reconstruction of the fitted data.

sparse_ : ndarray of shape (n_samples, n_features) or None

Sparse error matrix. Stored only when store_error is True.
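The attribute shapes fit together as in the NumPy sketch below, which assumes that a low-rank reconstruction is the product of per-sample scores and the component rows. The arrays are stand-ins for components_, low_rank_ and the residual; the estimator may compute them differently.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_components = 100, 20, 3

components = rng.normal(size=(n_components, n_features))  # rows are components
true_scores = rng.normal(size=(n_samples, n_components))
X = true_scores @ components                               # exactly rank 3

# Least-squares scores: solve scores @ components = X for scores.
scores = np.linalg.lstsq(components.T, X.T, rcond=None)[0].T

low_rank = scores @ components   # (n_samples, n_features)
residual = X - low_rank          # what a sparse term would have to explain
```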

Notes

The estimator is inherently online: call partial_fit repeatedly with new chunks of data to update the model incrementally without revisiting past samples.
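The incremental pattern described here can be sketched with a small helper that relies only on a `partial_fit(X)` method, as listed in the methods table. The helper `fit_in_chunks` is hypothetical, and the assumption that chunks hold samples in rows is taken from the attribute shapes.

```python
import numpy as np

def fit_in_chunks(estimator, X, chunk_size):
    """Feed X to estimator.partial_fit in consecutive row blocks.

    Returns the number of chunks processed.
    """
    n_chunks = 0
    for start in range(0, X.shape[0], chunk_size):
        # Each chunk is passed once and never revisited.
        estimator.partial_fit(X[start:start + chunk_size])
        n_chunks += 1
    return n_chunks
```

With an ORPCA instance `model`, a call such as `fit_in_chunks(model, X, chunk_size=50)` would update the model chunk by chunk. In a real streaming setting the chunks would come from a file reader or an acquisition loop instead of an in-memory array.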

References

[Feng2013]

Jiashi Feng, Huan Xu and Shuicheng Yan, “Online Robust PCA via Stochastic Optimization”, Advances in Neural Information Processing Systems 26 (2013), pp. 404–412.

[Ruder2016]

Sebastian Ruder, “An overview of gradient descent optimization algorithms”, arXiv:1609.04747 (2016).

__init__(n_components, store_error=False, lambda1=0.1, lambda2=1.0, method='BCD', init='qr', training_samples=10, subspace_learning_rate=1.0, subspace_momentum=0.5, random_state=None)

Methods

__init__(n_components[, store_error, ...])

fit(X[, y])

Fit the online RPCA model to X.

fit_transform(X[, y])

Fit the model and return the scores for X.

partial_fit(X[, batch_size])

Process one batch of data, updating the model incrementally.

transform(X)

Project X onto the learned components.
