hyperspy_ml_algorithms.ORPCA
- class hyperspy_ml_algorithms.ORPCA(n_components, store_error=False, lambda1=0.1, lambda2=1.0, method='BCD', init='qr', training_samples=10, subspace_learning_rate=1.0, subspace_momentum=0.5, random_state=None)
Bases: object
Online Robust Principal Component Analysis.
Decomposes a data matrix into low-rank and sparse components using online stochastic optimisation. The model is updated incrementally as new samples arrive, making it suitable for streaming data and datasets that do not fit in memory.
The algorithm is based on [Feng2013] with extensions for stochastic gradient descent (SGD) and momentum-based optimisation [Ruder2016].
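As a reading aid, the batch objective that the online method of [Feng2013] approximates can be written as follows. The symbols are notation assumed here, not names from this API: Z is the data matrix with one sample per column (the transpose of the (n_samples, n_features) layout used on this page), L is the subspace basis, R holds the per-sample coefficients, and E is the sparse error. Mapping lambda1 and lambda2 onto the two weights is inferred from the parameter descriptions rather than stated by the source.

```latex
\min_{L,\,R,\,E}\;
\frac{1}{2}\,\lVert Z - L R^{\top} - E \rVert_F^{2}
+ \frac{\lambda_1}{2}\left( \lVert L \rVert_F^{2} + \lVert R \rVert_F^{2} \right)
+ \lambda_2\,\lVert E \rVert_1
```

The two Frobenius terms act as an upper bound on the nuclear norm of the low-rank part L R^T, which is why a factorised model can be updated one sample at a time.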
- Parameters:
- n_components : int
Number of components (rank of the low-rank subspace).
- store_error : bool, default False
If True, the sparse error matrix is stored and accessible via the sparse_ attribute after fitting.
- lambda1 : float, default 0.1
Nuclear-norm regularisation parameter.
- lambda2 : float, default 1.0
Sparse-error regularisation parameter.
- method : {'CF', 'BCD', 'SGD', 'MomentumSGD'}, default 'BCD'
Solver used for the subspace update step:
  - 'CF': Closed-form solution.
  - 'BCD': Block-coordinate descent (default).
  - 'SGD': Stochastic gradient descent.
  - 'MomentumSGD': SGD with momentum.
- init : {'qr', 'rand'} or ndarray, default 'qr'
Subspace initialisation method:
  - 'qr': QR decomposition of the first training_samples samples.
  - 'rand': Random initialisation.
  - ndarray of shape (n_features, n_components): used as the initial subspace.
- training_samples : int, default 10
Number of samples used for 'qr' initialisation. Must be >= n_components.
- subspace_learning_rate : float, default 1.0
Learning rate for the 'SGD' and 'MomentumSGD' methods. Must be > 0.
- subspace_momentum : float, default 0.5
Momentum coefficient for 'MomentumSGD' (between 0 and 1).
- random_state : None, int, or numpy.random.RandomState, default None
Random seed or RandomState for reproducible results.
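Two of the parameters above are easier to picture in code. The following is a minimal NumPy sketch, not the library's implementation: `init_subspace_qr` and `momentum_sgd_step` are hypothetical helpers, `step_size` is an assumed plain gradient step (the library's exact learning-rate schedule is not described on this page), and the gradient is that of a squared reconstruction error plus a lambda1-weighted Frobenius penalty on the subspace.

```python
import numpy as np


def init_subspace_qr(X, n_components, training_samples=10):
    """Orthonormal starting basis from the first `training_samples` rows of X.

    Hypothetical helper. X has shape (n_samples, n_features).
    """
    if training_samples < n_components:
        # Fewer samples than components cannot span the requested subspace.
        raise ValueError("training_samples must be >= n_components")
    block = np.asarray(X, dtype=float)[:training_samples].T  # (n_features, training_samples)
    Q, _ = np.linalg.qr(block, mode="reduced")               # orthonormal columns
    return Q[:, :n_components]                               # (n_features, n_components)


def momentum_sgd_step(L, velocity, z, r, e, lambda1, step_size, momentum):
    """One classical-momentum update of the subspace L for a single sample z.

    r is the coefficient vector of z in the current subspace and e its sparse
    error; both are assumed to have been estimated beforehand.
    """
    grad = np.outer(L @ r - (z - e), r) + lambda1 * L
    velocity = momentum * velocity + step_size * grad
    return L - velocity, velocity


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))

# QR initialisation: 10 training samples are enough for 3 components.
L0 = init_subspace_qr(X, n_components=3, training_samples=10)

# One update for the first sample, with no sparse error and zero initial velocity.
z = X[0]
r = L0.T @ z
e = np.zeros_like(z)
velocity = np.zeros_like(L0)
L1, velocity = momentum_sgd_step(
    L0, velocity, z, r, e, lambda1=0.1, step_size=0.01, momentum=0.5
)
```

Setting `momentum=0.0` in this sketch reduces the update to plain SGD, which is the relationship between the 'SGD' and 'MomentumSGD' options that the parameter descriptions suggest.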
- Attributes:
- components_ : ndarray of shape (n_components, n_features)
Learned subspace. Follows the sklearn convention: rows are components.
- low_rank_ : ndarray of shape (n_samples, n_features)
Low-rank reconstruction of the fitted data.
- sparse_ : ndarray of shape (n_samples, n_features) or None
Sparse error matrix; stored only if store_error is True.
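The shapes listed above fit together as sketched below. This is an illustration with synthetic stand-in arrays, not output from the estimator: `scores` (one row of coefficients per sample) is a hypothetical name, and treating the low-rank part as `scores @ components` is an assumption based on the rows-are-components convention.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_components = 50, 8, 2

# Stand-ins for fitted quantities (synthetic values, not library output).
components = rng.normal(size=(n_components, n_features))  # rows are components
scores = rng.normal(size=(n_samples, n_components))       # one row per sample

# Low-rank part: every sample is a combination of the component rows.
low_rank = scores @ components                             # (n_samples, n_features)

# Sparse part: a handful of large outliers, zero everywhere else.
sparse = np.zeros((n_samples, n_features))
outliers = rng.choice(n_samples * n_features, size=10, replace=False)
sparse.flat[outliers] = 5.0

# The observed data is modelled as the sum of the two parts.
X = low_rank + sparse

print(low_rank.shape, np.linalg.matrix_rank(low_rank), np.count_nonzero(sparse))
```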
Notes
The estimator is inherently online: call partial_fit repeatedly with new chunks of data to update the model incrementally without revisiting past samples.
References
[Feng2013] Jiashi Feng, Huan Xu and Shuicheng Yan, “Online Robust PCA via Stochastic Optimization”, Advances in Neural Information Processing Systems 26, (2013), pp. 404–412.
[Ruder2016] Sebastian Ruder, “An overview of gradient descent optimization algorithms”, arXiv:1609.04747, (2016).
- __init__(n_components, store_error=False, lambda1=0.1, lambda2=1.0, method='BCD', init='qr', training_samples=10, subspace_learning_rate=1.0, subspace_momentum=0.5, random_state=None)
Methods
- __init__(n_components[, store_error, ...])
- fit(X[, y]): Fit the online RPCA model to X.
- fit_transform(X[, y]): Fit the model and return the scores for X.
- partial_fit(X[, batch_size]): Process one batch of data, updating the model incrementally.
- transform(X): Project X onto the learned components.
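The chunk-by-chunk pattern behind partial_fit can be illustrated with a self-contained toy. Everything in the sketch below is hypothetical: `ToyOnlineRPCA`, `soft_threshold` and `step_size` are names invented for illustration, the subspace update is a plain gradient step rather than any of the solvers listed for `method`, and `batch_size` is not modelled. It only shows the general shape of an online robust PCA loop: per sample, estimate coefficients and a sparse error against the current subspace, then nudge the subspace.

```python
import numpy as np


def soft_threshold(x, t):
    """Elementwise shrinkage: the proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


class ToyOnlineRPCA:
    """Simplified stand-in for illustration; not the ORPCA implementation."""

    def __init__(self, n_components, lambda1=0.1, lambda2=1.0, step_size=0.01, seed=0):
        self.n_components = n_components
        self.lambda1 = lambda1
        self.lambda2 = lambda2
        self.step_size = step_size
        self.rng = np.random.default_rng(seed)
        self.L = None      # subspace, shape (n_features, n_components)
        self.n_seen = 0    # number of samples processed so far

    def _project(self, z, n_iter=20):
        """Alternate between coefficients r and sparse error e for one sample."""
        k = self.n_components
        # Ridge-regression operator; constant while L is held fixed.
        P = np.linalg.solve(self.L.T @ self.L + self.lambda1 * np.eye(k), self.L.T)
        e = np.zeros_like(z)
        for _ in range(n_iter):
            r = P @ (z - e)
            e = soft_threshold(z - self.L @ r, self.lambda2)
        return r, e

    def partial_fit(self, X):
        """Update the subspace from one chunk X of shape (n_samples, n_features)."""
        X = np.asarray(X, dtype=float)
        if self.L is None:
            # Random orthonormal start on the first call only.
            start = self.rng.normal(size=(X.shape[1], self.n_components))
            self.L, _ = np.linalg.qr(start)
        for z in X:
            r, e = self._project(z)
            grad = np.outer(self.L @ r - (z - e), r) + self.lambda1 * self.L
            self.L -= self.step_size * grad
            self.n_seen += 1
        return self

    def transform(self, X):
        """Coefficients of each row of X in the current subspace."""
        X = np.asarray(X, dtype=float)
        return np.array([self._project(z)[0] for z in X])


# Synthetic rank-2 data with about 5% of entries corrupted by outliers.
rng = np.random.default_rng(1)
true_basis = rng.normal(size=(2, 10)) / np.sqrt(10)
data = rng.normal(size=(300, 2)) @ true_basis
data[rng.random(data.shape) < 0.05] += 3.0

# Feed the data in chunks of 50 rows; earlier chunks are never revisited.
model = ToyOnlineRPCA(n_components=2, lambda2=0.3)
for start in range(0, len(data), 50):
    model.partial_fit(data[start:start + 50])

scores = model.transform(data[:5])
```

The model state carried between calls is just the subspace and a sample counter, which is what lets the loop work on streams or on data too large to hold in memory at once.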
Attributes
- components_: Learned subspace, shape (n_components, n_features). Follows the sklearn convention: rows are components.
- low_rank_: Low-rank reconstruction of the fitted data, shape (n_samples, n_features).
- sparse_: Sparse error matrix, shape (n_samples, n_features).