hyperspy_ml_algorithms.MLPCA

class hyperspy_ml_algorithms.MLPCA(n_components=None, max_iter=50000, tol=1e-10)

Bases: object

Maximum Likelihood Principal Component Analysis.

Standard PCA based on a singular value decomposition (SVD) assumes that the data is corrupted by homoskedastic Gaussian noise, that is, noise whose variance is the same for every data point. For many applications, this assumption does not hold. For example, count data from EDS-TEM experiments is corrupted by Poisson noise, where the noise variance depends on the underlying pixel value. Rather than scaling or transforming the data to approximately “normalize” the noise, MLPCA uses estimates of the data variance directly in the decomposition.
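To illustrate why the homoskedastic assumption fails for count data, the following is a minimal NumPy sketch. It is not part of MLPCA; the three intensities in `true_means` are hypothetical values chosen for illustration. It draws Poisson counts at each intensity and compares the empirical variance with the mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true mean counts for three pixels (illustrative values only).
true_means = np.array([2.0, 20.0, 200.0])

# Draw many Poisson realizations per pixel: shape (n_draws, n_pixels).
counts = rng.poisson(true_means, size=(100_000, 3))

# The empirical variance of each pixel grows with its mean, so the noise
# is heteroskedastic rather than constant across the data.
empirical_var = counts.var(axis=0)

# For Poisson data the variance equals the mean, so this ratio is close to 1.
ratio = empirical_var / true_means
print(ratio)
```

Because the variance tracks the mean, the observed counts themselves are a common first estimate of the variance for count data, which is the choice made in the Examples section.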

This implementation is a transcription of MATLAB code from [Andrews1997].

Parameters:
n_components : int or None, default=None

Number of components to keep. If None, all components are kept.

max_iter : int, default=50000

Maximum number of iterations before exiting without convergence.

tol : float, default=1e-10

Tolerance of the stopping condition.

Attributes:
components_ : ndarray of shape (n_components, n_features)

Principal axes in feature space, representing the directions of maximum variance. Equivalent to the right singular vectors (V).

singular_values_ : ndarray of shape (n_components,)

Singular values corresponding to each component.

scores_ : ndarray of shape (n_samples, n_components)

Projection of the data onto the components (U * S).

mean_ : None

Always None, because MLPCA does not center the data.
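The shapes and relationships of these attributes can be sketched with an ordinary, unweighted SVD in NumPy. This is only an analogy under stated assumptions: it does not use the variance estimates and is not the MLPCA algorithm. The local names `components`, `singular_values` and `scores` are illustrative stand-ins for the attributes above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(10, size=(50, 30)).astype(float)  # (n_samples, n_features)
k = 3  # plays the role of n_components

# Thin SVD of the uncentered data: X = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

components = Vt[:k]        # right singular vectors, shape (k, n_features)
singular_values = S[:k]    # shape (k,)
scores = U[:, :k] * S[:k]  # U * S, shape (n_samples, k)

# With no centering, projecting X onto the components gives the same scores.
projected = X @ components.T
print(np.allclose(scores, projected))
```

In this unweighted case the projection and `U * S` coincide because the rows of `components` are orthonormal. Whether the same identity holds exactly for a fitted MLPCA model depends on the implementation.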

References

[Andrews1997]

Darren T. Andrews and Peter D. Wentzell, “Applications of maximum likelihood principal component analysis: incomplete data sets and calibration transfer”, Analytica Chimica Acta 350, no. 3 (September 19, 1997): 341-352.

Examples

>>> import numpy as np
>>> from hyperspy_ml_algorithms import MLPCA
>>> rng = np.random.RandomState(42)
>>> X = rng.poisson(10, size=(50, 30)).astype(float)
>>> variance = X.copy()  # Poisson noise: variance = mean
>>> est = MLPCA(n_components=3)
>>> est.fit(X, variance)
>>> scores = est.transform(X)
>>> scores.shape
(50, 3)
__init__(n_components=None, max_iter=50000, tol=1e-10)

Methods

__init__([n_components, max_iter, tol])

fit(X, variance)

Fit the MLPCA model to the data.

fit_transform(X, variance)

Fit the model and return the scores.

transform(X)

Project X onto the learned components.