MATLAB is running out of memory but it should not be

For a data matrix of size n-by-p, PRINCOMP returns a coefficient matrix of size p-by-p, where each column is a principal component expressed in the original dimensions. So in your case the coefficient matrix alone would take:

1036800*1036800*8 bytes ~ 7.8 TB
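
You can check that figure directly:

>> p = 1036800;
>> p^2 * 8 / 2^40          % bytes for a p-by-p matrix of doubles, in TiB
ans =
    7.8213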

Consider using PRINCOMP(X,'econ'), which returns only the principal components with non-zero variance (at most n-1 of them when n <= p).
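
For example (a sketch; x below is just a random stand-in for your data, and the exact number of returned columns depends on it):

x = rand(16, 1036800);
[coeff, score, latent] = princomp(x, 'econ');   % coeff should come back around 1036800-by-15, not p-by-p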

Alternatively, consider performing PCA by SVD: in your case n << p, so the p-by-p covariance matrix is far too large even to form. Instead of decomposing the p-by-p matrix X'X, it is sufficient to decompose the much smaller n-by-n Gram matrix XX'. Refer to this paper for reference.
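
Here is a minimal sketch of that Gram-matrix idea using EIG (pca_by_gram is my own name, not a toolbox function); up to sign it should recover the same components as the SVD-based implementation below:

function [PC,Y,varPC] = pca_by_gram(X)
    % PCA_BY_GRAM  PCA via the small n-by-n Gram matrix, for X of size n-by-p with n<<p
    n  = size(X,1);
    X0 = bsxfun(@minus, X, mean(X,1));        % shift data to zero-mean
    [V,D] = eig(X0*X0');                      % eigendecomposition of the n-by-n matrix X0*X0'
    [d,idx] = sort(diag(D), 'descend');       % order by decreasing eigenvalue
    V = V(:,idx);
    keep = d > n*eps(max(d));                 % drop numerically zero directions
    d = d(keep);  V = V(:,keep);
    PC = bsxfun(@rdivide, X0'*V, sqrt(d)');   % map back to p dimensions, normalize to unit length
    Y = X0*PC;                                % project X on the PCs
    varPC = d' / (n-1);                       % variance along each PC
end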


EDIT:

Here's my implementation; the outputs of this function match those of PRINCOMP (the first three outputs, anyway):

function [PC,Y,varPC] = pca_by_svd(X)
    % PCA_BY_SVD
    %   X      data matrix of size n-by-p where n<<p
    %   PC     columns are first n principal components
    %   Y      data projected on those PCs
    %   varPC  variance along the PCs
    %

    X0 = bsxfun(@minus, X, mean(X,1));     % shift data to zero-mean
    [U,S,PC] = svd(X0,'econ');             % SVD decomposition
    Y = X0*PC;                             % project X on PC
    varPC = diag(S'*S)' / (size(X,1)-1);   % variance along each PC
end

I just tried it on my 4GB machine, and it ran just fine:

>> x = rand(16,1036800);
>> [PC, Y, varPC] = pca_by_svd(x);
>> whos
  Name             Size                     Bytes  Class     Attributes

  PC         1036800x16                 132710400  double              
  Y               16x16                      2048  double              
  varPC            1x16                       128  double              
  x               16x1036800            132710400  double              

Update:

The princomp function has since been deprecated in favor of pca, introduced in R2012b, which offers many more options.
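
For example, pca lets you request only the leading components up front (a sketch, assuming R2012b or newer with the Statistics Toolbox; x is the same data matrix as above):

[coeff, score, latent] = pca(x, 'NumComponents', 10);   % keep only the 10 leading PCs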
