Calcite | Level 5

## Running Proc Logistic on a sparse matrix

Hi,

I've been working with a large, sparse matrix (over 5,000 columns of dummy variables and 2M observations). For every observation, only 20-30 of the 5,000 columns contain a value. All of the columns are independent variables except for one, which is the dependent variable I need to explain. Running PROC LOGISTIC on that matrix results in an 'out of memory' error, and in any case it is not a solution, as I would have to run it on a regular basis.

Using IML, with the sparse, dense and full functions, I can get rid of the zeros and store only the nonzero values. My question is how I can run PROC LOGISTIC / GLIMMIX on the resulting matrix, since its layout is completely different from the original one. Further, how can I relate the results of these procedures back to the original matrix and its variables?

Kind Regards,

Alexis.

1 ACCEPTED SOLUTION

SAS Super FREQ

## Re: Running Proc Logistic on a sparse matrix

I suggest you try PROC HPLOGISTIC. I would think that a CLASS variable with 5000 levels should be solvable, although I haven't actually done it myself. Make sure that you use the CLASS statement and do not explicitly form the 5000-column dummy matrix; otherwise SAS won't be able to exploit the sparsity structure.

Also, if you are only interested in predictions, you can turn off the computation of standard errors and confidence intervals, which saves A LOT of time! Use the NOSTDERR option on the PROC statement: PROC HPLOGISTIC NOSTDERR;
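As a rough sketch of what that could look like (not tested on data of the size described in the question), the program below builds a small toy data set and fits the model with CLASS variables rather than pre-built dummy columns. The data set name `work.toy` and the variables `cat1`, `cat2`, and `y` are hypothetical placeholders, and the sketch assumes the dummy columns originally came from a handful of categorical variables that are still available in their raw form.

```sas
/* Toy data: names and sizes are illustrative assumptions only */
data work.toy;
   call streaminit(1);
   do id = 1 to 1000;
      cat1 = ceil(50 * rand('uniform'));   /* categorical predictor, 50 levels */
      cat2 = ceil(20 * rand('uniform'));   /* categorical predictor, 20 levels */
      y    = rand('bernoulli', 0.3);       /* binary response */
      output;
   end;
run;

/* Let the procedure build the design columns from the CLASS variables.
   NOSTDERR skips standard errors, useful when only predictions matter. */
proc hplogistic data=work.toy nostderr;
   class cat1 cat2;
   model y(event='1') = cat1 cat2;
run;
```

With the real data, the CLASS statement would list the original categorical variables instead of `cat1` and `cat2`. The parameter estimates are then labeled by variable and level, which is one way to relate the results back to the original columns.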

Calcite | Level 5

## Re: Running Proc Logistic on a sparse matrix

Thank you Rick, I will give it a try.

Kind Regards,
Alexis