Hi,
I've been working with a large, sparse matrix (over 5,000 columns of dummy variables and 2M observations). For every observation, only 20-30 of the 5,000 columns contain a value. All of the columns are independent variables except for one, which is the dependent variable I need to explain. Running PROC LOGISTIC on that matrix results in an 'out of memory' error, and it would not be a practical solution anyway, as I would have to run it on a regular basis.
Using IML with the sparse, dense and full functions, I can get rid of the zeroes and store only the nonzero values. My question is how I can run PROC LOGISTIC / GLIMMIX on the resulting matrix, since its layout is completely different from the original one. Further, how could I relate the results of these procedures back to the original matrix and its variables?
Thank you for your time.
Kind Regards,
Alexis.
I suggest you try PROC HPLOGISTIC. I would think that a CLASS variable with 5,000 levels should be solvable, although I haven't actually done it myself. Make sure that you use the CLASS statement and do not explicitly form the 5,000-column dummy matrix; otherwise SAS won't be able to exploit the sparsity structure.
Also, if you are only interested in predictions, you can turn off the computation of standard errors and confidence intervals, which saves a lot of time. Use the NOSTDERR option: PROC HPLOGISTIC NOSTDERR;
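A minimal sketch of what that call might look like. The data set and variable names (work.obs, obs_id, y, category, work.pred, p_hat) are hypothetical placeholders, and it assumes the predictors can be expressed as a CLASS variable rather than as precomputed dummy columns, which may not match your data as described:

```sas
/* Hypothetical names: work.obs is the input data set, y is the 0/1  */
/* response, category is a classification variable with many levels, */
/* and obs_id is an identifier carried through to the output.        */
proc hplogistic data=work.obs nostderr;   /* NOSTDERR skips standard errors */
   class category;                        /* let SAS build the design columns */
   model y(event='1') = category;
   output out=work.pred copyvars=(obs_id) pred=p_hat;  /* predicted probabilities */
run;
```

With NOSTDERR you still get the parameter estimates and predictions, but no standard errors or confidence limits, so remove the option if you need inference on individual levels.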