abuza
Calcite | Level 5

Hi,

 

I've been working with a large, sparse matrix (over 5,000 columns of dummy variables and 2M observations). For every observation, only 20-30 of the 5,000 columns contain a value. All of the 5,000 columns are independent variables except for one, which is the dependent variable I need to explain. Running PROC LOGISTIC on that matrix results in an 'out of memory' error, and it is not a workable solution in any case, as I would have to run it on a regular basis.

 

Using IML, with the sparse, dense and full functions, I can get rid of the zeroes and store only the significant values. My question is how I can run PROC LOGISTIC / GLIMMIX on the resulting matrix, since its layout is completely different from the original one. Further, how could I extract insights from the results of these procedures that relate back to the original matrix and its variables?

 

Thank you for your time.

Kind Regards,

Alexis.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

I suggest you try PROC HPLOGISTIC. I would think that a CLASS variable with 5000 levels should be solvable, although I haven't actually done it myself. Make sure that you use the CLASS statement and do not explicitly form the 5000-column dummy matrix; otherwise SAS won't be able to exploit the sparsity structure.

 

Also, if you are only interested in predictions, you can turn off the computation of standard errors and confidence intervals, which saves A LOT of time! Use the NOSTDERR option: PROC HPLOGISTIC NOSTDERR;
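
A minimal sketch of what that call might look like, assuming the data can be stored with the categorical predictor as a single column rather than as thousands of dummy columns. The data set names (work.events, work.scored) and variable names (y, item_id, p_hat) are hypothetical and not from the original post; adjust them to your data.

```sas
/* Hypothetical names: work.events is the input data set,          */
/* y is the 0/1 response, item_id is the many-level categorical     */
/* predictor stored as ONE column (not as 5000 dummy columns).      */
proc hplogistic data=work.events nostderr;  /* NOSTDERR: skip standard errors */
   class item_id;                           /* the procedure builds the design columns */
   model y(event='1') = item_id;            /* model the probability that y = 1 */
   output out=work.scored                   /* write predictions to a new data set */
          copyvars=(y item_id)              /* carry these input variables along */
          pred=p_hat;                       /* predicted probability */
run;
```

Because the levels of the CLASS variable keep their original values, the parameter estimates are labeled by level, which should make it easier to relate the results back to the original variables. Drop NOSTDERR if you need standard errors or confidence intervals for the coefficients.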


2 REPLIES
abuza
Calcite | Level 5
Thank you Rick, I will give it a try.

Kind Regards,
Alexis
