abuza
Calcite | Level 5

Hi,

 

I've been working with a large, sparse matrix (over 5,000 columns of dummy variables and 2M observations). For every observation, only 20-30 of the 5,000 columns contain a value. All of the columns are independent variables except one, which is the dependent variable I need to explain. Running PROC LOGISTIC on that matrix results in an 'out of memory' error, and in any case it is not a solution, because I would have to run it on a regular basis.

 

Using IML with the sparse, dense and full functions, I can get rid of the zeroes and store only the nonzero values. My question is how I can run PROC LOGISTIC / GLIMMIX on the resulting matrix, since its layout is completely different from the original one. Further, how could I relate the results of these procedures back to the original matrix and its variables?

 

Thank you for your time.

Kind Regards,

Alexis.

 

 

1 ACCEPTED SOLUTION

Rick_SAS
SAS Super FREQ

I suggest you try PROC HPLOGISTIC. I would think that a CLASS variable with 5,000 levels should be solvable, although I haven't actually done it myself. Make sure that you use the CLASS statement and do not explicitly form the 5,000-column dummy matrix; otherwise SAS won't be able to exploit the sparsity structure.

 

Also, if you are only interested in predictions, you can turn off the computation of standard errors and confidence intervals, which saves A LOT of time! Specify the NOSTDERR option on the PROC statement: PROC HPLOGISTIC NOSTDERR;
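A minimal sketch of what such a call might look like. The data set name (work.events), the binary response (y), and the many-level categorical variable (category) are hypothetical placeholders, not names from the original question, and the sketch assumes the data are stored with the categorical variable in a single column rather than as 5,000 dummy columns:

```sas
/* Sketch only: work.events, y, and category are assumed names. */
/* NOSTDERR skips standard errors and confidence intervals.     */
proc hplogistic data=work.events nostderr;
   class category;                     /* let the procedure build the design columns */
   model y(event='1') = category;      /* model the probability that y = 1 */
   output out=work.pred pred=p_hat;    /* write predicted probabilities to work.pred */
run;
```

With NOSTDERR specified, the parameter estimates are reported without standard errors, so this form suits the prediction-only case. Drop the option if inference on individual levels is needed.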

abuza
Calcite | Level 5
Thank you Rick, I will give it a try.

Kind Regards,
Alexis
