🔒 This topic is solved and locked.

Hi,

I've been tasked (voluntold) to convert some Stata code to SAS. It's using NPREGRESS, which appears to be a nonparametric regression.

I'm looking at using the GAMPL procedure in SAS to see whether it's the equivalent. If anyone has thoughts on which proc may be appropriate, that would be highly appreciated. I just need pointers to the procs I should be looking at.

Thanks!

1 ACCEPTED SOLUTION

Rick_SAS
SAS Super FREQ

Personally, I'd recommend ADAPTIVEREG for a problem that has several hundred parameters. The GAMPL procedure solves a big optimization problem and is mostly focused on 1-D transformations of each individual variable. The ADAPTIVEREG procedure should obtain a solution faster while still being flexible in fitting the data.

Kernel regression is old technology. I do not recommend it for high-dimensional problems.

7 REPLIES
sld
Rhodochrosite | Level 12

I'm filing "voluntold" away for future use.

Meanwhile, I personally have no clue. But @Rick_SAS has this pertinent blog post that might be useful. And he is wonderfully wise.

I hope this helps!

Rick_SAS
SAS Super FREQ

SAS has many nonparametric and semi-parametric procedures. Please tell us:

1. How many explanatory variables you have and whether any are classification variables

2. What is the nature of the response variable? Continuous? Counts? Binary?

GAMPL is one choice, as is the ADAPTIVEREG procedure (see the linked discussion and example of 2-D regression for a binary response). For 1-D or 2-D data and a continuous response, I like the nonparametric smoothers in the LOESS procedure.
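
A minimal sketch of what a LOESS-style smoother does for 1-D data, written in plain Python rather than SAS: at each target point x0 it keeps the nearest fraction `frac` of the observations, weights them with the tricube function, and fits a weighted straight line. The helper name `loess_1d`, the default `frac=0.5`, and the tiny bandwidth widening are assumptions for illustration; PROC LOESS has its own smoothing-parameter selection and many more options.

```python
import math


def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_1d(xs, ys, x0, frac=0.5):
    """Local linear (degree-1) fit at x0 from the nearest frac*n points.

    Hypothetical helper for illustration; not the PROC LOESS algorithm.
    """
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # neighborhood size
    # Bandwidth = distance to the k-th nearest point, widened slightly
    # (an assumption) so that point still gets a small positive weight.
    h = sorted(abs(x - x0) for x in xs)[k - 1] * 1.000001
    if h == 0.0:
        h = 1e-12  # all k nearest points sit exactly at x0
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    # Weighted least-squares line through the local neighborhood.
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (x0 - xbar)


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    xs = [i / 50 for i in range(101)]
    ys = [math.sin(3 * x) + rng.gauss(0, 0.2) for x in xs]
    smooth = [loess_1d(xs, ys, x, frac=0.3) for x in xs]
    print([round(v, 2) for v in smooth[:5]])
```

A smaller `frac` follows the data more closely; a larger `frac` gives a smoother curve.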

Reeza
Super User

Thanks, Rick. I've been playing around with GAMPL and was planning to look into ADAPTIVEREG as well as LOESS, though if LOESS can only handle 2D data, that won't work.


The response is continuous, and for explanatory variables I was hoping for 10 to 300 (kitchen sink). I do have about 3 million observations.

It looks like npregress is a kernel regression, and from your blog post, SAS doesn't do kernel regression for multivariate data, at least out of the box.

Rick_SAS
SAS Super FREQ

Personally, I'd recommend ADAPTIVEREG for a problem that has several hundred parameters. The GAMPL procedure solves a big optimization problem and is mostly focused on 1-D transformations of each individual variable. The ADAPTIVEREG procedure should obtain a solution faster while still being flexible in fitting the data.
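
PROC ADAPTIVEREG fits multivariate adaptive regression splines (MARS), whose building blocks are mirrored hinge functions max(0, x - t) and max(0, t - x) at a knot t, plus products of hinges for interactions. The Python sketch below only shows that building block: for one predictor and one fixed, hypothetical knot it builds the basis and fits the coefficients by least squares. It does not reproduce ADAPTIVEREG's forward selection of knots and variables or its backward pruning; the function names are illustrative.

```python
def design_row(x, knot):
    """Intercept plus the mirrored hinge pair max(0, x - t), max(0, t - x)."""
    return [1.0, max(0.0, x - knot), max(0.0, knot - x)]


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_hinge_basis(xs, ys, knot):
    """Least-squares coefficients [b0, b_plus, b_minus] via the normal equations."""
    rows = [design_row(x, knot) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    xs = [i / 10 for i in range(101)]  # 0.0 .. 10.0
    ys = [1 + 2 * max(0.0, x - 4) - 0.5 * max(0.0, 4 - x) for x in xs]
    print(fit_hinge_basis(xs, ys, knot=4.0))  # close to [1.0, 2.0, -0.5]
```

The sketch uses a single knot on purpose: with an intercept, full mirrored pairs at two or more knots on the same variable are linearly dependent (the difference of two pairs is a constant), so a naive least-squares fit on all of them would be singular.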

Kernel regression is old technology. I do not recommend it for high-dimensional problems.
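
For context on what a kernel regression like npregress computes, here is a minimal Nadaraya-Watson sketch in Python: a local-constant estimator with a product Gaussian kernel and one shared bandwidth `h`. Those choices are simplifying assumptions; Stata's npregress kernel also offers a local-linear estimator and chooses bandwidths from the data. What the sketch does illustrate is why the approach struggles at this scale: each prediction is a weighted average over all n training rows, so predicting at m points takes on the order of m * n * d work, and with many predictors the squared distances grow with d, so fewer rows receive meaningful weight unless `h` is enlarged.

```python
import math


def nw_predict(train_x, train_y, x0, h):
    """Nadaraya-Watson (local-constant) kernel regression estimate at x0.

    train_x: list of feature vectors (lists of floats), each of length d
    h: one bandwidth shared by every dimension (a simplifying assumption)
    """
    num = 0.0
    den = 0.0
    for xi, yi in zip(train_x, train_y):  # every prediction visits all n rows
        # Product Gaussian kernel = exp(-0.5 * squared scaled distance).
        sq = sum(((a - b) / h) ** 2 for a, b in zip(xi, x0))
        w = math.exp(-0.5 * sq)
        num += w * yi
        den += w
    # If every weight underflows to 0 (x0 far from all rows, many dimensions,
    # small h), there is no local data to average, so return NaN.
    return num / den if den > 0 else float("nan")


if __name__ == "__main__":
    grid = [[float(i), float(j)] for i in range(5) for j in range(5)]
    y = [p[0] + 0.5 * p[1] for p in grid]
    print(nw_predict(grid, y, [2.0, 2.0], h=1.0))
```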

Reeza
Super User
Awesome, thanks Rick! One last quick question: with 3 million rows, my data set is 13 GB (too big), and my computer has 16 GB of RAM. Will that be enough to run this? It took 45 minutes with one variable. I guess I'll try it and see if it explodes 🙂
Rick_SAS
SAS Super FREQ

I have no idea. I do not have experience with running ADAPTIVEREG on large data.  However, I would recommend that you start with a subset of the data (maybe 25-50k obs) and time how long it takes. Then double the number of obs to see how the performance scales. You can also compare the predictions for the smaller and larger samples. If they are similar, then there is no need to use the larger sample size. IMHO, the predictions you get with a smaller sample will probably be very similar to the predictions you get with 3M obs.
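
Rick's suggestion (time a 25-50k subset, double it, and compare predictions) can be sketched as a small harness. It is written in Python only to show the bookkeeping, and everything in it is an assumption: `fit_binned_means` is a hypothetical stand-in for the real model (in practice each step would be a PROC ADAPTIVEREG run in SAS on the corresponding subset), the data are synthetic, and `max_change`, the largest absolute difference in predictions between successive sample sizes, is just one possible stability measure.

```python
import math
import random
import time


def fit_binned_means(xs, ys, n_bins=20):
    """Hypothetical stand-in for the real model: mean of y in equal-width
    bins of x on [0, 1]. Returns one prediction per bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def scaling_check(pop_x, pop_y, start_n=25_000, doublings=2, seed=1):
    """Fit on nested random subsets of size start_n, 2*start_n, ...

    Records the fit time and the largest absolute change in predictions
    relative to the previous (smaller) subset.
    """
    order = list(range(len(pop_x)))
    random.Random(seed).shuffle(order)  # each larger subset contains the smaller one
    results, prev, n = [], None, start_n
    for _ in range(doublings + 1):
        idx = order[:n]  # slicing caps the subset at the population size
        xs = [pop_x[i] for i in idx]
        ys = [pop_y[i] for i in idx]
        t0 = time.perf_counter()
        pred = fit_binned_means(xs, ys)
        seconds = time.perf_counter() - t0
        change = None if prev is None else max(abs(a - b) for a, b in zip(pred, prev))
        results.append({"n": len(idx), "seconds": seconds, "max_change": change})
        prev, n = pred, n * 2
    return results


if __name__ == "__main__":
    rng = random.Random(0)
    X = [rng.random() for _ in range(200_000)]
    Y = [math.sin(6 * x) + rng.gauss(0, 0.3) for x in X]
    for row in scaling_check(X, Y):
        print(row)
```

Reading the output: `seconds` shows how run time grows as `n` doubles, and once `max_change` is small relative to the scale of the response, the larger sample is not changing the predictions much.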

Reeza
Super User
Thanks, I'll keep working on it and let you know how it goes.

Discussion stats
  • 7 replies
  • 1422 views
  • 9 likes
  • 3 in conversation