Hi,
I've been tasked (voluntold) to convert some Stata code to SAS. It uses NPREGRESS, which appears to be a nonparametric regression.
I'm looking at the GAMPL procedure in SAS to see whether it's the equivalent. If anyone has thoughts on which proc may be appropriate, that would be highly appreciated. I just need pointers to the procs I should be looking at.
Thanks!
I'm filing "voluntold" away for future use.
Meanwhile, I personally have no clue. But @Rick_SAS has this pertinent blog post that might be useful. And he is wonderfully wise.
I hope this helps!
SAS has many nonparametric and semiparametric procedures. Please tell us:
1. How many explanatory variables you have, and whether any are classification variables.
2. What is the nature of the response variable? Continuous? Counts? Binary?
GAMPL is one choice, as is the ADAPTIVEREG procedure (see this discussion and example of 2-D regression of a binary response). For 1-D or 2-D data and a continuous response, I like the nonparametric smoothers in the LOESS procedure.
Thanks, Rick. I've been playing around with GAMPL and was planning to look into ADAPTIVEREG as well as LOESS, though if LOESS can only handle 2-D data, that won't work.
The response is continuous, and for explanatory variables I was hoping for 10 to 300 (kitchen sink). I do have about 3 million observations.
It looks like npregress is a kernel regression, and according to your blog post, SAS doesn't do kernel regression for multivariate data, at least not out of the box.
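For anyone unfamiliar with the technique: kernel regression estimates the response at a point as a weighted average of nearby observations, with weights that decay with distance. The Python sketch below is a minimal local-constant (Nadaraya-Watson) version for a single predictor with a Gaussian kernel. The bandwidth, kernel, and synthetic data are illustrative assumptions, and this is not how npregress is implemented (npregress also offers a local-linear estimator and data-driven bandwidth selection).

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, bandwidth):
    """Local-constant kernel regression with a Gaussian kernel (one predictor)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    # Scaled distance from every evaluation point to every training point
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    # Gaussian kernel weights (the normalizing constant cancels in the ratio)
    weights = np.exp(-0.5 * u ** 2)
    # Weighted average of the responses near each evaluation point
    return (weights @ y_train) / weights.sum(axis=1)

# Synthetic example: noisy sine curve
rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, 500)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)
grid = np.linspace(0.5, 5.5, 6)
fit = nadaraya_watson(x, y, grid, bandwidth=0.3)
print("fitted:", np.round(fit, 2))
print("true:  ", np.round(np.sin(grid), 2))
```

Because every prediction touches every training observation, this naive version costs time proportional to (number of observations) x (number of evaluation points), which is one reason it scales poorly to millions of rows.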
Personally, I'd recommend ADAPTIVEREG for a problem that has several hundred parameters. The GAMPL procedure solves a big optimization problem and is mostly focused on 1-D transformations of each individual variable. The ADAPTIVEREG procedure should obtain a solution faster while still being flexible in fitting the data.
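ADAPTIVEREG fits multivariate adaptive regression splines (MARS), which are built from pairs of hinge functions max(0, x - t) and max(0, t - x) at knots t. As a toy illustration only, the Python sketch below performs one simplified forward step for a single predictor: it tries each candidate knot and keeps the hinge pair with the smallest residual sum of squares. The real procedure also searches across variables and interactions and then prunes terms; the data and candidate knots here are made up.

```python
import numpy as np

def hinge_pair(x, knot):
    """MARS-style reflected pair of hinge functions at a knot."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

def best_single_knot(x, y, candidates):
    """Simplified forward step: try each candidate knot, keep the lowest-RSS fit."""
    best = None
    for knot in candidates:
        h1, h2 = hinge_pair(x, knot)
        # Design matrix: intercept plus the two hinge functions
        X = np.column_stack([np.ones_like(x), h1, h2])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, knot, coef)
    return best

# Synthetic data: flat until x = 4, then slope 2
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 400)
y = np.where(x < 4, 1.0, 1.0 + 2.0 * (x - 4)) + rng.normal(scale=0.3, size=x.size)
rss, knot, coef = best_single_knot(x, y, candidates=np.linspace(1, 9, 81))
print(f"knot={knot:.2f}, coefficients={np.round(coef, 2)}")
```

The hinge pair lets the fit change slope at the knot without assuming a global functional form, which is where the flexibility comes from.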
Kernel regression is old technology. I do not recommend it for high-dimensional problems.
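One common explanation for why kernel methods struggle with many predictors is the curse of dimensionality: a neighborhood of fixed width contains an exponentially shrinking share of the data as the number of dimensions grows. The Python sketch below assumes predictors scaled to be uniform on [0, 1] and a cube-shaped neighborhood with half-width 0.1 on each axis (both are illustrative assumptions), and uses the 3 million observations mentioned in this thread.

```python
import numpy as np

N_OBS = 3_000_000   # sample size mentioned in the thread
HALF_WIDTH = 0.1    # assumed neighborhood half-width on each axis (illustrative)

def expected_neighbors(d, n=N_OBS, h=HALF_WIDTH):
    """Expected number of uniform [0,1]^d points inside a centered cube of half-width h."""
    return n * (2 * h) ** d

def simulated_fraction(d, n=200_000, h=HALF_WIDTH, seed=0):
    """Monte Carlo check: share of uniform points within h of the center on every axis."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(size=(n, d))
    inside = np.all(np.abs(pts - 0.5) <= h, axis=1)
    return inside.mean()

for d in (1, 2, 5, 10, 20):
    print(f"d={d:>2}: expected neighbors = {expected_neighbors(d):.3g}")
print("simulated share, d=2:", simulated_fraction(2), "theory:", (2 * HALF_WIDTH) ** 2)
```

Under these assumptions, 10 predictors already leave fewer than one expected observation in the neighborhood, so the bandwidth has to grow until the fit is no longer very local.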
I have no idea. I do not have experience with running ADAPTIVEREG on large data. However, I would recommend that you start with a subset of the data (maybe 25-50k obs) and time how long it takes. Then double the number of obs to see how the performance scales. You can also compare the predictions for the smaller and larger samples. If they are similar, then there is no need to use the larger sample size. IMHO, the predictions you get with a smaller sample will probably be very similar to the predictions you get with 3M obs.
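The subsample-and-double check described above can be sketched in Python as follows. Everything here is a stand-in: the synthetic data, the sample sizes, and `fit_and_predict` (a hypothetical additive cubic least-squares model used in place of ADAPTIVEREG). The idea is to time each fit and measure how much the predictions on a fixed set of evaluation points change when the sample size doubles. In SAS, PROC SURVEYSELECT with METHOD=SRS is one way to draw the subsamples.

```python
import time
import numpy as np

def make_data(n, p=5, seed=0):
    """Synthetic stand-in for the real data: p uniform predictors, smooth signal."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, size=(n, p))
    y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=n)
    return X, y

def basis(X):
    """Hypothetical stand-in model: intercept plus x, x^2, x^3 for each predictor."""
    return np.column_stack([np.ones(len(X))] + [X ** k for k in (1, 2, 3)])

def fit_and_predict(X_train, y_train, X_eval):
    """Least-squares fit on the training subsample, predictions at X_eval."""
    coef, *_ = np.linalg.lstsq(basis(X_train), y_train, rcond=None)
    return basis(X_eval) @ coef

def doubling_study(X_all, y_all, X_eval, start_n=25_000, max_n=200_000, seed=1):
    """Fit on simple random subsamples of size n, 2n, 4n, ...; time each fit and
    report the RMS change in predictions relative to the previous sample size."""
    rng = np.random.default_rng(seed)
    results, previous, n = [], None, start_n
    while n <= max_n:
        idx = rng.choice(len(y_all), size=n, replace=False)
        start = time.perf_counter()
        pred = fit_and_predict(X_all[idx], y_all[idx], X_eval)
        elapsed = time.perf_counter() - start
        rms = None if previous is None else float(np.sqrt(np.mean((pred - previous) ** 2)))
        results.append((n, elapsed, rms))
        previous, n = pred, n * 2
    return results

X_all, y_all = make_data(400_000)
X_eval, _ = make_data(1_000, seed=99)  # fixed evaluation points shared by every run
for n, secs, rms in doubling_study(X_all, y_all, X_eval):
    change = "n/a" if rms is None else f"{rms:.4f}"
    print(f"n={n:>7}: fit {secs:.3f}s, RMS change vs previous n: {change}")
```

If the RMS change is small relative to the noise in the response, the larger sample is probably not buying much.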