An Idea Exchange for SAS software and services

by PROC Star
on 05-13-2012 10:27 PM

Can I ask for a new ballot item? In case I can:

PROC GPLOT: provide an option on the PLOT2 curve to be displayed *behind* the PLOT curve rather than in front.

by Occasional Contributor MarcelHaas
on 11-27-2014 04:51 AM

Variables, not observations, right? More than 500 variables in "most" cases so far? I thought 25 was already quite a few...

by Super Contributor
on 12-01-2014 01:59 PM

I know, right!? When I read 500+ variables I thought there must be a lot of folks doing either image recognition or high-performance data mining.

If I had to choose between large and wide data sets, I would choose large. I would rather have several thousand observations and few variables (probably fewer than 100) than a large number of variables for not so many observations.

I doubt that many people have 500+ variables in their final model. A common practice is to run two parallel flows, one with variable selection and one without. More often than not I end up choosing the one with variable selection because, even if the fit statistics for both subflows are in the same ballpark, fewer variables are easier to handle in a production environment.

So, I am with you... but who knows what those data miner scientists/data-jedi people are doing these days! I hope they post about their 500+ variable models; that might be really interesting.
