bspan
Calcite | Level 5

Hey all,

 

New SAS user here - attempting to do some linear regression on a large data set. 

 

I understand the basics of running PROC GLM:

 

proc glm data=dset plots=all; 
class b c; 
model y = a--d / solution;
run;

My problem is that there are far too many class variables to list manually. The dataset contains about 100 variables, split between continuous and discrete. Is there a way for SAS to determine on its own which variables belong in the CLASS statement?

Accepted Solutions
PGStats
Opal | Level 21

Yes, if your class variables are of type character, you can define a variable list like this:

 

class v1-character-v100;

 

This designates all character variables between v1 and v100, inclusive, in the order the variables are defined in the data set.
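For context, here is a minimal sketch of how such a list might sit inside the full PROC GLM step. The names dset, y, v1 and v100 are placeholders, and the sketch assumes that every categorical predictor is stored as a character variable and that y does not lie between v1 and v100 in the data set.

```sas
/* Sketch only: dset, y, v1 and v100 are placeholder names.             */
/* Assumes all categorical predictors are character variables;          */
/* numerically coded categories would still have to be listed by hand.  */
proc glm data=dset plots=all;
   class v1-character-v100;        /* character variables from v1 to v100 */
   model y = v1--v100 / solution;  /* every variable from v1 to v100, in data set order */
run;
quit;
```

If every character variable in the data set is a predictor, the special list _CHARACTER_ (class _character_;) selects them all without naming the endpoints.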

PG

3 REPLIES
PaigeMiller
Diamond | Level 26


As @PGStats says, you can indeed do this easily. However, I would advise against it as poor practice. If the class variables have a lot of levels in total, GLM will grind to a halt and take a very long time to compute the results. Furthermore, the results will most likely not be meaningful or useful, because most of your 100 variables will be correlated with each other, which causes additional estimation and interpretation problems.
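One way to gauge the total number of levels before fitting anything is the NLEVELS option of PROC FREQ. This is only a sketch: dset is the placeholder data set name from the question, and it assumes the candidate class variables are the character variables.

```sas
/* Sketch only: dset is a placeholder data set name.                  */
/* NLEVELS prints one table giving the number of distinct levels of   */
/* each variable named in the TABLES statement.                       */
proc freq data=dset nlevels;
   tables _character_ / noprint;   /* suppress the individual frequency tables */
run;
```

A variable with hundreds of levels in that table is a likely source of the slowdown described above.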

 

A better approach would be to use PROC PLS on this data set. PLS handles many correlated input variables better than GLM does, and it will not take as long as PROC GLM to compute all of these estimates.
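A minimal sketch of what such a PROC PLS step might look like, reusing the placeholder names dset, y, v1 and v100 and again assuming the class variables are stored as character. The CV=SPLIT option is my own addition for illustration; it asks the procedure to choose the number of factors by split-sample cross validation.

```sas
/* Sketch only: dset, y, v1 and v100 are placeholder names.           */
/* Assumes y is not located between v1 and v100 in the data set.      */
proc pls data=dset method=pls cv=split;
   class v1-character-v100;   /* character variables treated as classification effects */
   model y = v1--v100;        /* all predictors, in data set order */
run;
```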

--
Paige Miller
bspan
Calcite | Level 5

I accepted @PGStats answer as the solution, but I will explore PROC PLS. Thanks for the insight!
