🔒 This topic is solved and locked.
wkm21
Calcite | Level 5

Hi,

I am trying to run a forward regression, but I think I am running into an error with the formatting of my variables. The variables in the dataset are:

data Yr1;
input Line$ WT Count X1$ X2$ X3$ X4$ X5$ ;
datalines;
Line 1 54 7 NA 2 2 2 0
Line 2 25 5 0 0 0 NA 1
Line 3 27 4 0 0 0 0 0
Line 4 40 5 0 0 1 0 0
Line 5 43 6 0 0 2 2 0
Line 6 27 5 0 0 0 0 0
Line 7 34 6 0 0 2 0 2
Line 8 32 6 0 0 2 1 2
Line 9 42 5 0 0 2 2 0
Line 10 35 5 0 0 0 0 0
;
run;

The X variables are just marker values, so they should be character variables (factors): 0 = XX, 1 = YX, 2 = YY. WT and Count are numeric measurements of weight and count. Missing values in this data are coded as NA, so I recoded them in SAS as:

data Yr2;
set Yr1;
if X1= 'NA' then X1= ' ';
if X2 = 'NA' then X2 = ' ';
if X3 = 'NA' then X3 = ' ';
if X4 = 'NA' then X4= ' ';
if X5 = 'NA' then X5 = ' ';
run; 
	
PROC PRINT data=Yr2;  run;	
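
A minimal sketch of the same cleanup in one DATA step, assuming the data lines look exactly like the sample above. It differs from the code above in two ways: the line label is read with column input (columns 1-7), because list input with `Line$` stops at the first blank and would read the number from "Line 1" into WT; and an array replaces the five IF statements. The array name `markers` is only illustrative.

```sas
data Yr2;
   /* Columns 1-7 hold the label ("Line 1" to "Line 10"); column input keeps */
   /* the embedded blank, and list input resumes at column 8 for WT.         */
   input Line $ 1-7 WT Count X1 $ X2 $ X3 $ X4 $ X5 $;

   /* markers is an illustrative array name for the five character markers */
   array markers{*} $ X1-X5;

   /* Replace the 'NA' code with a blank, which SAS treats as missing */
   do i = 1 to dim(markers);
      if markers{i} = 'NA' then markers{i} = ' ';
   end;
   drop i;
   datalines;
Line 1 54 7 NA 2 2 2 0
Line 2 25 5 0 0 0 NA 1
Line 3 27 4 0 0 0 0 0
Line 4 40 5 0 0 1 0 0
Line 5 43 6 0 0 2 2 0
Line 6 27 5 0 0 0 0 0
Line 7 34 6 0 0 2 0 2
Line 8 32 6 0 0 2 1 2
Line 9 42 5 0 0 2 2 0
Line 10 35 5 0 0 0 0 0
;
run;

proc print data=Yr2;
run;
```

The markers stay character here, which is what PROC GLM or PROC GLMSELECT with a CLASS statement expects.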

Then I tried to run my regression code:

Title color=red "Forward alpha=0.10";
proc reg data=Yr2;
model Wt Count =X1$ X2$ X3$ X4$ X5$/selection=forward SLENTRY=0.10;

run;

I get this error:

 

 ERROR 22-322: Syntax error, expecting one of the following: a name, ;, -, /, :, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_, {.  
 ERROR 76-322: Syntax error, statement will be ignored.

I think I am doing something wrong in the formatting of my variables, but I don't know how to fix my code. Here I am relating two response variables, WT and Count, to the predictors (the markers). I want to know which marker effects are significant in this model, as well as the R-square.

Any advice would help.

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Hi, I wanted to see 100% of the LOG of PROC GLM; I stated that multiple times. I don't need to see the LOG of the rest of your code, where there are no errors.

 

There is no SELECTION= in PROC GLM.

 

If you really have to use SELECTION=, you can do this in PROC REG using numeric variables only; but then a value of 2 will be interpreted as twice the value of 1, because a linear regression will be fit to your X1-X5. Also, observations with missing values will not be used in the regression model. Maybe you want dummy variables for all the levels of your independent variables, so that 2 is indicated by its own column, 1 is indicated by its own column, and NA is indicated by its own column.
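
A minimal sketch of the dummy-variable route, assuming the sample data from the question with 'NA' kept as text. Level 0 (XX) is treated as the reference and gets no column; 1 (YX), 2 (YY) and NA each get a 0/1 column, so no rows are dropped. The data set name `dummies` and the indicator names (X1_1, X1_2, X1_NA, ...) are hypothetical. Forward selection here picks individual indicator columns rather than whole markers, and with 15 candidate columns and 10 rows the caution below about stepwise selection applies even more strongly.

```sas
data dummies;
   /* Columns 1-7 hold the label, so "Line 10" is read whole */
   input Line $ 1-7 WT Count X1 $ X2 $ X3 $ X4 $ X5 $;

   array marker{5} $ X1-X5;
   /* Hypothetical indicator names: Xk_1 = YX, Xk_2 = YY, Xk_NA = 'NA' code */
   array is1{5}  X1_1  X2_1  X3_1  X4_1  X5_1;
   array is2{5}  X1_2  X2_2  X3_2  X4_2  X5_2;
   array isNA{5} X1_NA X2_NA X3_NA X4_NA X5_NA;

   do k = 1 to 5;
      /* A comparison returns 1 when true and 0 when false */
      is1{k}  = (marker{k} = '1');
      is2{k}  = (marker{k} = '2');
      isNA{k} = (marker{k} = 'NA');
   end;
   drop k;
   datalines;
Line 1 54 7 NA 2 2 2 0
Line 2 25 5 0 0 0 NA 1
Line 3 27 4 0 0 0 0 0
Line 4 40 5 0 0 1 0 0
Line 5 43 6 0 0 2 2 0
Line 6 27 5 0 0 0 0 0
Line 7 34 6 0 0 2 0 2
Line 8 32 6 0 0 2 1 2
Line 9 42 5 0 0 2 2 0
Line 10 35 5 0 0 0 0 0
;
run;

title "Forward alpha=0.10";
proc reg data=dummies;
   /* X1_1--X5_NA is a name range list: every variable from X1_1 to X5_NA */
   /* in creation order, i.e. all 15 indicator columns.                    */
   model WT    = X1_1--X5_NA / selection=forward slentry=0.10;
   model Count = X1_1--X5_NA / selection=forward slentry=0.10;
run;
quit;
```

Using one MODEL statement per response keeps the two selections separate; the $ signs from the original MODEL statement are gone because every predictor is now numeric.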

 

Or you can use PROC GLMSELECT, which does allow SELECTION= with CLASS variables. That's probably easier.
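
A minimal sketch of the PROC GLMSELECT route, assuming the sample data with 'NA' recoded to blank as in the question's Yr2 step (repeated here so the block runs on its own). PROC GLMSELECT takes a single response in its MODEL statement, so a small macro runs it once for WT and once for Count; the macro name `fwd_select` is hypothetical. PROC GLMSELECT selects by SBC unless told otherwise, so SELECT=SL is given to make SLENTRY=0.10 the entry rule. Each CLASS variable enters as a whole marker effect. Observations with a blank CLASS value are excluded; to keep NA as its own level instead, skip the recode.

```sas
data Yr2;
   input Line $ 1-7 WT Count X1 $ X2 $ X3 $ X4 $ X5 $;
   array markers{*} $ X1-X5;
   do i = 1 to dim(markers);
      if markers{i} = 'NA' then markers{i} = ' ';   /* blank = missing */
   end;
   drop i;
   datalines;
Line 1 54 7 NA 2 2 2 0
Line 2 25 5 0 0 0 NA 1
Line 3 27 4 0 0 0 0 0
Line 4 40 5 0 0 1 0 0
Line 5 43 6 0 0 2 2 0
Line 6 27 5 0 0 0 0 0
Line 7 34 6 0 0 2 0 2
Line 8 32 6 0 0 2 1 2
Line 9 42 5 0 0 2 2 0
Line 10 35 5 0 0 0 0 0
;
run;

/* Hypothetical helper macro: forward selection for one response variable */
%macro fwd_select(response);
   title "Forward selection for &response, SLENTRY=0.10";
   proc glmselect data=Yr2;
      class X1-X5;   /* character markers treated as categorical effects */
      model &response = X1-X5 / selection=forward(select=sl slentry=0.10);
   run;
%mend fwd_select;

%fwd_select(WT)
%fwd_select(Count)
```

For the backward pass mentioned later in the thread, the analogous MODEL option would be `selection=backward(select=sl slstay=0.10)`.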

 

Further, if this is your entire data set (as opposed to a small example for illustration purposes), I would not recommend any kind of stepwise regression on 10 observations.

--
Paige Miller

PaigeMiller
Diamond | Level 26

PROC REG can only accept numeric variables. It will also ignore rows that have missing values. This code reads X1-X5 as numeric variables, with NA entered as a period (SAS missing). Then you can also remove the $ signs from the MODEL statement in PROC REG.

 

data have;
/* Columns 1-7 hold the label, so "Line 10" is read whole */
input Line$ 1-7 WT Count X1 X2 X3 X4 X5 ;
datalines;
Line 1 54 7 . 2 2 2 0
Line 2 25 5 0 0 0 . 1
Line 3 27 4 0 0 0 0 0
Line 4 40 5 0 0 1 0 0
Line 5 43 6 0 0 2 2 0
Line 6 27 5 0 0 0 0 0
Line 7 34 6 0 0 2 0 2
Line 8 32 6 0 0 2 1 2
Line 9 42 5 0 0 2 2 0
Line 10 35 5 0 0 0 0 0
;
run;

 

HOWEVER: A BETTER APPROACH

 

Use PROC GLM with a CLASS statement; then you can use character variables. One of the problems with PROC REG is that 2 is treated as twice the value of 1. In PROC GLM with a CLASS statement, 1 and 2 are distinct levels, 2 is not twice the value of 1, and observations where a variable equals 'NA' are not removed from the analysis.

 

proc glm;
    class x1 - x5;
    model wt count= x1-x5;
run;
quit;

So, leave the variables as character and use PROC GLM.

--
Paige Miller
wkm21
Calcite | Level 5

Hi Paige
So I did it like this:


proc glm;
class x1 x2 x3 x4 x5;
model wt count=x1 x2 x3 x4 x5/selection=forward SLENTRY=0.10;;
run;




But I am getting this:

```

ERROR 22-322: Syntax error, expecting one of the following: ;, ALIASING, ALPHA, CLI, CLM, CLPARM, COVBYCLASS, E, E1, E2, E3, E4,
EST, I, INTERCEPT, INVERSE, NOINT, NOUNI, P, PREDICTED, SINGULAR, SOLUTION, SS1, SS2, SS3, SS4, TOLERANCE, X, XPX,
ZETA.
ERROR 76-322: Syntax error, statement will be ignored.

```

Also what is "y1-y5" in your response?

PaigeMiller
Diamond | Level 26

Please do not show us errors in the LOG detached from the code. Do not pick parts of the LOG of PROC GLM to show us while leaving out other parts of the LOG of PROC GLM.

 

Please do show us the entire LOG for PROC GLM, all of it, 100% with nothing chopped out.

 

y1-y5 was a typographical error, and has been corrected.

--
Paige Miller
wkm21
Calcite | Level 5

Sorry about that; here is the log:

 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         data Yr1;
 74         input LINE$ WT COUNT X1$ X2$ X3$ X4$ X5$ ;
 75         datalines;
 
 NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
 NOTE: The data set WORK.YR1 has 10 observations and 8 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              784.56k
       OS Memory           38056.00k
       Timestamp           04/24/2021 09:28:42 PM
       Step Count                        366  Switch Count  2
       Page Faults                       0
       Page Reclaims                     90
       Page Swaps                        0
       Voluntary Context Switches        13
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           264
       
 
 87         ;
 88         run;
 89         
 90         PROC PRINT data=Yr1;  run;
 
 NOTE: There were 10 observations read from the data set WORK.YR1.
 NOTE: PROCEDURE PRINT used (Total process time):
       real time           0.04 seconds
       user cpu time       0.04 seconds
       system cpu time     0.01 seconds
       memory              2434.81k
       OS Memory           38056.00k
       Timestamp           04/24/2021 09:28:42 PM
       Step Count                        367  Switch Count  0
       Page Faults                       0
       Page Reclaims                     63
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           8
       
 
 91         
 92         data Yr2;
 93         set Yr1;
 94         if X1 = 'NA' then X1 = ' ';
 95         if X2 = 'NA' then X2 = ' ';
 96         if X3 = 'NA' then X3 = ' ';
 97         if X4 = 'NA' then X4 = ' ';
 98         if X5 = 'NA' then X5 = ' ';
 99         run;
 
 NOTE: There were 10 observations read from the data set WORK.YR1.
 NOTE: The data set WORK.YR2 has 10 observations and 8 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              1064.18k
       OS Memory           38316.00k
       Timestamp           04/24/2021 09:28:42 PM
       Step Count                        368  Switch Count  2
       Page Faults                       0
       Page Reclaims                     125
       Page Swaps                        0
       Voluntary Context Switches        17
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           264
       
 
 100        
 101        PROC PRINT data=Yr2;  run;
 
 NOTE: There were 10 observations read from the data set WORK.YR2.
 NOTE: PROCEDURE PRINT used (Total process time):
       real time           0.02 seconds
       user cpu time       0.03 seconds
       system cpu time     0.00 seconds
       memory              978.15k
       OS Memory           38056.00k
       Timestamp           04/24/2021 09:28:42 PM
       Step Count                        369  Switch Count  0
       Page Faults                       0
       Page Reclaims                     60
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           8
       
 
 101      !                           
 102        
 103        
 104        Title color=red "Forward alpha=0.10";
 105        proc glm;
 106        class x1 x2 x3 x4 x5;
 107        model wt count=x1 x2 x3 x4 x5/selection=forward SLENTRY=0.10;;
                                          _________
                                          22
                                          76
 NOTE: The previous statement has been deleted.
 ERROR 22-322: Syntax error, expecting one of the following: ;, ALIASING, ALPHA, CLI, CLM, CLPARM, COVBYCLASS, E, E1, E2, E3, E4, 
               EST, I, INTERCEPT, INVERSE, NOINT, NOUNI, P, PREDICTED, SINGULAR, SOLUTION, SS1, SS2, SS3, SS4, TOLERANCE, X, XPX, 
               ZETA.  
 ERROR 76-322: Syntax error, statement will be ignored.
 108        run;
 
 109        
 110        OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 122        

FYI, I want to do variable selection for the model using forward and backward regression.
