Hi
I am trying to run a forward selection regression, but I think I am running into an error with the formatting of my variables. The variables in the dataset are:
```
input Line$ WT Count X1$ X2$ X3$ X4$ X5$ ;
datalines;
Line 1 54 7 NA 2 2 2 0
Line 2 25 5 0 0 0 NA 1
Line 3 27 4 0 0 0 0 0
Line 4 40 5 0 0 1 0 0
Line 5 43 6 0 0 2 2 0
Line 6 27 5 0 0 0 0 0
Line 7 34 6 0 0 2 0 2
Line 8 32 6 0 0 2 1 2
Line 9 42 5 0 0 2 2 0
Line 10 35 5 0 0 0 0 0
```
So the X1-X5 variables are just marker values (so they should be character or a factor: 0 = XX, 1 = YX, 2 = YY), and WT and Count are numeric measurements of weight and count. Missing values in this data are coded as NA, so I recoded them in SAS as:
```
data Yr2;
   set Yr1;
   if X1 = 'NA' then X1 = ' ';
   if X2 = 'NA' then X2 = ' ';
   if X3 = 'NA' then X3 = ' ';
   if X4 = 'NA' then X4 = ' ';
   if X5 = 'NA' then X5 = ' ';
run;

PROC PRINT data=Yr2; run;
```
Then I try to run my regression code:
```
Title color=red "Forward alpha=0.10";
proc reg data=Yr2;
   model Wt Count =X1$ X2$ X3$ X4$ X5$/selection=forward SLENTRY=0.10;
run;
```
I get this error:
```
ERROR 22-322: Syntax error, expecting one of the following: a name, ;, -, /, :, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_, {.
ERROR 76-322: Syntax error, statement will be ignored.
```
I think I am doing something wrong in the formatting of my variables, but I don't know how to fix my code. Here I have two response variables, WT and Count, and the markers are the predictors. I want to know which marker effects are significant in this model, as well as the R-square.
Any advice would help.
PROC REG accepts only numeric variables, and it ignores rows that have missing values. The code below reads the markers as numeric variables (with . for missing), so you can also drop the $ signs from the MODEL statement in PROC REG.
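```
data have;
   /* Read Line from columns 1-7 so that 'Line 10' is read whole; */
   /* with columns 1-6 the final 0 of 'Line 10' would become WT.  */
   input Line $ 1-7 WT Count X1 X2 X3 X4 X5;
   /* Missing markers are coded as . so X1-X5 can be numeric */
   datalines;
Line 1 54 7 . 2 2 2 0
Line 2 25 5 0 0 0 . 1
Line 3 27 4 0 0 0 0 0
Line 4 40 5 0 0 1 0 0
Line 5 43 6 0 0 2 2 0
Line 6 27 5 0 0 0 0 0
Line 7 34 6 0 0 2 0 2
Line 8 32 6 0 0 2 1 2
Line 9 42 5 0 0 2 2 0
Line 10 35 5 0 0 0 0 0
;
```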
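With the markers read as numeric, the forward selection from the original question runs in PROC REG once the `$` signs are removed. A minimal sketch, assuming the data set `have` from the step above; the model labels `wt_fwd`, `count_fwd`, and `wt_bwd` are hypothetical. Note that X1-X5 enter here as numeric 0/1/2 scores.

```sas
/* Sketch: forward and backward selection in PROC REG.                 */
/* Assumes the numeric data set HAVE created by the step above.        */
/* PROC REG builds one crossproducts matrix for all MODEL statements,  */
/* so a row with a missing value in WT, Count, or X1-X5 is excluded    */
/* from every model in this step.                                      */
proc reg data=have;
   wt_fwd:    model WT    = X1-X5 / selection=forward  slentry=0.10;
   count_fwd: model Count = X1-X5 / selection=forward  slentry=0.10;
   wt_bwd:    model WT    = X1-X5 / selection=backward slstay=0.10;
run;
quit;
```

The Summary of Forward Selection table lists the marker entered at each step together with its partial R-square, model R-square, and p-value.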
HOWEVER: A BETTER APPROACH
Use PROC GLM with a CLASS statement; then you can use character variables. One of the problems with PROC REG is that 2 is treated as twice the value of 1. In PROC GLM with a CLASS statement, 1 and 2 are distinct levels, and 2 is not treated as twice 1. Also, a row where a marker equals 'NA' is not removed from the analysis (as long as you leave it as 'NA' rather than converting it to blank); 'NA' is simply another level.
```
proc glm;
   class x1-x5;
   model wt count = x1-x5;
run;
quit;
```
So, leave the variables as character and use PROC GLM.
Hi Paige
So I did it like this:
```
proc glm;
   class x1 x2 x3 x4 x5;
   model wt count=x1 x2 x3 x4 x5/selection=forward SLENTRY=0.10;;
run;
```
But I am getting this:
```
ERROR 22-322: Syntax error, expecting one of the following: ;, ALIASING, ALPHA, CLI, CLM, CLPARM, COVBYCLASS, E, E1, E2, E3, E4, EST, I, INTERCEPT, INVERSE, NOINT, NOUNI, P, PREDICTED, SINGULAR, SOLUTION, SS1, SS2, SS3, SS4, TOLERANCE, X, XPX, ZETA. ERROR 76-322: Syntax error, statement will be ignored.
```
Also what is "y1-y5" in your response?
Please do not show us errors from the LOG detached from the code. Do not pick out some parts of the PROC GLM LOG to show us while leaving out other parts.
Please do show us the entire LOG for PROC GLM, all of it, 100% with nothing chopped out.
y1-y5 was a typographical error, and has been corrected.
Sorry about that, here is the log:
```
OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 data Yr1;
74 input LINE$ WT COUNT X1$ X2$ X3$ X4$ X5$ ;
75 datalines;
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.YR1 has 10 observations and 8 variables.
NOTE: DATA statement used (Total process time):
      real time                      0.00 seconds
      user cpu time                  0.00 seconds
      system cpu time                0.00 seconds
      memory                         784.56k
      OS Memory                      38056.00k
      Timestamp                      04/24/2021 09:28:42 PM
      Step Count                     366  Switch Count  2
      Page Faults                    0
      Page Reclaims                  90
      Page Swaps                     0
      Voluntary Context Switches     13
      Involuntary Context Switches   0
      Block Input Operations         0
      Block Output Operations        264
87 ;
88 run;
89
90 PROC PRINT data=Yr1; run;
NOTE: There were 10 observations read from the data set WORK.YR1.
NOTE: PROCEDURE PRINT used (Total process time):
      real time                      0.04 seconds
      user cpu time                  0.04 seconds
      system cpu time                0.01 seconds
      memory                         2434.81k
      OS Memory                      38056.00k
      Timestamp                      04/24/2021 09:28:42 PM
      Step Count                     367  Switch Count  0
      Page Faults                    0
      Page Reclaims                  63
      Page Swaps                     0
      Voluntary Context Switches     0
      Involuntary Context Switches   0
      Block Input Operations         0
      Block Output Operations        8
91
92 data Yr2;
93 set Yr1;
94 if X1 = 'NA' then X1 = ' ';
95 if X2 = 'NA' then X2 = ' ';
96 if X3 = 'NA' then X3 = ' ';
97 if X4 = 'NA' then X4 = ' ';
98 if X5 = 'NA' then X5 = ' ';
99 run;
NOTE: There were 10 observations read from the data set WORK.YR1.
NOTE: The data set WORK.YR2 has 10 observations and 8 variables.
NOTE: DATA statement used (Total process time):
      real time                      0.00 seconds
      user cpu time                  0.00 seconds
      system cpu time                0.00 seconds
      memory                         1064.18k
      OS Memory                      38316.00k
      Timestamp                      04/24/2021 09:28:42 PM
      Step Count                     368  Switch Count  2
      Page Faults                    0
      Page Reclaims                  125
      Page Swaps                     0
      Voluntary Context Switches     17
      Involuntary Context Switches   0
      Block Input Operations         0
      Block Output Operations        264
100
101 PROC PRINT data=Yr2; run;
NOTE: There were 10 observations read from the data set WORK.YR2.
NOTE: PROCEDURE PRINT used (Total process time):
      real time                      0.02 seconds
      user cpu time                  0.03 seconds
      system cpu time                0.00 seconds
      memory                         978.15k
      OS Memory                      38056.00k
      Timestamp                      04/24/2021 09:28:42 PM
      Step Count                     369  Switch Count  0
      Page Faults                    0
      Page Reclaims                  60
      Page Swaps                     0
      Voluntary Context Switches     0
      Involuntary Context Switches   0
      Block Input Operations         0
      Block Output Operations        8
101 !
102
103
104 Title color=red "Forward alpha=0.10";
105 proc glm;
106 class x1 x2 x3 x4 x5;
107 model wt count=x1 x2 x3 x4 x5/selection=forward SLENTRY=0.10;;
                                  _________
                                  22
                                  76
NOTE: The previous statement has been deleted.
ERROR 22-322: Syntax error, expecting one of the following: ;, ALIASING, ALPHA, CLI, CLM, CLPARM, COVBYCLASS, E, E1, E2, E3, E4, EST, I, INTERCEPT, INVERSE, NOINT, NOUNI, P, PREDICTED, SINGULAR, SOLUTION, SS1, SS2, SS3, SS4, TOLERANCE, X, XPX, ZETA.
ERROR 76-322: Syntax error, statement will be ignored.
108 run;
109
110 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
122
```
FYI, I want to do variable selection on the model using forward and backward selection.
Hi, I wanted to see 100% of the LOG of PROC GLM; I stated that multiple times. I don't need to see the LOG of the rest of your code, where there are no errors.
There is no SELECTION= option in PROC GLM, which is why the MODEL statement is rejected with ERROR 22-322.
If you really have to use SELECTION=, you can do this in PROC REG using numeric variables only, but then the value 2 will be interpreted as twice the value of 1, because a linear regression is fit to your x1-x5. Also, observations with missing values will not be used in the regression model. Maybe you want dummy variables for all the levels of your independent variables, so that 2 is indicated by its own column, 1 by its own column, and NA by its own column.
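One way to build those indicator columns is an ARRAY loop in a DATA step, after which PROC REG can run the forward selection on the 0/1 columns. This is a sketch under assumptions: the markers are read as character so 'NA' survives as text, genotype 0 (XX) is the reference level, and the data set and variable names (`markers`, `dummies`, `X1_1`, `X1_2`, `X1_NA`, and so on) are hypothetical.

```sas
/* Sketch: dummy (indicator) coding of the markers for PROC REG.  */
/* Markers are read as character so 'NA' stays as text; Line is   */
/* read from columns 1-7 so that 'Line 10' is read whole.         */
data markers;
   input Line $ 1-7 WT Count X1 $ X2 $ X3 $ X4 $ X5 $;
   datalines;
Line 1 54 7 NA 2 2 2 0
Line 2 25 5 0 0 0 NA 1
Line 3 27 4 0 0 0 0 0
Line 4 40 5 0 0 1 0 0
Line 5 43 6 0 0 2 2 0
Line 6 27 5 0 0 0 0 0
Line 7 34 6 0 0 2 0 2
Line 8 32 6 0 0 2 1 2
Line 9 42 5 0 0 2 2 0
Line 10 35 5 0 0 0 0 0
;

data dummies;
   set markers;
   array x{5}   $ X1-X5;                        /* original markers     */
   array d1{5}  X1_1  X2_1  X3_1  X4_1  X5_1;   /* 1 if marker = 1 (YX) */
   array d2{5}  X1_2  X2_2  X3_2  X4_2  X5_2;   /* 1 if marker = 2 (YY) */
   array dna{5} X1_NA X2_NA X3_NA X4_NA X5_NA;  /* 1 if marker is NA    */
   do i = 1 to 5;
      /* 0 (XX) is the reference level: all three indicators are 0 */
      d1{i}  = (x{i} = '1');
      d2{i}  = (x{i} = '2');
      dna{i} = (x{i} = 'NA');
   end;
   drop i;
run;

/* X1_1--X5_NA is a positional range covering all 15 indicators */
proc reg data=dummies;
   model WT    = X1_1--X5_NA / selection=forward slentry=0.10;
   model Count = X1_1--X5_NA / selection=forward slentry=0.10;
run;
quit;
```

The range `X1_1--X5_NA` relies on the order in which the ARRAY statements create the indicators. A level that never occurs for a marker (in this sample, X1 is only ever 0 or NA) gives an all-zero column that carries no information.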
Or you can use PROC GLMSELECT, which does allow SELECTION= with CLASS variables. That's probably easier.
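A minimal PROC GLMSELECT sketch that keeps the markers as character CLASS variables. Assumptions: one PROC step per response (the GLMSELECT MODEL statement takes a single dependent variable), `SELECT=SL` so that effects enter by significance level (SLENTRY=) instead of the default SBC criterion, and the data set name `markers` is hypothetical. Because 'NA' is left as text, it is its own level rather than a reason to drop the row.

```sas
/* Markers stay character; Line is read from columns 1-7 */
data markers;
   input Line $ 1-7 WT Count X1 $ X2 $ X3 $ X4 $ X5 $;
   datalines;
Line 1 54 7 NA 2 2 2 0
Line 2 25 5 0 0 0 NA 1
Line 3 27 4 0 0 0 0 0
Line 4 40 5 0 0 1 0 0
Line 5 43 6 0 0 2 2 0
Line 6 27 5 0 0 0 0 0
Line 7 34 6 0 0 2 0 2
Line 8 32 6 0 0 2 1 2
Line 9 42 5 0 0 2 2 0
Line 10 35 5 0 0 0 0 0
;

/* Forward selection for WT; SELECT=SL makes entry use SLENTRY= */
proc glmselect data=markers;
   class X1-X5;
   model WT = X1-X5 / selection=forward(select=SL slentry=0.10);
run;

/* Same selection for Count */
proc glmselect data=markers;
   class X1-X5;
   model Count = X1-X5 / selection=forward(select=SL slentry=0.10);
run;
```

For backward elimination, the MODEL option becomes `selection=backward(select=SL slstay=0.10)`.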
Further, if this is your entire data set (as opposed to a small example for illustration purposes), I would not recommend any kind of stepwise regression on 10 observations.