Sic
Calcite | Level 5

I'm trying to compute the two-way interactions of the independent variables.

Below is part of the code that puts the variables E1,E2,E3,E4 and G1 to G15 into arrays:

/*Then we need to compute the two-way interactions of the independent variables.*/
data new1;
	set new;
	array ONE[*] E1-E4 G1-G15;
	array TWO[*] 
e1e2	e1e3	e1e4	e1g1	e1g2	e1g3	e1g4	e1g5	e1g6	e1g7	e1g8	e1g9	e1g10	e1g11	e1g12	e1g13	e1g14	e1g15		
	e2e3	e2e4	e2g1	e2g2	e2g3	e2g4	e2g5	e2g6	e2g7	e2g8	e2g9	e2g10	e2g11	e2g12	e2g13	e2g14	e2g15	
		e3e4	e3g1	e3g2	e3g3	e3g4	e3g5	e3g6	e3g7	e3g8	e3g9	e3g10	e3g11	e3g12	e3g13	e3g14	e3g15		
			e4g1	e4g2	e4g3	e4g4	e4g5	e4g6	e4g7	e4g8	e4g9	e4g10	e4g11	e4g12	e4g13	e4g14	e4g15
				g1g2	g1g3	g1g4	g1g5	g1g6	g1g7	g1g8	g1g9	g1g10	g1g11	g1g12	g1g13	g1g14	g1g15
					g2g3	g2g4	g2g5	g2g6	g2g7	g2g8	g2g9	g2g10	g2g11	g2g12	g2g13	g2g14	g2g15	
						g3g4	g3g5	g3g6	g3g7	g3g8	g3g9	g3g10	g3g11	g3g12	g3g13	g3g14	g3g15
							g4g5	g4g6	g4g7	g4g8	g4g9	g4g10	g4g11	g4g12	g4g13	g4g14	g4g15	
								g5g6	g5g7	g5g8	g5g9	g5g10	g5g11	g5g12	g5g13	g5g14	g5g15
									g6g7	g6g8	g6g9	g6g10	g6g11	g6g12	g6g13	g6g14	g6g15	
										g7g8	g7g9	g7g10	g7g11	g7g12	g7g13	g7g14	g7g15	
											g8g9	g8g10	g8g11	g8g12	g8g13	g8g14	g8g15		
												g9g10	g9g11	g9g12	g9g13	g9g14	g9g15
													g10g11	g10g12	g10g13	g10g14	g10g15
														g11g12	g11g13	g11g14	g11g15
															g12g13	g12g14	g12g15
																g13g14	g13g15
																	g14g15
n = 0;
do i = 1 to dim(ONE);
	do j = i+1 to dim(ONE);
		n = n+1;
		TWO(n)= ONE(i)*ONE(j);
	end;
end;
run;

After the statement n = 0; this error is returned:

ERROR 22-322: Syntax error, expecting one of the following: a name, (, -, :, ;, _ALL_,
_CHARACTER_, _CHAR_, _NUMERIC_.

ERROR 76-322: Syntax error, statement will be ignored.

And after the statement

TWO(n)= ONE(i)*ONE(j);

this error is returned:

ERROR: Too many array subscripts specified for array TWO.


What am I doing wrong?

Accepted Solution
mkeintz
PROC Star

Aside from the missing semicolon on the ARRAY TWO statement that Tom mentioned, here is how to read the messages you get.  Because the statement never ends, SAS keeps reading n = 0 as part of the ARRAY statement and rejects it, hence "Syntax error, statement will be ignored".  That means the ARRAY TWO statement is ignored, so the later attempt to use the expression two{n} generates the "too many subscripts" message.  Both errors are due to the missing semicolon.

 

But I actually would like you to consider an alternative based on these observations.

  

  1. You obviously want the cross-products of ONE put into the upper triangle of a 2-dimensional matrix.  But you've defined that upper triangle as a one-dimensional array.  Instead, make the calculation easier by defining a 2-dimensional array ("array two{19,19}").  You can still calculate only the upper triangle.  You don't need to use the unintuitive N to index array TWO.  Just use a row index and a column index.

  2. Use the power of SAS variable name lists to minimize your risk of typing errors (e.g. g3g4-g3g15 generates all the names g3g4, g3g5, g3g6, ..., g3g14, g3g15).

  3. Use the names DUM1, DUM2 ... etc. for the lower triangle and diagonal elements of the matrix.  Then you can drop them from output with a very simple statement.  Note you can reuse the same name for multiple elements of the array.  SAS doesn't care.

 

data new1;
  set new;
  array ONE[*] E1-E4 G1-G15;
  array TWO[19,19]
     dum1       e1e2-e1e4   e1g1-e1g15		
     dum1-dum2  e2e3-e2e4   e2g1-e2g15	
     dum1-dum3  e3e4        e3g1-e3g15		
     dum1-dum4              e4g1-e4g15
     dum1-dum5              g1g2-g1g15
     dum1-dum6              g2g3-g2g15	
     dum1-dum7              g3g4-g3g15
     dum1-dum8              g4g5-g4g15	
     dum1-dum9              g5g6-g5g15
     dum1-dum10             g6g7-g6g15	
     dum1-dum11             g7g8-g7g15	
     dum1-dum12             g8g9-g8g15		
     dum1-dum13             g9g10-g9g15
     dum1-dum14             g10g11-g10g15
     dum1-dum15             g11g12-g11g15
     dum1-dum16             g12g13-g12g15
     dum1-dum17             g13g14-g13g15
     dum1-dum18             g14g15 ;

  do r=1 to dim(one);
    do c=r+1 to dim(one);
      two{r,c}=one{r}*one{c};
    end;
  end;

  drop dum1-dum18 r c;
run;

 

 

And out of curiosity why aren't you getting the squares as well as the cross-products?

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------


10 REPLIES
Tom
Super User

Your second array statement is a piece of art, but it is missing the semi-colon to end the statement.
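
To make the point about the semicolon concrete, here is a minimal sketch of the same one-dimensional approach on a smaller, made-up case. The data set names small and small2 and the variables x1-x3 are assumptions for illustration only; they do not come from the original post.

```sas
/* Hypothetical three-variable input, for illustration only */
data small;
  input x1 x2 x3;
  datalines;
1 2 3
4 5 6
;

data small2;
  set small;
  array ONE[*] x1-x3;
  /* 3 variables give 3*(3-1)/2 = 3 pairwise products */
  array TWO[*] x1x2 x1x3 x2x3;   /* this semicolon ends the ARRAY statement */
  n = 0;
  do i = 1 to dim(ONE);
    do j = i + 1 to dim(ONE);
      n = n + 1;
      TWO[n] = ONE[i] * ONE[j];
    end;
  end;
  drop i j n;
run;
```

With the 19 variables in the question, the same pattern needs 19*18/2 = 171 names in the TWO list, which is the number of names in the original post.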

 

Sic
Calcite | Level 5

I tried doing it your way and it gave ERROR: Too few variables defined for the dimension(s) specified for the array TWO.

What should I do?

 

Also, is this code correct if I want to use the stepwise option in PROC REG to select the independent variables at a significance level of 0.01?

 

proc reg data=new1;
	model Y= E1-E4 G1-G15 
e1e2-e1e4   e1g1-e1g15  
e2e3-e2e4   e2g1-e2g15 
e3e4        e3g1-e3g15  
            e4g1-e4g15
            g1g2-g1g15
            g2g3-g2g15 
            g3g4-g3g15
            g4g5-g4g15 
            g5g6-g5g15
            g6g7-g6g15 
            g7g8-g7g15
	    g8g9-g8g15
            g9g10-g9g15
            g10g11-g10g15
            g11g12-g11g15
            g12g13-g12g15
            g13g14-g13g15
            g14g15 ;
																
/selection=stepwise SLENTRY=0.01;
plot residual.*predicted.;
run;

Thanks

Reeza
Super User

The plot statement is no longer valid for PROC REG. 

 

How many observations do you have? You need approximately 20-25 observations for each variable, and you have a lot of variables.

mkeintz
PROC Star

Show the log please, including the error message.  Can't diagnose what can't be seen.

Sic
Calcite | Level 5
 NOTE: There were 2294 observations read from the data set WORK.Y.
NOTE: PROCEDURE GPLOT used (Total process time):
real time 0.15 seconds
cpu time 0.14 seconds


119 data new1;
120 set new;
121 array ONE[*] E1-E4 G1-G15;
122 array TWO[19,19]
123 dum1 e1e2-e1e4 e1g1-e1g15
124 dum1-dum2 e2e3-e2e4 e2g1-e2g15
125 dum1-dum3 e3e4 e3g1-e3g15
126 dum1-dum4 e4g1-e4g15
127 dum1-dum5 g1g2-g1g15
128 dum1-dum6 g2g3-g2g15
129 dum1-dum7 g3g4-g3g15
130 dum1-dum8 g4g5-g4g15
131 dum1-dum9 g5g6-g5g15
132 dum1-dum10 g6g7-g6g15
133 dum1-dum11 g7g8-g7g15
134 dum1-dum12 g8g9-g8g15
135 dum1-dum13 g9g10-g9g15
136 dum1-dum14 g10g11-g10g15
137 dum1-dum15 g11g12-g11g15
138 dum1-dum16 g12g13-g12g15
139 dum1-dum17 g13g14-g13g15
140 dum1-dum18 g14g15 ;
ERROR: Too few variables defined for the dimension(s) specified for the array TWO.
141
142 do r=1 to dim(one);
143 do c=r+1 to dim(one);
144 two{r,c}=one{r}*one{c};
145 end;
146 end;
147
148 drop dum1-dum18 r c;
149 run;

WARNING: Not all variables in the list dum1-dum18 were found.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.NEW1 may be incomplete. When this step was stopped there were 0 observations and 22
variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds


150 /*Then we use the stepwise option in SAS procedure Proc Reg to select the reasonable independent variables at
150! significance level of 0.01*/
151 proc reg data=new1;
152 model Y= E1-E4 G1-G15
153 e1e2-e1e4 e1g1-e1g15
ERROR: Variable E1E2 not found.
154 e2e3-e2e4 e2g1-e2g15
ERROR: Variable E1G1 not found.
ERROR: Variable E2E3 not found.
155 e3e4 e3g1-e3g15
ERROR: Variable E2G1 not found.
ERROR: Variable E3E4 not found.
156 e4g1-e4g15
ERROR: Variable E3G1 not found.
157 g1g2-g1g15
ERROR: Variable E4G1 not found.
158 g2g3-g2g15
ERROR: Variable G1G2 not found.
159 g3g4-g3g15
ERROR: Variable G2G3 not found.
160 g4g5-g4g15
ERROR: Variable G3G4 not found.
161 g5g6-g5g15
ERROR: Variable G4G5 not found.
162 g6g7-g6g15
ERROR: Variable G5G6 not found.
163 g7g8-g7g15
ERROR: Variable G6G7 not found.
164 g8g9-g8g15
ERROR: Variable G7G8 not found.
165 g9g10-g9g15
ERROR: Variable G8G9 not found.
166 g10g11-g10g15
ERROR: Variable G9G10 not found.
167 g11g12-g11g15
ERROR: Variable G10G11 not found.
168 g12g13-g12g15
ERROR: Variable G11G12 not found.
169 g13g14-g13g15
ERROR: Variable G12G13 not found.
170 g14g15 ;
ERROR: Variable G13G14 not found.
ERROR: Variable G14G15 not found.
NOTE: The previous statement has been deleted.
171
172 /selection=stepwise SLENTRY=0.01;
-
180
NOTE: The previous statement has been deleted.
ERROR 180-322: Statement is not valid or it is used out of proper order.
173 plot residual.*predicted.;
NOTE: The previous statement has been deleted.
174 run;

WARNING: RUN statement ignored due to previous errors. Submit QUIT; to terminate the procedure.
NOTE: PROCEDURE REG used (Total process time):
real time 0.15 seconds
cpu time 0.01 seconds

NOTE: The SAS System stopped processing this step because of errors.

I'm trying to find the model used to generate the data I have. There is one dependent variable Y and 19 independent variables (E1-E4 and G1-G15).

 

A possible model looks like this: 

Capture111.JPG

I'm trying to use stepwise regression methods to find the important independent variables like E1G1 or G3G4.

 

mkeintz
PROC Star
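I see it now. The list supplies only 18 rows of 19 names (342 elements), but a {19,19} array needs 361. Either declare

array two{18,19}.......;

or add
DUM1-DUM19

as a 19th row at the end of the current array two statement (and extend the DROP statement to dum1-dum19).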
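
Applying the first option to the earlier step gives something like the sketch below. It is the same code as in the accepted solution with only the array dimensions changed to [18,19]; it has not been run against the original data.

```sas
data new1;
  set new;
  array ONE[*] E1-E4 G1-G15;
  /* 18 rows of 19 names each = 342 elements, matching the list below */
  array TWO[18,19]
     dum1       e1e2-e1e4   e1g1-e1g15
     dum1-dum2  e2e3-e2e4   e2g1-e2g15
     dum1-dum3  e3e4        e3g1-e3g15
     dum1-dum4              e4g1-e4g15
     dum1-dum5              g1g2-g1g15
     dum1-dum6              g2g3-g2g15
     dum1-dum7              g3g4-g3g15
     dum1-dum8              g4g5-g4g15
     dum1-dum9              g5g6-g5g15
     dum1-dum10             g6g7-g6g15
     dum1-dum11             g7g8-g7g15
     dum1-dum12             g8g9-g8g15
     dum1-dum13             g9g10-g9g15
     dum1-dum14             g10g11-g10g15
     dum1-dum15             g11g12-g11g15
     dum1-dum16             g12g13-g12g15
     dum1-dum17             g13g14-g13g15
     dum1-dum18             g14g15 ;

  /* When r = 19 the inner loop runs zero times, so row 19 is never referenced */
  do r = 1 to dim(ONE);
    do c = r + 1 to dim(ONE);
      TWO[r,c] = ONE[r] * ONE[c];
    end;
  end;

  drop dum1-dum18 r c;
run;
```

The last assignment made is TWO[18,19], which is g14g15 = G14*G15, so 18 rows are enough for the upper triangle.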
Sic
Calcite | Level 5

It worked, thanks!

 

The next part gave this error:

1806  /*Then we use the stepwise option in SAS procedure Proc Reg to select the reasonable independent variables at
1806! significance level of 0.01*/
1807  proc reg data=new1;
1808      model Y= E1-E4 G1-G15
1809  e1e2-e1e4   e1g1-e1g15
1810  e2e3-e2e4   e2g1-e2g15
1811  e3e4        e3g1-e3g15
1812              e4g1-e4g15
1813              g1g2-g1g15
1814              g2g3-g2g15
1815              g3g4-g3g15
1816              g4g5-g4g15
1817              g5g6-g5g15
1818              g6g7-g6g15
1819              g7g8-g7g15
1820              g8g9-g8g15
1821              g9g10-g9g15
1822              g10g11-g10g15
1823              g11g12-g11g15
1824              g12g13-g12g15
1825              g13g14-g13g15
1826              g14g15 ;
1827
1828  /selection=stepwise SLENTRY=0.01;
      -
      180
NOTE: The previous statement has been deleted.
ERROR 180-322: Statement is not valid or it is used out of proper order.
1829  plot residual.*predicted.;
NOTE: The previous statement has been deleted.
1830  run;

 

Can I replace it with the following?

title 'Stepwise Regression on Independent Variables';
   proc logistic data=new1 outest=new2 covout;
      model Y = E1-E4 G1-G15 
                   / selection=stepwise
                     slentry=0.01
                     details
                     lackfit;
      output out=pred p=phat lower=lcl upper=ucl
             predprob=(individual crossvalidate);
   run;
   proc print data=new2;
      title2 'Parameter Estimates and Covariance Matrix';
   run;
   proc print data=pred;
      title2 'Predicted Probabilities and 95% Confidence Limits';
   run;
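
On the ERROR 180-322 in the PROC REG log above: the semicolon after g14g15 ends the MODEL statement, so the line beginning with the slash is read as a separate statement and rejected. A sketch of the same call with the options kept inside the MODEL statement follows. It assumes new1 already contains the interaction variables, leaves out the PLOT statement discussed earlier in the thread, and has not been run against the original data.

```sas
proc reg data=new1;
  /* No semicolon after g14g15: the options after the slash belong to MODEL */
  model Y = E1-E4 G1-G15
            e1e2-e1e4   e1g1-e1g15
            e2e3-e2e4   e2g1-e2g15
            e3e4        e3g1-e3g15
                        e4g1-e4g15
                        g1g2-g1g15
                        g2g3-g2g15
                        g3g4-g3g15
                        g4g5-g4g15
                        g5g6-g5g15
                        g6g7-g6g15
                        g7g8-g7g15
                        g8g9-g8g15
                        g9g10-g9g15
                        g10g11-g10g15
                        g11g12-g11g15
                        g12g13-g12g15
                        g13g14-g13g15
                        g14g15
        / selection=stepwise slentry=0.01;
run;
quit;
```

SLSTAY= is left at its default here; set it explicitly if variables should also have to remain significant at 0.01 to stay in the model.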
Reeza
Super User

In which PROC are you planning to use these interactions? I'm assuming that using | and @2 in the MODEL statement (along with a CLASS statement where needed) to specify two-way interactions wasn't an option for some reason?
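
As a rough illustration of the | and @2 notation: as far as I know PROC REG does not accept it, so the sketch below swaps in PROC GLMSELECT, which is my assumption and not a procedure named in this thread. It reads the original data set new directly, because the procedure builds the interaction terms itself; the SLS= value is also an assumption.

```sas
proc glmselect data=new;
  /* The bar operator with @2 expands to all main effects
     plus every two-way interaction of the listed variables */
  model Y = E1|E2|E3|E4|G1|G2|G3|G4|G5|G6|G7|G8|G9|G10|G11|G12|G13|G14|G15 @2
        / selection=stepwise(select=sl sle=0.01 sls=0.01);
run;
```

This avoids typing the 171 interaction names by hand, at the cost of moving away from PROC REG.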

Ksharp
Super User
Why not let a PROC do it for you?

proc logistic data=sashelp.class outdesign=want outdesignonly;
model sex=weight|height|age @2;
run;


Many other PROCs can also output a design matrix:
http://blogs.sas.com/content/iml/2016/02/24/create-a-design-matrix-in-sas.html



