hongjie76
Calcite | Level 5

I tried to create a dataset of 200 individuals. There are 4 variables in the dataset, y, x1, x2, and x3. The value of y is predicted from x1, x2, and x3. There are also some correlations among x1, x2, and x3. I created the following SAS syntax. However, I got error messages and could not create the dataset. Would you please help me to see what went wrong here? Thank you in advance!

 

/* Set the number of individuals */
%let num_individuals = 200;

 

/* Set the correlation matrix */
%let correlation_matrix = 1, 0.5, 0.3,
                          0.5, 1, 0.2,
                          0.3, 0.2, 1;

/* Create the dataset */
data my_dataset;
array x[3] x1-x3;
call streaminit(12345); /* Set the seed for random number generation */

/* Generate correlated values for x1, x2, and x3 */


do i = 1 to &num_individuals;
x = rand("Multinormal", 0, &_correlation_matrix); /* Generate correlated values */
x1 = x[1];
x2 = x[2];
x3 = x[3];

/* Calculate the value of y using x1, x2, and x3 */
y = 2 * x1 + 3 * x2 - 4 * x3 + rand("Normal", 0, 0.5); /* Add some random noise to the prediction */

output; /* Output the current observation */
end;


keep y x1 x2 x3; /* Keep only the specified variables */
run;

 

/* Print the dataset */
proc print data=my_dataset;
run;

 

I got the following error message:

 

214 %let num_individuals = 200;
215
216 /* Set the correlation matrix */
217 %let correlation_matrix = 1, 0.5, 0.3,
218 0.5, 1, 0.2,
219 0.3, 0.2, 1;
220
221 /* Create the dataset */
222 data my_dataset;
223 array x[3] x1-x3;
224 call streaminit(12345); /* Set the seed for random number generation */
225
226 /* Generate correlated values for x1, x2, and x3 */
227 do i = 1 to &num_individuals;
228 x = rand("Multinormal", 0, &_correlation_matrix); /* Generate correlated values */
-
22
WARNING: Apparent symbolic reference _CORRELATION_MATRIX not resolved.
ERROR: Illegal reference to the array x.
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string,
a numeric constant, a datetime constant, a missing value, INPUT, PUT.

229 x1 = x[1];
230 x2 = x[2];
231 x3 = x[3];
232
233 /* Calculate the value of y using x1, x2, and x3 */
234 y = 2 * x1 + 3 * x2 - 4 * x3 + rand("Normal", 0, 0.5); /* Add some random noise to the
234! prediction */
235
236 output; /* Output the current observation */
237 end;
238 keep y x1 x2 x3; /* Keep only the specified variables */
239 run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.MY_DATASET may be incomplete. When this step was stopped there were
0 observations and 4 variables.
WARNING: Data set WORK.MY_DATASET was not replaced because this step was stopped.

Accepted Solution
mkeintz
PROC Star

If you don't have access to IML, you can use this technique (Simulate multivariate normal data in SAS by using PROC SIMNORMAL) described by Rick Wicklin.  It uses a DATA step, plus proc simnormal to generate a multinormal distribution for the independent variables.  In your case it would be something like

 

data havecorr (type='CORR');
  input _TYPE_ $4.  @7 _NAME_ $4.  @10 x1 x2 x3 ;
datalines;
MEAN       0    0    0
STD        1    1    1
N          200 200 200
CORR   X1  1    0.5  0.3
CORR   X2  0.5  1    0.2
CORR   X3  0.3  0.2  1
run;

proc simnormal data=havecorr outsim=SimMVN
               numreal = 200           /* number of realizations = size of sample */
               seed = 12345  ;         /* random number seed */
   var x1-x3;
run;

Then, from dataset SimMVN, you can simulate Y from the generated X values.
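
As a minimal sketch of that last step, the DATA step below reads SimMVN and computes Y with the coefficients and noise term from the original question. The output name my_dataset is taken from the question, and the KEEP statement is there on the assumption that PROC SIMNORMAL adds bookkeeping variables of its own that are not wanted in the final dataset.

```sas
data my_dataset;
  set SimMVN;
  /* Seed the RAND stream once, before the first RAND call */
  if _n_ = 1 then call streaminit(12345);
  /* Linear predictor from the question plus normal noise with standard deviation 0.5 */
  y = 2*x1 + 3*x2 - 4*x3 + rand("Normal", 0, 0.5);
  keep y x1 x2 x3;
run;

proc print data=my_dataset(obs=10);
run;
```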

 

Note, per Rick's comment, you can directly generate (using, say, PROC CORR), the HAVECORR dataset from original correlated data.
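
A sketch of that alternative, assuming an existing dataset that already holds correlated x1-x3 (the name original_data is hypothetical): the OUTP= option of PROC CORR writes a TYPE=CORR dataset containing the MEAN, STD, N, and CORR rows, which can then be passed to PROC SIMNORMAL in place of the hand-typed HAVECORR.

```sas
/* original_data is a placeholder for a real dataset containing x1-x3 */
proc corr data=original_data outp=havecorr noprint;
  var x1-x3;
run;

proc simnormal data=havecorr outsim=SimMVN
               numreal = 200
               seed = 12345;
  var x1-x3;
run;
```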

 

You can learn more about the proc at The SIMNORMAL Procedure 


6 REPLIES
HarrySnart
SAS Employee
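Hi, I've not debugged the rest of the code, but it looks like you've got a typo when referencing the macro variable: your %let statement defines correlation_matrix, while the DATA step references &_correlation_matrix. The two names need to match, so either add the underscore to your %let statement or remove it from the reference.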
hongjie76
Calcite | Level 5

Thanks! After I changed the let statement to '%let _correlation_matrix ', I still got the error message:

"ERROR: Illegal reference to the array x."

 

1 /* Set the number of individuals */
2 %let num_individuals = 200;
3
4
5
6 /* Set the correlation matrix */
7 %let _correlation_matrix = 1, 0.5, 0.3,
8 0.5, 1, 0.2,
9 0.3, 0.2, 1;
10
11 /* Create the dataset */
12 data my_dataset;
13 array x[3] x1-x3;
14 call streaminit(12345); /* Set the seed for random number generation */
15
16 /* Generate correlated values for x1, x2, and x3 */
17
18
19 do i = 1 to &num_individuals;
20 x = rand("Multinormal", 0, &_correlation_matrix); /* Generate correlated values */
ERROR: Illegal reference to the array x.
21 x1 = x[1];
22 x2 = x[2];
23 x3 = x[3];
24
25 /* Calculate the value of y using x1, x2, and x3 */
26 y = 2 * x1 + 3 * x2 - 4 * x3 + rand("Normal", 0, 0.5); /* Add some random noise to the
26 ! prediction */
27
28 output; /* Output the current observation */
29 end;
30
31
32 keep y x1 x2 x3; /* Keep only the specified variables */
33 run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.MY_DATASET may be incomplete. When this step was stopped there were
0 observations and 4 variables.

PaigeMiller
Diamond | Level 26
20 x = rand("Multinormal", 0, &_correlation_matrix); /* Generate correlated values */
ERROR: Illegal reference to the array x.

I'm not sure where you got this syntax from, but a search of the documentation for SAS does not turn up a random number generator that has the distribution "Multinormal". There is the RANDNORMAL function in PROC IML, if that would be of help to you.
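
For anyone who does have SAS/IML licensed, a rough sketch of the RANDNORMAL route might look like the following. The matrix names (mean, corr, x, eps, result) are illustrative, and the coefficients and noise standard deviation are copied from the question.

```sas
proc iml;
  call randseed(12345);
  mean = {0 0 0};
  corr = {1   0.5 0.3,
          0.5 1   0.2,
          0.3 0.2 1};
  /* 200 rows drawn from a multivariate normal with the given mean and covariance */
  x = randnormal(200, mean, corr);
  /* Independent noise term with mean 0 and standard deviation 0.5 */
  eps = j(200, 1);
  call randgen(eps, "Normal", 0, 0.5);
  y = 2*x[,1] + 3*x[,2] - 4*x[,3] + eps;
  /* Write y and the three predictors to a SAS dataset */
  result = y || x;
  create my_dataset from result[colname={"y" "x1" "x2" "x3"}];
  append from result;
  close my_dataset;
quit;
```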

--
Paige Miller
Reeza
Super User

I think you're mixing IML and data step code. 

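Also, you have an array named X and you are assigning to X as if it were an ordinary variable, which isn't going to work. An array name has to be used with a subscript, such as x[1], and that is what triggers "ERROR: Illegal reference to the array x."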
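
If the goal is to stay entirely in the DATA step, one possible workaround (not proposed elsewhere in this thread) is to draw independent standard normals and mix them with a hand-computed Cholesky factor of the target correlation matrix. The sketch below assumes the 3x3 correlation matrix from the question; the names z1-z3 and l21 through l33 are illustrative. Note that the array is called z and is always referenced with a subscript.

```sas
data my_dataset;
  call streaminit(12345);
  array z[3] z1-z3;   /* independent standard normal draws */

  /* Lower-triangular Cholesky factor of the correlation matrix
     (r12 = 0.5, r13 = 0.3, r23 = 0.2), worked out by hand */
  l21 = 0.5;
  l22 = sqrt(1 - l21**2);
  l31 = 0.3;
  l32 = (0.2 - l31*l21) / l22;
  l33 = sqrt(1 - l31**2 - l32**2);

  do i = 1 to 200;
    do j = 1 to 3;
      z[j] = rand("Normal");
    end;
    /* Correlated predictors: x = L * z */
    x1 = z1;
    x2 = l21*z1 + l22*z2;
    x3 = l31*z1 + l32*z2 + l33*z3;
    y = 2*x1 + 3*x2 - 4*x3 + rand("Normal", 0, 0.5);
    output;
  end;
  keep y x1 x2 x3;
run;
```

Running PROC CORR on the result is a quick way to check that the sample correlations land near 0.5, 0.3, and 0.2.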


hongjie76
Calcite | Level 5

Thank you all for your help. The problem has been solved. 
