I tried to create a dataset of 200 individuals. There are 4 variables in the dataset, y, x1, x2, and x3. The value of y is predicted from x1, x2, and x3. There are also some correlations among x1, x2, and x3. I created the following SAS syntax. However, I got error messages and could not create the dataset. Would you please help me to see what went wrong here? Thank you in advance!
/* Set the number of individuals */
%let num_individuals = 200;
/* Set the correlation matrix */
%let correlation_matrix = 1, 0.5, 0.3,
0.5, 1, 0.2,
0.3, 0.2, 1;
/* Create the dataset */
data my_dataset;
array x[3] x1-x3;
call streaminit(12345); /* Set the seed for random number generation */
/* Generate correlated values for x1, x2, and x3 */
do i = 1 to &num_individuals;
x = rand("Multinormal", 0, &_correlation_matrix); /* Generate correlated values */
x1 = x[1];
x2 = x[2];
x3 = x[3];
/* Calculate the value of y using x1, x2, and x3 */
y = 2 * x1 + 3 * x2 - 4 * x3 + rand("Normal", 0, 0.5); /* Add some random noise to the prediction */
output; /* Output the current observation */
end;
keep y x1 x2 x3; /* Keep only the specified variables */
run;
/* Print the dataset */
proc print data=my_dataset;
run;
I got the following error message:
214 %let num_individuals = 200;
215
216 /* Set the correlation matrix */
217 %let correlation_matrix = 1, 0.5, 0.3,
218 0.5, 1, 0.2,
219 0.3, 0.2, 1;
220
221 /* Create the dataset */
222 data my_dataset;
223 array x[3] x1-x3;
224 call streaminit(12345); /* Set the seed for random number generation */
225
226 /* Generate correlated values for x1, x2, and x3 */
227 do i = 1 to &num_individuals;
228 x = rand("Multinormal", 0, &_correlation_matrix); /* Generate correlated values */
-
22
WARNING: Apparent symbolic reference _CORRELATION_MATRIX not resolved.
ERROR: Illegal reference to the array x.
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string,
a numeric constant, a datetime constant, a missing value, INPUT, PUT.
229 x1 = x[1];
230 x2 = x[2];
231 x3 = x[3];
232
233 /* Calculate the value of y using x1, x2, and x3 */
234 y = 2 * x1 + 3 * x2 - 4 * x3 + rand("Normal", 0, 0.5); /* Add some random noise to the
234! prediction */
235
236 output; /* Output the current observation */
237 end;
238 keep y x1 x2 x3; /* Keep only the specified variables */
239 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.MY_DATASET may be incomplete. When this step was stopped there were
0 observations and 4 variables.
WARNING: Data set WORK.MY_DATASET was not replaced because this step was stopped.
If you don't have access to IML, you can use the technique described by Rick Wicklin in "Simulate multivariate normal data in SAS by using PROC SIMNORMAL". It uses a DATA step plus PROC SIMNORMAL to generate multivariate normal values for the independent variables. In your case it would be something like:
data havecorr (type=CORR);
/* List input: a period stands for the blank _NAME_ on the MEAN, STD, and N rows */
input _TYPE_ $ _NAME_ $ x1 x2 x3;
datalines;
MEAN . 0 0 0
STD . 1 1 1
N . 200 200 200
CORR X1 1 0.5 0.3
CORR X2 0.5 1 0.2
CORR X3 0.3 0.2 1
;
run;
proc simnormal data=havecorr outsim=SimMVN
numreal = 200 /* number of realizations = size of sample */
seed = 12345 ; /* random number seed */
var x1-x3;
run;
Then, from dataset SimMVN, you can simulate Y from the generated X values.
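As a rough sketch of that last step, assuming the coefficients (2, 3, -4) and the noise standard deviation (0.5) from the original post, a DATA step along these lines could add Y to the simulated values. The output name my_dataset is taken from the question. PROC SIMNORMAL also writes a realization counter to the OUTSIM= dataset, which the KEEP statement drops here.

```sas
data my_dataset;
   set SimMVN;                               /* x1-x3 generated by PROC SIMNORMAL */
   if _n_ = 1 then call streaminit(12345);   /* seed the RAND stream once */
   /* Linear predictor plus normal noise with mean 0 and std dev 0.5 */
   y = 2*x1 + 3*x2 - 4*x3 + rand("Normal", 0, 0.5);
   keep y x1 x2 x3;
run;

proc print data=my_dataset(obs=10);          /* inspect the first few rows */
run;
```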
Note, per Rick's comment, if you already have correlated data you can generate the HAVECORR dataset directly from it (using, say, PROC CORR) instead of typing it in.
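A minimal sketch of that alternative, assuming an existing dataset that already contains x1-x3 (the name original below is hypothetical): the OUTP= option of PROC CORR writes a TYPE=CORR dataset with the MEAN, STD, N, and CORR rows, which can then be passed to PROC SIMNORMAL.

```sas
/* "original" is a placeholder for your own dataset containing x1-x3 */
proc corr data=original outp=havecorr noprint;
   var x1-x3;
run;
```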
You can learn more about the proc at The SIMNORMAL Procedure
Thanks! After I changed the let statement to '%let _correlation_matrix ', I still got the error message:
"ERROR: Illegal reference to the array x."
1 /* Set the number of individuals */
2 %let num_individuals = 200;
3
4
5
6 /* Set the correlation matrix */
7 %let _correlation_matrix = 1, 0.5, 0.3,
8 0.5, 1, 0.2,
9 0.3, 0.2, 1;
10
11 /* Create the dataset */
12 data my_dataset;
13 array x[3] x1-x3;
14 call streaminit(12345); /* Set the seed for random number generation */
15
16 /* Generate correlated values for x1, x2, and x3 */
17
18
19 do i = 1 to &num_individuals;
20 x = rand("Multinormal", 0, &_correlation_matrix); /* Generate correlated values */
ERROR: Illegal reference to the array x.
21 x1 = x[1];
22 x2 = x[2];
23 x3 = x[3];
24
25 /* Calculate the value of y using x1, x2, and x3 */
26 y = 2 * x1 + 3 * x2 - 4 * x3 + rand("Normal", 0, 0.5); /* Add some random noise to the
26 ! prediction */
27
28 output; /* Output the current observation */
29 end;
30
31
32 keep y x1 x2 x3; /* Keep only the specified variables */
33 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.MY_DATASET may be incomplete. When this step was stopped there were
0 observations and 4 variables.
I'm not sure where you got this syntax from, but a search of the SAS documentation does not turn up a random number generator with a "Multinormal" distribution. There is the RANDNORMAL function in PROC IML, if that would be of help to you.
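For anyone who does have SAS/IML licensed, a sketch of the RANDNORMAL route might look like the following. The correlation matrix, sample size, coefficients, and noise standard deviation are taken from the original post; the matrix names (mu, corr, eps, out) are just illustrative. Because the variances are all 1, the correlation matrix doubles as the covariance matrix that RANDNORMAL expects.

```sas
proc iml;
   call randseed(12345);                  /* seed for IML random number generation */
   mu   = {0 0 0};                        /* means of x1, x2, x3 */
   corr = {1   0.5 0.3,
           0.5 1   0.2,
           0.3 0.2 1};                    /* covariance matrix (unit variances) */
   x = randnormal(200, mu, corr);         /* 200 x 3 matrix of correlated draws */

   eps = j(200, 1);                       /* allocate a 200 x 1 vector */
   call randgen(eps, "Normal", 0, 0.5);   /* fill it with N(0, 0.5) noise */
   y = 2*x[,1] + 3*x[,2] - 4*x[,3] + eps;

   out = y || x;                          /* columns: y x1 x2 x3 */
   create my_dataset from out[colname={"y" "x1" "x2" "x3"}];
   append from out;
   close my_dataset;
quit;
```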
I think you're mixing IML and data step code.
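Also, you have an array named X and then assign to X as if it were an ordinary variable, which isn't going to work. In a DATA step an array has to be referenced with a subscript, such as x[1]; that is what "Illegal reference to the array x" is telling you.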
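To illustrate just the array point, here is a small sketch in which the array has its own name (xs, chosen arbitrarily) and is filled one element at a time. Note that these draws are independent standard normals, so this only shows legal array syntax; it does not produce the correlated values you are after.

```sas
data demo;
   array xs[3] x1-x3;                /* array name differs from any variable name */
   call streaminit(12345);
   do i = 1 to 200;
      do j = 1 to dim(xs);
         xs[j] = rand("Normal");     /* independent N(0,1) draws, NOT correlated */
      end;
      output;
   end;
   keep x1-x3;
run;
```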
Thank you all for your help. The problem has been solved.