I'm brand new to SAS and am trying to use it to generate random values for four normally distributed, collinear variables. A colleague of mine prepared the following code, but it throws "ERROR: COV matrix is incomplete in data set."
Data RandomValueGeneratorStatistics (type=COV) ;
input _TYPE_ $ _NAME_ $ variable1 variable2 variable3 variable4;
datalines ;
COV variable1 0.739191357 0.276109171 0.100056621 470.1092606
COV variable2 0.876109171 0.327432304 0.648272489 0.611948925
COV variable3 0.900056621 0.848272489 0.163812314 0.558222994
COV variable4 470.1092606 0.611948925 0.358222994 5474224.269
MEAN 4.217123172 41.69388108 4.893316488 8606.147733
;
run;
Proc Simnorm data=RandomValueGeneratorStatistics outsim=ssim
numreal = 1000
seed = 54321 ;
var variable1 variable2 variable3 variable4 ;
run;
Using version 6.100.0.2870. Any ideas as to why? Please share. Thanks for your help!
I should say EG 6.
Is it possible that the variable names as specified are too long, causing the matrix to be read in incorrectly?
Hello.
So there are a few things going on here.
The first thing I noticed after running PROC PRINT is that your _NAME_ variable is being truncated, as you can see below.
Obs _TYPE_ _NAME_ variable1 variable2 variable3 variable4
1 COV variable 0.739 0.27611 0.10006 470.11
2 COV variable 0.876 0.32743 0.64827 0.61
3 COV variable 0.900 0.84827 0.16381 0.56
4 COV variable 470.109 0.61195 0.35822 5474224.27
To fix this issue, specify a length for the variable ($10., for example):
Data RandomValueGeneratorStatistics (type=COV) ;
input _TYPE_ $ _NAME_ $10. variable1 variable2 variable3 variable4;
datalines ;
COV variable1 0.739191357 0.276109171 0.100056621 470.1092606
COV variable2 0.876109171 0.327432304 0.648272489 0.611948925
COV variable3 0.900056621 0.848272489 0.163812314 0.558222994
COV variable4 470.1092606 0.611948925 0.358222994 5474224.269
MEAN 4.217123172 41.69388108 4.893316488 8606.147733
;
run;
After making this change and running a Proc Print the output now looks like this:
Obs _TYPE_ _NAME_ variable1 variable2 variable3 variable4
1 COV variable1 0.739 0.2761 0.10006 470.11
2 COV variable2 0.876 0.3274 0.64827 0.61
3 COV variable3 0.900 0.8483 0.16381 0.56
4 COV variable4 470.109 0.6119 0.35822 5474224.27
5 MEAN 4.217 41.6939 4.89332 8606.15
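A variation on the same fix, offered as a sketch under assumptions rather than a tested recipe: a LENGTH statement placed ahead of INPUT lets plain list input read the full name instead of the default 8 characters, and a period stands in for the missing _NAME_ value on the MEAN row (the same placeholder appears in the PROC IML example later in this thread). The 32-character length is an arbitrary choice for illustration. The data values are the ones from the step above, so this only addresses the truncation, not the contents of the matrix.

```sas
/* Sketch: read the TYPE=COV data set without truncating _NAME_.      */
/* Assumption: $32 is an arbitrary length, wide enough for the names. */
data RandomValueGeneratorStatistics (type=COV);
   length _TYPE_ $ 8 _NAME_ $ 32;   /* set lengths before INPUT */
   input _TYPE_ $ _NAME_ $ variable1 variable2 variable3 variable4;
   datalines;
COV variable1 0.739191357 0.276109171 0.100056621 470.1092606
COV variable2 0.876109171 0.327432304 0.648272489 0.611948925
COV variable3 0.900056621 0.848272489 0.163812314 0.558222994
COV variable4 470.1092606 0.611948925 0.358222994 5474224.269
MEAN . 4.217123172 41.69388108 4.893316488 8606.147733
;
run;

/* Confirm the data set type and the length of _NAME_ ... */
proc contents data=RandomValueGeneratorStatistics;
run;

/* ... and that each _NAME_ value matches a variable name exactly. */
proc print data=RandomValueGeneratorStatistics;
run;
```

With list input, the period on the MEAN row is read as a blank _NAME_ value, so the four means stay lined up under variable1 through variable4.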
Making this change will fix the error you are currently getting, but running the SIMNORM procedure will still produce other errors. I'm not that familiar with the SIMNORM procedure, but I believe your matrix needs to be both symmetric and positive definite (your current input matrix does not satisfy these conditions).
I hope this at least helps you get going in the right direction.
Thanks for your help @jdwaterman91.
I adjusted the variable length parameter and made sure the matrix I'm actually using is symmetric (I had mistyped some entries).
What is meant by 'positive-definite'? Could my matrix not being positive-definite be the reason I still get "ERROR: COV matrix is incomplete in data set"? Thanks for your help.
A symmetric matrix is positive-definite if all of its eigenvalues are positive.
The SAS log would show the following error message if your matrix were not positive-definite.
ERROR: Invalid covariance or conditional covariance matrix; matrix is not positive definite.
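One way to look at the eigenvalues without SAS/IML, offered as a sketch rather than a tested recipe: PROC PRINCOMP accepts a TYPE=COV data set and, with the COV option, reports the eigenvalues of the covariance matrix. This assumes the data set has already been read correctly (untruncated _NAME_ values, symmetric entries) and keeps the data set and variable names used in this thread.

```sas
/* Sketch: list the eigenvalues of the covariance matrix.            */
/* Assumption: RandomValueGeneratorStatistics is a TYPE=COV data set */
/* whose _NAME_ values match the variable names exactly.             */
proc princomp data=RandomValueGeneratorStatistics cov;
   var variable1 variable2 variable3 variable4;
run;
```

For a positive-definite matrix, every value in the "Eigenvalues of the Covariance Matrix" table should be greater than zero.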
As for making sure your matrix satisfies these conditions for your purposes, there are experts in this community who are far more experienced than I am and could do a much better job of explaining exactly what you need to do, or of suggesting alternative methods for achieving your result. I will defer to them.
Appreciate your help very much, thanks @jdwaterman91.
I am now using this matrix:
COV variable1 0.999368288 0.009398213 0.075083647 0.453177098
COV variable2 0.009398213 0.000155024 0.000878636 0.006372190
COV variable3 0.075083647 0.000878636 0.008894056 0.043200581
COV variable4 0.453177098 0.006372190 0.043200581 0.999368288
MEAN 3.217123172 42.69388108 3.893316488 8605.147733
and I still receive 'ERROR: COV matrix incomplete in data set'.
Is it possible that some of the matrix entries are so small (e.g. 0.000155024) that they are interpreted incorrectly?
Because of the spacing in your data lines, try setting the width of the _NAME_ variable to $9. instead of $10.
I did so. No luck, unfortunately.
This is the code I have based on what you've posted so far:
Data RandomValueGeneratorStatistics (type=COV) ;
input _TYPE_ $ _NAME_ $9. variable1 variable2 variable3 variable4;
datalines ;
COV variable1 0.999368288 0.009398213 0.075083647 0.453177098
COV variable2 0.009398213 0.000155024 0.000878636 0.006372190
COV variable3 0.075083647 0.000878636 0.008894056 0.043200581
COV variable4 0.453177098 0.006372190 0.043200581 0.999368288
MEAN 3.217123172 42.69388108 3.893316488 8605.147733
;
run;
Proc Simnorm data=RandomValueGeneratorStatistics outsim=ssim
numreal = 1000
seed = 54321 ;
var variable1 variable2 variable3 variable4 ;
run;
... And some of the results:
Obs variable1 variable2 variable3 variable4 Rnum
1 3.75650 42.7002 3.99283 8607.08 1
2 3.74764 42.7058 3.86268 8606.20 2
3 2.67129 42.6923 3.86041 8603.92 3
4 1.81036 42.6811 3.78579 8604.83 4
5 3.06134 42.6897 3.89000 8604.87 5
6 3.08770 42.6886 3.92196 8604.73 6
7 4.44447 42.6934 3.87695 8605.65 7
8 2.72725 42.6790 3.90519 8604.83 8
9 3.60691 42.6917 3.88077 8604.06 9
10 4.04193 42.7055 3.95174 8604.69 10
11 2.17955 42.6804 3.80562 8604.45 11
12 3.15123 42.6909 3.94730 8605.94 12
Is this what you are looking for?
Precisely the code I have! I'm not sure why it's not working the same way for me. That's exactly the result I'm looking for!
What error message is the log giving you?
Same as before: "ERROR: COV matrix incomplete in data set..."
My variable names are actually written with a capital 'V'. Could that be a problem? I'll just copy and paste the code you kindly provided before and try it.
The COV matrix should be symmetric. And why not use IML?
Data RandomValueGeneratorStatistics (type=COV) ;
input _TYPE_ $ _NAME_ $ variable1 variable2 variable3 variable4;
datalines ;
COV variable1 0.739191357 0.276109171 0.100056621 470.1092606
COV variable2 0.876109171 0.327432304 0.648272489 0.611948925
COV variable3 0.900056621 0.848272489 0.163812314 0.558222994
COV variable4 470.1092606 0.611948925 0.358222994 5474224.269
MEAN . 4.217123172 41.69388108 4.893316488 8606.147733
;
run;
proc iml;
use RandomValueGeneratorStatistics;
read all var{variable1 variable2 variable3 variable4} where(_TYPE_='MEAN') into mean;
read all var{variable1 variable2 variable3 variable4} where(_TYPE_='COV') into cov;
close;
n=1000;
call randseed(123456789);
x=randnormal(n,mean,cov);
create want from x;
append from x;
close;
quit;
proc corr data=want cov;
run;
Thanks for your help @Ksharp.
The matrix I'm actually using is in fact symmetric now. I don't have an IML license unfortunately. Any more ideas? Thanks for your help.