PBG
Obsidian | Level 7

I'm brand new to SAS and am trying to use it to generate random values for four normally distributed, collinear variables. A colleague of mine prepared the following code, but it throws "ERROR: COV matrix is incomplete in data set."

 

Data RandomValueGeneratorStatistics (type=COV) ; 
input _TYPE_ $ _NAME_ $ variable1 variable2 variable3 variable4; 
datalines ; 
COV variable1  0.739191357 0.276109171 0.100056621 470.1092606
COV variable2  0.876109171 0.327432304 0.648272489 0.611948925
COV variable3  0.900056621 0.848272489 0.163812314 0.558222994
COV variable4  470.1092606 0.611948925 0.358222994 5474224.269
MEAN           4.217123172 41.69388108 4.893316488 8606.147733
;
run; 

Proc Simnorm data=RandomValueGeneratorStatistics outsim=ssim 
numreal = 1000 
seed = 54321 ; 
var variable1 variable2 variable3 variable4 ; 
run;

 

I'm using version 6.100.0.2870. Any ideas as to why this happens? Thanks for your help!

PBG
Obsidian | Level 7

I should say that I'm running this in EG 6.

 

Is it possible that the variable names as specified are too long, causing the matrix to be read in incorrectly?

jdwaterman91
Obsidian | Level 7

Hello.

 

So there are a few things going on here.

The first thing I noticed is that, when I run PROC PRINT on your data set, the _NAME_ variable is truncated to eight characters, as you can see below.

 

Obs    _TYPE_     _NAME_     variable1    variable2    variable3     variable4

1      COV      variable       0.739      0.27611      0.10006         470.11
2      COV      variable       0.876      0.32743      0.64827           0.61
3      COV      variable       0.900      0.84827      0.16381           0.56
4      COV      variable     470.109      0.61195      0.35822     5474224.27
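
The truncation follows from how list input works: a character variable read with a bare $ and no declared length is stored in 8 bytes, and "variable1" has nine characters. A minimal sketch of that behavior, using a hypothetical variable called name rather than anything from the thread:

```sas
data _null_;
   /* Plain list input: no length is declared, so name defaults to 8 bytes */
   input name $;
   stored_length = lengthc(name);   /* storage length, not trimmed length */
   put name= stored_length=;        /* log shows: name=variable stored_length=8 */
   datalines;
variable1
;
run;
```

The final "1" is dropped silently, which is why all four COV rows end up with the same _NAME_ value.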

 

To fix this issue, specify a length for the variable on the INPUT statement ($10., for example):

  

Data RandomValueGeneratorStatistics (type=COV) ; 
input _TYPE_ $ _NAME_ $10. variable1 variable2 variable3 variable4; 
datalines ; 
COV variable1  0.739191357 0.276109171 0.100056621 470.1092606
COV variable2  0.876109171 0.327432304 0.648272489 0.611948925
COV variable3  0.900056621 0.848272489 0.163812314 0.558222994
COV variable4  470.1092606 0.611948925 0.358222994 5474224.269
MEAN           4.217123172 41.69388108 4.893316488 8606.147733
;
run;

 

After making this change and running a Proc Print the output now looks like this:

 

Obs    _TYPE_     _NAME_      variable1    variable2    variable3     variable4

1      COV      variable1       0.739       0.2761      0.10006         470.11
2      COV      variable2       0.876       0.3274      0.64827           0.61
3      COV      variable3       0.900       0.8483      0.16381           0.56
4      COV      variable4     470.109       0.6119      0.35822     5474224.27
5      MEAN                     4.217      41.6939      4.89332        8606.15

 

Making this change will fix the error you are currently getting, but running the SIMNORM procedure will still produce other errors. I'm not that familiar with the SIMNORM procedure, but I believe your covariance matrix needs to be both symmetric and positive definite, and your current input matrix does not satisfy these conditions.

 

I hope this at least helps you get going in the right direction.

PBG
Obsidian | Level 7

Thanks for your help @jdwaterman91.

 

I adjusted the variable length parameter and made sure the matrix I'm actually using is symmetric (I had mistyped some entries).

 

What is meant by 'positive definite'? Could my matrix not being positive definite be the reason I still get "ERROR: COV matrix is incomplete in data set"? Thanks for your help.

jdwaterman91
Obsidian | Level 7

A symmetric matrix is positive definite if all of its eigenvalues are positive.
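
Written out, the definition and two small cases look like this. The 2x2 matrices are illustrative only and are not taken from the thread:

```latex
% Sigma is a symmetric n x n matrix with eigenvalues \lambda_1, ..., \lambda_n
\Sigma \text{ is positive definite}
  \iff x^{\top} \Sigma x > 0 \ \text{ for every } x \neq 0
  \iff \lambda_i > 0 \ \text{ for all } i

% Illustrative examples
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \quad \lambda = 1,\ 3
  \quad \text{(positive definite)}

B = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, \quad \lambda = -1,\ 3
  \quad \text{(not positive definite)}
```

In covariance terms, B fails because the off-diagonal entry is larger than the variances allow: it would imply a correlation of 2.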

 

If your matrix were not positive definite, the SAS log would show the following error message instead:

 

ERROR: Invalid covariance or conditional covariance matrix; matrix is not positive definite.

As for making sure your matrix satisfies these conditions, there are experts in this community who are far more experienced than I am and could better explain exactly what you need to do, or suggest alternative methods for achieving your result. I will defer to them.
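
For readers without SAS/IML, one way to test both conditions is to attempt a Cholesky factorization in a DATA step. The factorization completes with strictly positive pivots only when the symmetric matrix is positive definite. This is a minimal sketch and not how PROC SIMNORM performs its own check. It assumes the data set RandomValueGeneratorStatistics from this thread, holding four COV rows stored in the same order as the variables. The array names and the 1e-12 tolerance are illustrative choices.

```sas
data _null_;
   array c[4,4] _temporary_;        /* covariance matrix as read */
   array l[4,4] _temporary_;        /* lower-triangular Cholesky factor */
   array v[4] variable1-variable4;

   /* Load the four COV rows (assumed to be stored in variable order) */
   do i = 1 to 4;
      set RandomValueGeneratorStatistics(where=(_TYPE_ = 'COV'));
      do j = 1 to 4;
         c[i,j] = v[j];
      end;
   end;

   /* Symmetry: every below-diagonal entry must match its mirror image */
   symmetric = 1;
   do i = 1 to 4;
      do j = 1 to i - 1;
         if abs(c[i,j] - c[j,i]) > 1e-12 then do;
            symmetric = 0;
            put 'Asymmetric pair at ' i= j=;
         end;
      end;
   end;

   /* Cholesky attempt on the lower triangle: a pivot <= 0 means not positive definite */
   posdef = 1;
   do j = 1 to 4 while (posdef);
      s = c[j,j];
      do k = 1 to j - 1;
         s = s - l[j,k]**2;
      end;
      if s <= 0 then posdef = 0;
      else do;
         l[j,j] = sqrt(s);
         do i = j + 1 to 4;
            s = c[i,j];
            do k = 1 to j - 1;
               s = s - l[i,k] * l[j,k];
            end;
            l[i,j] = s / l[j,j];
         end;
      end;
   end;

   put symmetric= posdef=;
   stop;
run;
```

Both flags should be 1 before the matrix is passed to a simulation procedure. A matrix that is only barely positive definite can still fail in practice because of rounding.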

 

PBG
Obsidian | Level 7

Appreciate your help very much, thanks @jdwaterman91.

 

I am now using this matrix:

COV variable1 0.999368288 0.009398213 0.075083647 0.453177098
COV variable2 0.009398213 0.000155024 0.000878636 0.006372190
COV variable3 0.075083647 0.000878636 0.008894056 0.043200581
COV variable4 0.453177098 0.006372190 0.043200581 0.999368288
MEAN          3.217123172 42.69388108 3.893316488 8605.147733

and I still receive 'ERROR: COV matrix incomplete in data set'.

 

Is it possible that some of the matrix entries are so small that they are interpreted incorrectly (e.g. 0.000155024)?

jdwaterman91
Obsidian | Level 7

Because of the spacing in your new data lines, the fixed-width $10. informat can read past the name into the first number. Try reading _NAME_ with $9. instead of $10.
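
An alternative that does not depend on column positions is to declare the lengths first and then use plain list input, with a single period as the placeholder for the blank _NAME_ on the MEAN row. This is a sketch using the matrix posted above. The length of 32 for _NAME_ is an arbitrary assumption, and whether it resolves the error in your session is untested.

```sas
data RandomValueGeneratorStatistics (type=COV);
   /* Declared lengths replace the 8-byte default, so "variable1" is kept whole */
   length _TYPE_ $ 8 _NAME_ $ 32;
   /* List input splits on blanks, so extra or missing spaces no longer matter */
   input _TYPE_ $ _NAME_ $ variable1 variable2 variable3 variable4;
   datalines;
COV  variable1 0.999368288 0.009398213 0.075083647 0.453177098
COV  variable2 0.009398213 0.000155024 0.000878636 0.006372190
COV  variable3 0.075083647 0.000878636 0.008894056 0.043200581
COV  variable4 0.453177098 0.006372190 0.043200581 0.999368288
MEAN .         3.217123172 42.69388108 3.893316488 8605.147733
;
run;

/* Confirm that _NAME_ is complete and the MEAN row lines up */
proc print data=RandomValueGeneratorStatistics;
run;
```

In list input a lone period is read as a missing value, so _NAME_ is blank on the MEAN row, matching the layout of the data sets shown elsewhere in this thread.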

 

PBG
Obsidian | Level 7

I did so, but no luck, unfortunately.

jdwaterman91
Obsidian | Level 7

This is the code I have based on what you've posted so far:

 

Data RandomValueGeneratorStatistics (type=COV) ; 
input _TYPE_ $ _NAME_ $9. variable1 variable2 variable3 variable4; 
datalines ; 
COV variable1 0.999368288 0.009398213 0.075083647 0.453177098
COV variable2 0.009398213 0.000155024 0.000878636 0.006372190
COV variable3 0.075083647 0.000878636 0.008894056 0.043200581
COV variable4 0.453177098 0.006372190 0.043200581 0.999368288
MEAN          3.217123172 42.69388108 3.893316488 8605.147733
;
run; 

Proc Simnorm data=RandomValueGeneratorStatistics outsim=ssim 
numreal = 1000 
seed = 54321 ; 
var variable1 variable2 variable3 variable4 ; 
run;

... And some of the results:

 

 Obs    variable1    variable2    variable3    variable4    Rnum

1     3.75650      42.7002      3.99283      8607.08       1
2     3.74764      42.7058      3.86268      8606.20       2
3     2.67129      42.6923      3.86041      8603.92       3
4     1.81036      42.6811      3.78579      8604.83       4
5     3.06134      42.6897      3.89000      8604.87       5
6     3.08770      42.6886      3.92196      8604.73       6
7     4.44447      42.6934      3.87695      8605.65       7
8     2.72725      42.6790      3.90519      8604.83       8
9     3.60691      42.6917      3.88077      8604.06       9
10     4.04193      42.7055      3.95174      8604.69      10
11     2.17955      42.6804      3.80562      8604.45      11
12     3.15123      42.6909      3.94730      8605.94      12

Is this what you are looking for?

PBG
Obsidian | Level 7

Precisely the code I have! I'm not sure why it isn't working the same way for me. That's exactly the result I'm looking for!

jdwaterman91
Obsidian | Level 7

What error message is the log giving you?

PBG
Obsidian | Level 7

Same as before: "ERROR: COV matrix incomplete in data set..."

 

My variable names are actually written with a capital 'V'. Could that be a problem? I'll copy and paste the code you kindly provided and try it.

PBG
Obsidian | Level 7

Works like a charm!

 

Thank you very much for your help through this @jdwaterman91.

Ksharp
Super User
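
The COV matrix should be symmetric. And why not use IML?

/* These are the originally posted values, which are not symmetric;
   substitute the corrected matrix before running */
Data RandomValueGeneratorStatistics (type=COV) ; 
input _TYPE_ $ _NAME_ $ variable1 variable2 variable3 variable4; 
datalines ; 
COV variable1  0.739191357 0.276109171 0.100056621 470.1092606
COV variable2  0.876109171 0.327432304 0.648272489 0.611948925
COV variable3  0.900056621 0.848272489 0.163812314 0.558222994
COV variable4  470.1092606 0.611948925 0.358222994 5474224.269
MEAN    .       4.217123172 41.69388108 4.893316488 8606.147733
;
run; 

proc iml;
/* Read the mean vector and covariance matrix from the TYPE=COV data set */
use RandomValueGeneratorStatistics;
read all var{variable1 variable2 variable3 variable4} where(_TYPE_='MEAN') into mean;
read all var{variable1 variable2 variable3 variable4} where(_TYPE_='COV') into cov;
close RandomValueGeneratorStatistics;

/* Draw n rows from the multivariate normal distribution;
   RANDNORMAL expects cov to be symmetric positive definite */
n=1000;
call randseed(123456789);
x=randnormal(n,mean,cov);

/* Write the simulated rows to the data set WANT */
create want from x;
append from x;
close want;
quit;

/* Compare the sample covariance of the simulated data with the input */
proc corr data=want cov;
run;
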
PBG
Obsidian | Level 7

Thanks for your help @Ksharp.

 

The matrix I'm actually using is in fact symmetric now. I don't have an IML license unfortunately. Any more ideas? Thanks for your help.
