matilda
Fluorite | Level 6

I am running PROC MI for the first time to impute missing values in a clinical data set. I am using fully conditional specification (FCS) with the REGPMM (predictive mean matching) option for some variables, because they should take only integer values between 0 and 28 (the variable in question is a count of tender joints, "tjc"). Despite this, some observations of tjc are imputed as 1.110223E-16. Judging from the distribution of the imputed data, these values should probably have been zero. Does anyone know what might cause this, and whether there is a risk that the rest of my data is distorted too? Can I ignore it, or is something really wrong? I have checked the original data, and all values are integers between 0 and 28.

 

The data set has around 2000 subjects and 30 variables with varying degrees of missing data.

 

Here is some example code:

proc mi data=have out=mi.want_&sysdate. nimpute=25 seed=123;
   /* tjc is added to VAR so that the FCS statement can impute it */
   var age female substance duration crp esr pain tjc;
   class substance;
   fcs plots=trace nbiter=25 regpmm(tjc = age female substance duration crp esr pain);
run;

In reality the model is much bigger though, with a macro looping through all variables as predictors in the model.

 

I am running SAS Enterprise Guide version 7 (64-bit) on Windows.

Thankful for any input!

1 ACCEPTED SOLUTION

matilda
Fluorite | Level 6

I contacted SAS Technical Support as you suggested, and they say that the small values I get are "the results of the formula [for predictive mean matching] when K=5 (default)" and given my data. To get only integers I could either use the round function or try to alter K in the formula. Good to know nothing was wrong! I decided to round "manually" in a data step after the PROC MI call.


10 REPLIES 10
Rick_SAS
SAS Super FREQ

I am unable to reproduce your issue when I use the following program. All values of the imputed PetalLength variable are integers, including those that are zero. See if you can create an example by using the Sashelp.Iris data (or some other public data set).

 

data Iris;
   call streaminit(1);
   set Sashelp.Iris;
   /* translate PetalLength to introduce zeros */
   PetalLength = PetalLength - 12;
   if PetalLength < 0 then PetalLength = 0;
   /* set each numeric value to missing with probability 0.2 (completely at random) */
   array x[4] _NUMERIC_;
   do i = 1 to dim(x);
      if rand("bern", 0.2) then x[i] = .;
   end;
   drop i;
run;

proc mi data=Iris out=Want nimpute=25 seed=123;
   var SepalLength SepalWidth PetalWidth PetalLength Species;
   class Species;
   fcs nbiter=25 regpmm(PetalLength = SepalLength SepalWidth PetalWidth);
run;

/* are any values of PetalLength not integers? */
data Fractional;
   set Want;
   fracPL = PetalLength - floor(PetalLength);
run;

/* find min, mean, max of difference */
proc means data=Fractional;
   var fracPL;
run;
matilda
Fluorite | Level 6

@Rick_SAS 

I wasn't able to reproduce the issue using the Iris data set, unfortunately. I tried to make a smaller subset of my own data, but if I removed even one subject, the tjc variable stopped getting these strange values. However, I was able to reduce my model to just two variables (age, which has no missing values, and pain, on a VAS scale from 0 to 100) and 100 subjects, and I get the same problem for pain. No one gets the value 0 in the imputation; instead, the value 3.552714E-15 appears for two subjects. Could it be some kind of rounding error? If I include even fewer subjects, the strange values no longer appear.

 

Now using the code:

proc mi data=have out=want nimpute=1 seed=123;
   var age pain;
   fcs nbiter=5 regpmm (age pain);
run;

 

Rick_SAS
SAS Super FREQ
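If you cannot reproduce it with Iris, the problem is probably specific to your data. Make sure that your response variable has a REAL zero when it is supposed to. The MISSING check matters because SAS treats a missing value as smaller than any number, so a bare "tjc < 1" test would also overwrite the values you want to impute:
if not missing(tjc) and tjc < 1 then tjc=0;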

If that doesn't fix the problem, you have two options:
1. Post your data here
2. Open a track with SAS Technical Support and send them the data. Be sure to reference this thread so TS can see what has already been tried.
matilda
Fluorite | Level 6

@Rick_SAS 

Here is an example from my data. Subjects 38, 47, 76 and 95 get a pain value of 3.552714E-15 when I run it.

data patients;
length id pain age 8.;
input id pain age;
datalines;
1 . 68
2 . 57
3 7 68
4 52 51
5 . 36
6 . 71
7 18 26
8 . 47
9 15 23
10 64 23
11 . 52
12 0 33
13 22 42
14 . 47
15 . 78
16 . 72
17 6 55
18 . 60
19 78 81
20 50 86
21 10 36
22 20 48
23 4 43
24 67 61
25 . 48
26 70 40
27 55 50
28 90 47
29 30 85
30 63 78
31 . 61
32 11 45
33 . 51
34 30 65
35 5 31
36 62 27
37 30 67
38 . 22
39 . 62
40 . 60
41 . 52
42 . 46
43 30 50
44 . 87
45 5 31
46 5 37
47 . 33
48 80 67
49 . 55
50 65 33
51 . 31
52 . 56
53 59 62
54 37 72
55 30 61
56 23 66
57 1 53
58 75 33
59 . 56
60 . 32
61 15 50
62 9 60
63 . 63
64 40 63
65 0 28
66 . 60
67 18 87
68 . 57
69 72 67
70 . 37
71 . 31
72 11 30
73 26 32
74 0 42
75 . 51
76 . 23
77 . 31
78 . 58
79 . 53
80 . 31
81 18 55
82 12 81
83 80 21
84 58 52
85 . 76
86 30 50
87 3 32
88 21 85
89 . 55
90 14 63
91 . 70
92 16 53
93 0 26
94 19 23
95 . 25
96 16 66
97 57 77
98 3 70
99 1 43
100 . 55
;
run;

proc mi data=patients  out=want nimpute=1 seed=123;
	var age  pain;
	fcs nbiter=5 regpmm (age  pain);
run;
Rick_SAS
SAS Super FREQ

I am able to reproduce your results. I suggest you send this example to SAS Technical Support.

 

As a workaround, you can put the following DATA step after the PROC MI call:

 

data want;
set want;
if pain<1 then pain=0;
run;
Rick_SAS
SAS Super FREQ

Also, the PROC MI statement has a ROUND= option, which might be useful here. You can use ROUND=0.1 to round all imputed variables to the nearest tenth (one decimal place). You can also specify a list of numbers and use a missing value (.) to indicate that certain variables should not be rounded. For example, if you are imputing Age and Pain, you can use 

ROUND=  .  0.1    /* that's   MISSING and 0.1 */

if you do not want to round Age but you want to round Pain to the nearest 0.1.
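For the two-variable example posted earlier in this thread, that might look like the sketch below. It reuses the data set names patients and want and the variables age and pain from that example, and it uses a rounding unit of 1 rather than 0.1 because Pain is supposed to be an integer. How ROUND= interacts with REGPMM has not been verified here for every release, so treat this as something to try and then check in the output, not as a guaranteed fix.

```sas
proc mi data=patients out=want nimpute=1 seed=123
        round=. 1;   /* one entry per VAR variable: age not rounded, pain to nearest 1 */
   var age pain;
   fcs nbiter=5 regpmm(age pain);
run;
```

The order of the ROUND= values follows the order of the variables in the VAR statement, so reordering VAR means reordering the list as well.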

matilda
Fluorite | Level 6

I contacted SAS Technical Support as you suggested, and they say that the small values I get are "the results of the formula [for predictive mean matching] when K=5 (default)" and given my data. To get only integers I could either use the round function or try to alter K in the formula. Good to know nothing was wrong! I decided to round "manually" in a data step after the PROC MI call.
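A minimal sketch of that kind of post-processing step, assuming the imputed data set is named want and the affected variable is pain, as in the small example earlier in the thread. The output name want_rounded is made up for the illustration.

```sas
data want_rounded;
   set want;
   /* snap tiny values such as 3.552714E-15 back to exact integers; */
   /* ROUND with a unit of 1 returns the nearest integer            */
   pain = round(pain, 1);
run;
```

Missing values stay missing under ROUND, so the same step could also be run on data that still contains gaps.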

ballardw
Super User

I might question exactly how you verified that all the values were discrete integers. Assigned formats may disguise extremely small values.

Consider:

data example;
  x=1.110223E-16;
  put x 16.14 ;
run;

Which shows a formatted value of

0.00000000000000

Which could easily be mistaken for 0.
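Going the other way, here is a small sketch that writes the same value with formats that do not hide it. BEST32. and E12. show the magnitude, and HEX16. shows the raw floating-point representation, where a true zero appears as all zeros. The variable names x and zero are only for illustration.

```sas
data _null_;
   x = 1.110223E-16;
   zero = 0;
   /* formats wide enough to reveal a tiny nonzero value */
   put x= best32. x= e12. x= hex16.;
   /* the same formats applied to an exact zero, for comparison */
   put zero= best32. zero= hex16.;
run;
```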

 

 

 

matilda
Fluorite | Level 6

I had just run a PROC FREQ to verify that all values were discrete integers, but perhaps that is not sufficient? I also tried what Rick_SAS did for his petal example in another answer [fracPL = PetalLength - floor(PetalLength) and then PROC MEANS], and I only get zeros. How would you suggest I verify that I only have discrete integers in my original data?

ballardw
Super User

@matilda wrote:

I had just run a PROC FREQ to verify that all values were discrete integers, but perhaps that is not sufficient? I also tried what Rick_SAS did for his petal example in another answer [fracPL = PetalLength - floor(PetalLength) and then PROC MEANS], and I only get zeros. How would you suggest I verify that I only have discrete integers in my original data?


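One way is to check whether the INTZ (or INT) function result equals the value:

data example;
   input x;
   if intz(x) ne x then put "X is not an integer";
   else put "X is an integer";
datalines;
0
1.110223E-16
123456.8
;

INTZ and INT may return different results: INT returns the integer itself when the argument is within 1E-12 of it, whereas INTZ applies no such fuzzing.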
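Building on that check, here is a hedged sketch that scans every numeric variable in a data set and writes a log message for each value that is not an exact integer. It assumes the input data set is called have, as in the original question; the names nums, varname, and value are illustrative only.

```sas
data _null_;
   set have;
   array nums[*] _numeric_;      /* all numeric variables read from HAVE */
   length varname $32;
   do i = 1 to dim(nums);
      /* skip missing values; INTZ truncates without fuzzing */
      if not missing(nums[i]) and intz(nums[i]) ne nums[i] then do;
         varname = vname(nums[i]);
         value = nums[i];
         put "Non-integer value: " _n_= varname= value= best32.;
      end;
   end;
run;
```

If the log stays empty, every nonmissing numeric value in the data set is an exact integer. Variables that are legitimately continuous will of course be reported too, so the array could be restricted to the count variables of interest.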
