I am running PROC MI for the first time to impute missing values in a clinical data set. I am using fully conditional specification (FCS) with the REGPMM (predictive mean matching) method for some variables, because they should take only integer values between 0 and 28 (the variable in question is a count of tender joints, "tjc"). Despite this, some observations of tjc are imputed as 1.110223E-16. Judging from the distribution of the imputed data, these values should probably have been zero. Does anyone know what might have caused this, and whether there is a risk that the rest of my data is distorted too? Can I ignore it, or is something really wrong? I have checked the original data, and all values are integers between 0 and 28.
The data set has around 2000 subjects and 30 variables with varying degrees of missing data.
Here is some example code:
proc mi data=have out=mi.want_&sysdate. nimpute=25 seed=123;
var age female substance duration crp esr pain tjc;
class substance;
fcs plots=trace nbiter=25 regpmm (tjc = age female substance duration crp esr pain) ;
run;
In reality the model is much bigger though, with a macro looping through all variables as predictors in the model.
I am running SAS Enterprise Guide version 7 (64-bit) on Windows.
Thankful for any input!
I am unable to reproduce your issue when I use the following program. All values of the imputed PetalLength variable are integers, including those that are zero. See if you can create an example by using the Sashelp.Iris data (or some other public data set).
data Iris;
call streaminit(1);
set Sashelp.Iris;
/* translate PetalLength to introduce zeros */
PetalLength = PetalLength - 12;
if PetalLength < 0 then PetalLength = 0;
/* insert missing values completely at random (each value has a 20% chance) */
array x[4] _NUMERIC_;
do i = 1 to dim(x);
if rand("bern", 0.2) then x[i] = .;
end;
drop i;
run;
proc mi data=Iris out=Want nimpute=25 seed=123;
var SepalLength SepalWidth PetalWidth PetalLength Species;
class Species;
fcs nbiter=25 regpmm (PetalLength = SepalLength SepalWidth PetalWidth) ;
run;
/* are any values of PetalLength not integers? */
data Fractional;
set Want;
fracPL = PetalLength - floor(PetalLength);
run;
/* find min, mean, max of difference */
proc means data=fractional;
var fracPL;
run;
Unfortunately, I wasn't able to reproduce the issue using the Iris data set. I tried to make a smaller data set from my own data, but if I removed even one subject, the tjc variable stopped getting these strange values. However, I was able to reduce my model to just two variables (age, which has no missing values, and pain, on a VAS scale from 0 to 100) and 100 subjects, and I get the same problem for pain. No one gets the value 0 in the imputation; instead, the value 3.552714E-15 appears for some subjects. Could it be some kind of rounding error? If I include even fewer subjects, the strange values are no longer there.
Now using the code:
proc mi data=have out=want nimpute=1 seed=123;
var age pain;
fcs nbiter=5 regpmm (age pain);
run;
Here is an example from my data. Subjects 38, 47, 76 and 95 get a pain value of 3.552714E-15 when I run it.
data patients;
length id pain age 8.;
input id pain age;
datalines;
1 . 68
2 . 57
3 7 68
4 52 51
5 . 36
6 . 71
7 18 26
8 . 47
9 15 23
10 64 23
11 . 52
12 0 33
13 22 42
14 . 47
15 . 78
16 . 72
17 6 55
18 . 60
19 78 81
20 50 86
21 10 36
22 20 48
23 4 43
24 67 61
25 . 48
26 70 40
27 55 50
28 90 47
29 30 85
30 63 78
31 . 61
32 11 45
33 . 51
34 30 65
35 5 31
36 62 27
37 30 67
38 . 22
39 . 62
40 . 60
41 . 52
42 . 46
43 30 50
44 . 87
45 5 31
46 5 37
47 . 33
48 80 67
49 . 55
50 65 33
51 . 31
52 . 56
53 59 62
54 37 72
55 30 61
56 23 66
57 1 53
58 75 33
59 . 56
60 . 32
61 15 50
62 9 60
63 . 63
64 40 63
65 0 28
66 . 60
67 18 87
68 . 57
69 72 67
70 . 37
71 . 31
72 11 30
73 26 32
74 0 42
75 . 51
76 . 23
77 . 31
78 . 58
79 . 53
80 . 31
81 18 55
82 12 81
83 80 21
84 58 52
85 . 76
86 30 50
87 3 32
88 21 85
89 . 55
90 14 63
91 . 70
92 16 53
93 0 26
94 19 23
95 . 25
96 16 66
97 57 77
98 3 70
99 1 43
100 . 55
;
run;
proc mi data=patients out=want nimpute=1 seed=123;
var age pain;
fcs nbiter=5 regpmm (age pain);
run;
I am able to reproduce your results. I suggest you send this example to SAS Technical Support.
As a workaround, you can put the following DATA step after the PROC MI call:
data want;
set want;
if pain<1 then pain=0;
run;
Also, the PROC MI statement has a ROUND= option, which might be useful here. You can use ROUND=0.1 to round all imputed variables to the nearest tenth (one decimal place). You can also specify a list of numbers and use a missing value (.) to indicate that certain variables should not be rounded. For example, if you are imputing Age and Pain, you can use
ROUND= . 0.1 /* that's MISSING and 0.1 */
if you do not want to round Age but you want to round Pain to the nearest 0.1.
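For the integer-valued pain score in this thread, the same option could be written with a rounding unit of 1. This is only a sketch: it assumes the patients data set posted earlier has already been created, the output name want_round is made up for illustration, and it has not been verified here that ROUND= removes the tiny values in every SAS release.

```sas
/* Sketch: the ROUND= values follow the order of the VAR statement.
   "." leaves age unrounded; 1 rounds imputed pain to the nearest integer.
   want_round is a hypothetical output data set name. */
proc mi data=patients out=want_round nimpute=1 seed=123 round= . 1;
var age pain;
fcs nbiter=5 regpmm (age pain);
run;
```

Checking the output with a comparison such as pain ne round(pain, 1) would show whether any fractional values remain.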
I contacted SAS Technical Support as you suggested, and they say that the small values I get are "the results of the formula [for predictive mean matching] when K=5 (default)" and given my data. To get only integers I could either use the round function or try to alter K in the formula. Good to know nothing was wrong! I decided to round "manually" in a data step after the PROC MI call.
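The manual rounding step could look something like the sketch below. The small want data set is a hypothetical stand-in for the PROC MI output (two of its values mimic the tiny results reported in this thread), and want_rounded is an illustrative name.

```sas
/* Hypothetical stand-in for the PROC MI output: subjects 38 and 47
   carry a tiny nonzero value where a zero was expected */
data want;
input id pain;
datalines;
3 7
12 0
38 3.552714E-15
47 3.552714E-15
;
run;

/* Round the imputed variable to the nearest integer */
data want_rounded;
set want;
pain = round(pain, 1);
run;
```

Unlike the "if pain<1 then pain=0" workaround, ROUND(pain, 1) does not assume anything about which values are affected, so it also cleans up a value such as 6.999999999999996 if one ever appears.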
I might question exactly how you verified that all the values were discrete integers. Assigned formats may disguise extremely small values.
Consider:
data example; x=1.110223E-16; put x 16.14 ; run;
Which shows a formatted value of
0.00000000000000
Which could easily be mistaken for 0.
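To see past the format, one option is to print the same value with wider or lower-level formats and to test it against zero directly. A small sketch (the variable name is_zero is just for illustration):

```sas
data _null_;
x = 1.110223E-16;
put x= 16.14;      /* fixed-point format: the value looks like zero */
put x= best32.;    /* BEST32. shows the value in scientific notation */
put x= hex16.;     /* HEX16. shows the stored floating-point bits */
is_zero = (x = 0); /* direct comparison: 0 means x is not exactly zero */
put is_zero=;
run;
```

The comparison does not depend on any format, so it is the safest of the three checks.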
I had just run PROC FREQ to verify that all values were integers, but perhaps that is not sufficient? I also tried what Rick_SAS did for his petal example in another answer [fracPL = PetalLength - floor(PetalLength), followed by PROC MEANS], and I only get zeros. How would you suggest I verify that my original data contain only integers?
One way is to see if the INTZ (or INT) function result equals the value
data example;
input x;
/* INTZ truncates toward zero with no fuzzing */
if intz(x) ne x then put "X is not integer";
else put "X is integer";
datalines;
0
1.110223E-16
123456.8
;
run;
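INTZ and INT may return different results: INT treats a value that is within 1E-12 of an integer as that integer, whereas INTZ applies no such fuzzing.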
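The same test can be applied to several variables at once to list every non-integer value in a data set. The sketch below uses a made-up data set, check_me, with one near-zero value and one fraction; the names check_me, not_integer, varname and value are all hypothetical.

```sas
/* Hypothetical test data: an exact zero, a near-zero value, a fraction */
data check_me;
input id pain tjc;
datalines;
1 0 3
2 3.552714E-15 0
3 12 4.5
;
run;

/* Write one row for each nonmissing value that is not an exact integer */
data not_integer;
set check_me;
length varname $32;
array nums[*] pain tjc;
do i = 1 to dim(nums);
  if not missing(nums[i]) and nums[i] ne intz(nums[i]) then do;
    varname = vname(nums[i]);
    value = nums[i];
    output;
  end;
end;
keep id varname value;
format value best32.;
run;

proc print data=not_integer;
run;
```

An empty not_integer data set would mean that every checked value is an exact integer. Running this on the original data before imputation and on the PROC MI output afterwards would show whether the small values are introduced by the imputation.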