08-07-2014 12:42 AM
I came across an issue when comparing two datasets with PROC COMPARE: for a particular numeric variable, the values are the same, yet PROC COMPARE reports differences in the .lst output.
For example, in dataset one the value of the variable is 63.3, and in dataset two the value is also 63.3, but PROC COMPARE shows the output below:
Base var1   Compare var1   Diff.       % Diff
63.3000     63.3000        7.105E-15   1.123E-14
The problem is solved if I use CRITERION=0.0001 in PROC COMPARE.
However, I would like to know why PROC COMPARE shows additional zeros in the decimal places when they are not present in either dataset. Why are these zeros added? Could you please provide your suggestions?
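A minimal sketch that reproduces this, assuming SAS stores numbers as IEEE 754 8-byte doubles (as on Windows and UNIX). The hypothetical datasets ONE and TWO hold 63.3 and the next representable double above it, so both print as 63.3000 but are not equal. The second PROC COMPARE shows the CRITERION= workaround, with METHOD=ABSOLUTE stated explicitly so that absolute differences below 0.0001 are judged equal.

```sas
/* Hypothetical datasets: var1 differs only in the last bit of the mantissa */
data one;
  var1 = 63.3;
run;

data two;
  var1 = 63.3 + 2**-47;   /* next representable 8-byte double above 63.3 */
run;

/* No METHOD= or CRITERION=: exact comparison, a difference of about 7.105E-15 is reported */
proc compare base=one compare=two;
run;

/* Judge values equal when they differ by less than 0.0001 */
proc compare base=one compare=two method=absolute criterion=0.0001;
run;
```

The explicit METHOD= is a design choice for readability; the original post reports that CRITERION=0.0001 alone was enough.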
08-07-2014 04:59 AM
This is quite common. The value has been calculated somewhere and carries a really tiny fraction. The compare output limits the display to 8 characters, so that fraction is not shown.
In your dataset view the full value is hidden, but in a compare you get a fractional difference. The simplest way is to ignore these by setting the FUZZ= option on PROC COMPARE; however, my recommendation is that you explicitly round the values after the calculation, e.g. round(var1,0.1). That way the number will be 63.3.
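A sketch of the rounding approach, using the same hypothetical datasets as a self-contained example. ROUND's second argument is the rounding unit (0.1 rounds to one decimal place), not a format width. After rounding, the value should compare equal to the literal 63.3 without any CRITERION= setting.

```sas
/* Hypothetical baseline and a calculated value carrying a one-bit error */
data one;
  var1 = 63.3;
run;

data two;
  var1 = 63.3 + 2**-47;
run;

/* Round the calculated value right after it is derived */
data two_rounded;
  set two;
  var1 = round(var1, 0.1);   /* rounding unit 0.1: one decimal place */
run;

/* Should now report no unequal values */
proc compare base=one compare=two_rounded;
run;
```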
08-07-2014 06:44 AM
Thank you for your response.
However, I also observed a similar issue with a character variable: one dataset had a value of 54, and the other dataset also had a value of 54. Both variables had the same format, $4. However, PROC COMPARE reported a difference.
Here, var1 from one dataset gets an additional zero (54.0) in the PROC COMPARE output. However, when we open the dataset, the value appears as 54.
Why is a fractional part added to the character variable from one dataset and not the other?
Is there any reason for this, and how can I overcome it?
I appreciate your response.
08-07-2014 09:55 AM
Yes, I agree it is strange to me as well.
However, I checked the datasets, and in both of them the variable is character with format $4.
I work on a client server, so it is not possible to reproduce exactly what is happening.
08-08-2014 02:28 AM
If both variables are character variables (as indicated by the $4. format), then "54.0" is clearly different from "54". Anything else would be a SEVERE bug.
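A quick check of how SAS compares character values, as a sketch with hypothetical variable names. The shorter value is padded with trailing blanks before comparison, so trailing blanks never cause a mismatch, but different text ("54.0") or leading blanks (for example from a right-aligned PUT result) do.

```sas
data char_check;
  length a b c $4;
  a = "54";      /* stored as "54" plus two trailing blanks */
  b = "54.0";
  c = "  54";    /* leading blanks, e.g. from a right-aligned PUT */
  eq_ab       = (a = b);      /* 0: different text */
  eq_ac       = (a = c);      /* 0: leading blanks count */
  eq_trailing = (a = "54");   /* 1: trailing blanks are ignored */
  put eq_ab= eq_ac= eq_trailing=;
run;
```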
08-08-2014 08:01 AM
As suggested by Reeza, I checked PROC CONTENTS for both datasets, and for both variables the type is Char, the length is 4, and no format is assigned.
Does this help?
Could the use of COMPRESS on one of the variables have an effect in this case? I believe that during the derivation of the variable, the programmer may have used the COMPRESS function on one side. Because of this, one variable does not show the fractional places in PROC COMPARE and the other variable does.
Could this be a possible reason, or could there be another reason for this strange behavior?
08-08-2014 08:18 AM
So is the text string "54.0" actually present in your text variable? Filter both datasets and look at the distinct values. If it isn't, then I would suggest Tech Support, as I can't envisage PROC COMPARE converting a character value to a number with a decimal and then back to character. If the values are in the data, which I suspect they are, then you would need to go back through the program that created that variable. I suspect that somewhere in that program the programmer has done a put(value,8.1) or something similar, and the other programmer has done put(value,best.).
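A sketch of that suspicion, with hypothetical dataset and variable names. The same number 54 becomes "54.0" through a format with one decimal place and "54" through BEST. plus STRIP. The 4.1 width is used so the result fits exactly in $4; a wider format such as 8.1 right-aligns its result with leading blanks. The PROC SQL step lists the distinct stored values, and the $HEX8. format exposes blanks or other hidden characters.

```sas
/* Hypothetical: two derivations of a $4 text variable from the same number */
data have1 have2;
  length var1 $4;
  num = 54;
  var1 = put(num, 4.1);            /* "54.0": one decimal place */
  output have1;
  var1 = strip(put(num, best.));   /* "54": BEST. drops the zero decimal */
  output have2;
  drop num;
run;

/* Distinct stored values, with a hex view to reveal hidden characters */
proc sql;
  title "HAVE1";
  select distinct var1, var1 as var1_hex format=$hex8. from have1;
  title "HAVE2";
  select distinct var1, var1 as var1_hex format=$hex8. from have2;
quit;
title;

/* Reports "54.0" versus "54" because the stored text really differs */
proc compare base=have1 compare=have2;
run;
```

COMPRESS only removes characters (blanks by default), so it could strip the leading blanks from a right-aligned PUT result, but it cannot add a ".0" by itself.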
08-07-2014 11:43 AM
If that's truly happening with a character variable, then consider either:
posting the PROC CONTENTS output along with the PROC COMPARE output.
proc contents data=have1; run;
proc contents data=have2; run;
proc compare data=have1 compare=have2; run;
OR contacting Tech Support.
08-07-2014 06:36 AM
To extend RW9's answer, these are artifacts of the 8-byte real (floating-point) format that SAS uses to store numbers. Depending on the calculations done to arrive at a certain value, you will get binary rounding differences in the last bits of the mantissa.
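The numbers in the original output can be checked by hand, assuming IEEE 754 8-byte doubles (SAS on Windows and UNIX). 63.3 lies between 2^5 = 32 and 2^6 = 64, and the mantissa has a 52-bit fraction, so adjacent representable values there are 2^(5-52) = 2^-47 ≈ 7.105E-15 apart. That matches the Diff. column, and 100 × 7.105E-15 / 63.3 ≈ 1.123E-14 matches the % Diff column: the two values differ in only the last bit. A small data step to confirm this (the dataset name ULP_DEMO is hypothetical):

```sas
/* Assumes IEEE 754 8-byte doubles */
data ulp_demo;
  base    = 63.3;
  ulp     = 2**-47;          /* spacing of doubles in [32, 64): 2**(5-52) */
  next    = base + ulp;      /* adjacent representable value above 63.3 */
  diff    = next - base;     /* exactly one ulp, about 7.105E-15 */
  pctdiff = 100 * diff / base;
  put base= 8.4 next= 8.4 diff= e12. pctdiff= e12.;
run;
```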