proc compare output

Super User
Posts: 1,115


I came across an issue when comparing two datasets with PROC COMPARE: for a particular numeric variable,

the values look the same, yet PROC COMPARE reports differences in the .lst output.

For example, I have datasets ONE and TWO. In ONE the value of the variable is 63.3, and in TWO the value

of the variable is also 63.3, but PROC COMPARE gives the output below:

var1       var1       Diff.        % Diff
63.3000    63.3000    7.105E-15    1.123E-14

The above problem is solved if I use CRITERION=0.0001 in PROC COMPARE.
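A minimal sketch of the tolerance approach, assuming hypothetical datasets ONE and TWO (names borrowed from the question) and a host that stores numbers as IEEE 754 doubles, such as Windows or UNIX. TWO.var1 is forced one binary step (2**(-47) for values between 32 and 64) above 63.3 to stand in for calculation residue. The first PROC COMPARE uses the default exact comparison and should flag the difference; the second sets METHOD=ABSOLUTE explicitly so that CRITERION=0.0001 is read as an absolute tolerance.

```sas
/* Hypothetical test data: TWO.var1 sits one binary step above      */
/* ONE.var1, mimicking the residue a calculation can leave behind.  */
data one;
  var1 = 63.3;
run;

data two;
  var1 = 63.3 + 2**(-47);
run;

/* Default comparison (exact equality): the tiny difference is reported. */
proc compare base=one compare=two;
run;

/* Treat values within 0.0001 of each other as equal. */
proc compare base=one compare=two method=absolute criterion=0.0001;
run;
```

Naming METHOD= explicitly avoids relying on whichever method PROC COMPARE picks by default when only CRITERION= is given.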

However, I would like to know why PROC COMPARE shows additional zeros even though they are not there in either dataset.

Why are zeros added to the decimal places? Could you please share your suggestions?

Thanks,

Jag

Esteemed Advisor
Posts: 7,217

Re: proc compare output

This is quite common. The value has been calculated somewhere and carries a really tiny fraction. The compare output limits the display to 8 characters, so expanded you may have:

var1                    var2

63.300000001     63.300000002

In your dataset view the full value is hidden, but in a compare you get a fractional difference. The simplest way is to ignore these by setting the FUZZ option on the proc compare; however, my recommendation is that you explicitly round the values after the calculation, e.g. round(var1, 0.1), where the second argument is the rounding unit (0.1 for one decimal place). That way the number will be 63.3.
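A small sketch of the rounding fix, assuming an IEEE 754 host (Windows/UNIX). The variable x is a hypothetical calculated value forced one binary step above 63.3; ROUND with a rounding unit of 0.1 brings it back to the same stored value as the literal 63.3, while the unrounded value does not compare equal.

```sas
/* Hypothetical example: x mimics a calculated value carrying a tiny */
/* binary fraction beyond 63.3.                                      */
data rounded;
  x = 63.3 + 2**(-47);
  var1 = round(x, 0.1);           /* 0.1 is the rounding unit, not a format */
  raw_equal     = (x = 63.3);     /* 1 if equal, 0 if not                   */
  rounded_equal = (var1 = 63.3);  /* 1 if equal, 0 if not                   */
  put x= 32.17 var1= 32.17 raw_equal= rounded_equal=;
run;
```

Rounding at the point of calculation fixes the stored data, whereas CRITERION= or FUZZ= only changes how PROC COMPARE judges or reports it.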

Super User
Posts: 1,115

Re: proc compare output

Thank you for your response.

However, I also observed a similar issue with a character variable: one dataset had a value of 54, and the other dataset also had 54. Both variables had the same format, $4. However, PROC COMPARE showed the results below:

var1    var1

54.0   54

As you can see, var1 from one dataset gets an additional zero in the PROC COMPARE output. However, when we open the dataset, the value appears as 54.

Why is the fraction added to the character variable of one dataset and not the other?

Is there any reason for this, and how can I overcome this issue?

Appreciate your response.

Thanks,
Jag
Esteemed Advisor
Posts: 7,217

Re: proc compare output

At a guess, var1 isn't character. Maybe post some test data where this is happening, as it would be quite strange.

Super User
Posts: 1,115

Re: proc compare output

Yes, I agree; it is strange to me as well.

However, I checked the datasets: in both datasets the variable has the format $4. and is character.

I work on a client server, so it is not possible to reproduce exactly what is happening.

Thanks,
Jag
Esteemed Advisor
Posts: 6,661

Re: proc compare output

If both vars are string variables (as indicated by the $4. format), then clearly "54.0" is different from "54". Anything else would be a SEVERE bug.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 1,115

Re: proc compare output

As suggested by Reeza, I checked PROC CONTENTS for both datasets: for both variables the type is Char, the length is 4, and no format is assigned.

Will this help?

Would using COMPRESS on one of the variables have an effect in this case? I believe that during the derivation of the variable, the programmer may have used the COMPRESS function on one side. As a result, one variable does not show the decimal places in PROC COMPARE while the other does.

Could this be the reason, or could there be another reason for this strange behavior?

Thanks,
Jag
Esteemed Advisor
Posts: 7,217

Re: proc compare output

So is there, in your text variable, the text string "54.0"? Do a filter on both datasets and look at the distinct values. If there isn't, then I would suggest Tech Support, as I can't envisage PROC COMPARE converting a character value to a number with a decimal and then back to character. If the values are in the data, which I suspect they are, then you need to go back through the program that created that variable. I suspect that somewhere in that program one programmer has done put(value,8.1) or something similar and the other has done put(value,best.).
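A sketch of how two derivations can store different text in the same $4 variable, plus the "distinct values" check. HAVE1 and HAVE2 reuse the placeholder names from the next reply; the numeric variable value and the choice of formats are assumptions for illustration. It uses put(value, 4.1) rather than 8.1 so the right-aligned result fits a length of 4 without truncation.

```sas
/* Hypothetical derivation 1: a w.d format keeps one decimal place. */
data have1;
  length var1 $4;
  value = 54;
  var1 = put(value, 4.1);           /* stores "54.0" */
run;

/* Hypothetical derivation 2: BEST. drops the zero decimals;        */
/* STRIP removes BEST.'s leading blanks.                            */
data have2;
  length var1 $4;
  value = 54;
  var1 = strip(put(value, best.));  /* stores "54" */
run;

/* Distinct stored strings and their lengths in each dataset. */
proc sql;
  select distinct var1, length(var1) as len from have1;
  select distinct var1, length(var1) as len from have2;
quit;
```

COMPRESS on its own only removes characters, so it cannot add ".0"; a difference like this has to come from how the text was created in the first place.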

Grand Advisor
Posts: 17,360

Re: proc compare output

If that's truly happening with a character variable, then consider either:

posting the PROC CONTENTS output together with the PROC COMPARE output, i.e.:

proc contents data=have1; run;

proc contents data=have2; run;

proc compare data=have1 compare=have2; run;

OR contacting Tech Support.

Super User
Posts: 1,115

Re: proc compare output

Thank you, Reeza.

I will try and let you know.

Thanks,
Jag
Esteemed Advisor
Posts: 6,661

Re: proc compare output

To extend RW9's answer, these are artifacts of the 8-byte real (floating-point) format SAS uses to store numbers. Depending on the calculation(s) done to arrive at a certain value, you get binary rounding differences where the mantissa ends.
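To see "where the mantissa ends" directly, the sketch below builds 63.3 and the next representable double above it. It assumes an IEEE 754 host (Windows/UNIX; z/OS uses a different floating-point format). Doubles between 32 and 64 are spaced 2**(-47) apart, about 7.105E-15, and 100 times that divided by 63.3 is about 1.123E-14. Those match the Diff. and % Diff values in the original question, which is consistent with the two stored values being adjacent doubles.

```sas
/* Assumes IEEE 754 8-byte doubles (SAS on Windows/UNIX). */
data ulp;
  a = 63.3;               /* nearest double to 63.3               */
  b = 63.3 + 2**(-47);    /* next double up: in [32,64) doubles   */
                          /* are spaced 2**(-47) apart             */
  diff    = b - a;
  pctdiff = 100 * diff / a;
  put a= hex16. b= hex16.;    /* raw 8-byte bit patterns */
  put diff= e10. pctdiff= e10.;
run;
```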
