I have an odd conceptual question. It's not about a specific incident, just a general best-practices approach.

When defining Java equals() methods, I have occasionally asked myself what makes two objects equal. The simple approach is to check whether every field is equal; however, there are occasions when this isn't logical. Here are three examples I could think of where I'm not certain whether a field should be part of an equals() method:

  1. Transient fields and state data. Any transient field on a JPA object, for example. Or say I have a plugin class with a jar location and some other identification data that uniquely define where to find a plugin to load, plus a transient isLoaded boolean that tells me whether I have gotten around to loading the plugin yet. Should two references to the same plugin but with different isLoaded values be equal?

  2. Costly equality checks. Let's say I have a person that has an array of family members. If I check everyone in the family members array, I also have to check everyone in their arrays, and so on. Suddenly my equality method checks every person in memory and has to guard against cycles.

  3. Data that may not be loaded at the time of the equality check. Let's say I have a JPA object which consists of only two fields: a unique name and a UUID field generated by the database. The name uniquely identifies the row in the database, but the UUID only exists once I load from the database. How do I check whether an object I want to save to the database is equal to the ones I have already loaded from it? Does the fact that one object lacks the database-generated UUID make them distinct?
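To make the first scenario concrete, here is a minimal sketch of such a plugin class. The class name, field names, and methods are all hypothetical, and comparing only the identification data is just one possible choice, shown to illustrate the dilemma rather than to settle it.

```java
import java.util.Objects;

// Hypothetical Plugin class; every name here is an assumption made for illustration.
final class Plugin {
    private final String jarLocation;
    private final String pluginId;
    private transient boolean isLoaded; // runtime state, not part of the identity

    Plugin(String jarLocation, String pluginId) {
        this.jarLocation = Objects.requireNonNull(jarLocation);
        this.pluginId = Objects.requireNonNull(pluginId);
    }

    void markLoaded() {
        isLoaded = true;
    }

    boolean isLoaded() {
        return isLoaded;
    }

    // One possible definition: compare only the identifying fields and ignore isLoaded.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Plugin)) return false;
        Plugin that = (Plugin) other;
        return jarLocation.equals(that.jarLocation) && pluginId.equals(that.pluginId);
    }

    // Uses the same fields as equals() so the two stay consistent.
    @Override
    public int hashCode() {
        return Objects.hash(jarLocation, pluginId);
    }
}
```

Under this definition, a loaded and an unloaded Plugin that point at the same jar compare as equal. Whether that is the right outcome is exactly what I am unsure about.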

I realize that at times it depends on how you plan to use the equality method. However, in situations where you are writing an object and aren't yet certain how it will be used, how do you decide what to include in an equals() method? Do you never override it unless you can include every field in the check?

If you need your hash function to behave in a certain manner, which indirectly defines how your equality method should work, but someone could easily assume a different definition of equality, does this mean you simply need to document your equals() method well, or is it a code smell indicating that you have done something wrong?

dsollen

2 Answers

I think if there were a one-size-fits-all way to compare objects, then Java (or any language) would make it the default, no?

Equality - like hashing and comparison - should be fast, simple and most importantly referentially transparent.

In the end, the question of what exactly the equals method is for is important. If you don't know, then just leave it as is. Physical (reference) comparison, which is what the default equals() does, makes sense in many use cases, particularly if you have control over when objects are created vs. when they are reused.

Also, rather than trying to design comparison functions meaningful for all cases, write code using things like Comparator and let calling code inject whatever makes the most sense at the call site.
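A rough sketch of what injecting the comparison could look like follows. The Person class, the People helper, and the containsEquivalent method are hypothetical names made up for this example; only Comparator itself comes from the JDK.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical class for illustration; it deliberately does not override equals().
final class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

final class People {
    // The caller decides what "the same" means by passing in a Comparator.
    static boolean containsEquivalent(List<Person> people, Person candidate,
                                      Comparator<Person> sameness) {
        for (Person p : people) {
            if (sameness.compare(p, candidate) == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("Ada", 36));

        Comparator<Person> byName = Comparator.comparing((Person p) -> p.name);
        Comparator<Person> byNameAndAge = byName.thenComparingInt(p -> p.age);

        Person candidate = new Person("Ada", 40);
        System.out.println(containsEquivalent(people, candidate, byName));       // true
        System.out.println(containsEquivalent(people, candidate, byNameAndAge)); // false
    }
}
```

The same two objects count as "equal" at one call site and not at the other, without Person committing to either definition.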

back2dos

First of all, there are two kinds of classes:

  • Classes with "value-type" semantics.
  • Classes with "reference" semantics.

You can google these terms to find information about precisely what they mean, and how they differ from each other, for your language of choice.

Every single class with "value-type" semantics must have an equals() method, and this method must take into consideration every single one of its fields, or at least give the illusion that it does so. If it is expensive, bite the bullet and spend clock cycles lavishly, because it is necessary.
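As a minimal sketch of what that looks like in Java, assuming a hypothetical Money class invented for this example:

```java
import java.util.Objects;

// Hypothetical immutable value class: final class, final fields, no setters.
final class Money {
    private final String currency;
    private final long cents;

    Money(String currency, long cents) {
        this.currency = Objects.requireNonNull(currency);
        this.cents = cents;
    }

    // Every field takes part in the comparison.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Money)) return false;
        Money that = (Money) other;
        return cents == that.cents && currency.equals(that.currency);
    }

    // Built from the same fields as equals(), so equal objects share a hash code.
    @Override
    public int hashCode() {
        return Objects.hash(currency, cents);
    }
}
```

Two Money instances built from the same currency and amount are interchangeable, which is the point of value semantics.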

Classes with "reference" semantics do not need an equals() method. I would not go as far as to say that they should not have such a method, but if they are to have one, it should be thought of as a utility / helper method which is going to be used in some weird way which is unrelated to the original intent of equals. That's because under normal circumstances, these objects are compared by reference, not by value.

Reference equality is checked using the == operator in Java, while in C#, where the == operator may be overloaded, we use System.Object.ReferenceEquals(). (Though, again, you are not supposed to overload the == operator of a reference-semantics class.)
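For the Java side, a small sketch of the difference, using a hypothetical ReferenceVsValue helper class written for this example:

```java
final class ReferenceVsValue {
    // Reference equality: true only if both variables point at the same object.
    static boolean sameReference(Object a, Object b) {
        return a == b;
    }

    // Value equality: delegates to whatever equals() the class defines.
    static boolean sameValue(Object a, Object b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with the same contents.
        String a = new String("plugin");
        String b = new String("plugin");
        System.out.println(sameReference(a, b)); // false
        System.out.println(sameValue(a, b));     // true

        // A class that does not override equals() inherits Object.equals(),
        // which compares references.
        Object x = new Object();
        Object y = new Object();
        System.out.println(sameValue(x, y));     // false
    }
}
```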

Incidentally, the kinds of classes that are complex and expensive to check for equality usually tend to be classes with reference semantics, so they do not need an equals() method. And generally, if you have any doubts or second thoughts as to how the equals() method should be implemented, this is a good indication that what you have in your hands is a reference-semantics class, not a value-type-semantics class.

As for hashCode(), the only objects that should implement it are objects that not only have value semantics, but are also immutable. This is because hash containers generally obtain the hash code of an object once, and then cache it for as long as the object resides in the container, so if the object undergoes a mutation, the contents of the object will be in conflict with the cached hash code. It is a very common newbie bug to use a non-immutable object as a key to a hashmap, and a very hard one to track down unless you know where to look first: the immutability of the class used as the key.
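The bug is easy to reproduce. The sketch below uses a hypothetical MutableKey class, written only for this demonstration, and a HashSet, which is backed by a HashMap in OpenJDK:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical mutable key class, written only to reproduce the bug.
final class MutableKey {
    int id; // mutable on purpose

    MutableKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof MutableKey && ((MutableKey) other).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}

final class MutableKeyDemo {
    // Adds a key to a HashSet, mutates it, then asks the set whether it still contains it.
    static boolean foundAfterMutation() {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey(1);
        set.add(key);
        key.id = 2; // the hash code changes while the key sits in the set
        return set.contains(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false
    }
}
```

The set still holds the object, but the lookup goes to the bucket for the new hash code and never finds it.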

So, every class which is not immutable should have a hashCode() method coded as follows:

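@Override
public int hashCode()
{
    // hashCode() must never be invoked on a mutable object.
    assert false : "hashCode() was invoked on a mutable object!";
    return 0; // needed to compile; reached only when assertions are disabled
}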

So, since only immutable classes should implement hashCode(), ensuring that it agrees with equals() is pretty straightforward, and probably much simpler than you may have feared.

Mike Nakis