ChatGPT解决这个技术问题 Extra ChatGPT

Best practices for overriding isEqual: and hash

How do you properly override isEqual: in Objective-C? The "catch" seems to be that if two objects are equal (as determined by the isEqual: method), they must have the same hash value.

The Introspection section of the Cocoa Fundamentals Guide does have an example on how to override isEqual:, copied as follows, for a class named MyWidget:

- (BOOL)isEqual:(id)other {
    if (other == self)
        return YES;
    if (!other || ![other isKindOfClass:[self class]])
        return NO;
    return [self isEqualToWidget:other];
}

- (BOOL)isEqualToWidget:(MyWidget *)aWidget {
    if (self == aWidget)
        return YES;
    if (![(id)[self name] isEqual:[aWidget name]])
        return NO;
    if (![[self data] isEqualToData:[aWidget data]])
        return NO;
    return YES;
}

It checks pointer equality, then class equality, and finally compares the objects using isEqualToWidget:, which only checks the name and data properties. What the example doesn't show is how to override hash.

Let's assume there are other properties that do not affect equality, say age. Shouldn't the hash method be overridden such that only name and data affect the hash? And if so, how would you do that? Just add the hashes of name and data? For example:

- (NSUInteger)hash {
    NSUInteger hash = 0;
    hash += [[self name] hash];
    hash += [[self data] hash];
    return hash;
}

Is that sufficient? Is there a better technique? What if you have primitives, like int? Convert them to NSNumber to get their hash? Or structs like NSRect?

(Brain fart: Originally wrote "bitwise OR" them together with |=. Meant add.)

if (![other isKindOfClass:[self class]]) - This technically means equality will not be Commutative. I.e. A = B does not mean B = A (e.g. if one is a subclass of the other)
Documentation link is dead, now archived to Introspection

f
fishinear

Start with

 NSUInteger prime = 31;
 NSUInteger result = 1;

Then for every primitive you do

 result = prime * result + var

For objects you use 0 for nil and otherwise their hashcode.

 result = prime * result + [var hash];

For booleans you use two different values

 result = prime * result + ((var)?1231:1237);

Explanation and Attribution

This is not tcurdt's work, and comments were asking for more explanation, so I believe an edit for attribution is fair.

This algorithm was popularized in the book "Effective Java", and the relevant chapter can currently be found online here. That book popularized the algorithm, which is now a default in a number of Java applications (including Eclipse). It derived, however, from an even older implementation which is variously attributed to Dan Bernstein or Chris Torek. That older algorithm originally floated around on Usenet, and certain attribution is difficult. For example, there is some interesting commentary in this Apache code (search for their names) that references the original source.

Bottom line is, this is a very old, simple hashing algorithm. It is not the most performant, and it is not even proven mathematically to be a "good" algorithm. But it is simple, and a lot of people have used it for a long time with good results, so it has a lot of historical support.


Where did the 1231:1237 come from? I see it in Java's Boolean.hashCode() too. Is it magical?
It's the nature of a hashing algorithms that there will be collisions. So I don't see your point, Paul.
In my opinion this answer doesn't respond to the actual question (best practices for overriding NSObject's hash). It just provides one particular hash algorithm. On top of that, the sparsity of the explanation makes it difficult to understand without deep knowledge on the matter, and can result in people using it without knowing what are they doing. I don't understand why this question has so many upvotes.
1st problem - (int) is small and easy to overflow, use NSUInteger. 2nd problem - If you keep multiplying the result by each variable hash your result will overflow. eg. [NSString hash] creates large values. If you have 5+ variables it's easy to overflow with this algorithm. It'll result in everything mapping to the same hash, which is bad. See my response: stackoverflow.com/a/4393493/276626
@PaulSolt - Overflow is not a problem in generating a hash, collision is. But overflow does not necessarily make collision any more likely, and your statement about overflow causing everything to map to the same hash is simply incorrect.
B
Brian B.

I'm just picking up Objective-C myself, so I can't speak for that language specifically, but in the other languages I use if two instances are "Equal" they must return the same hash - otherwise you are going to have all sorts of problems when trying to use them as keys in a hashtable (or any dictionary-type collections).

On the other hand, if 2 instances are not equal, they may or may not have the same hash - it is best if they don't. This is the difference between an O(1) search on a hash table and an O(N) search - if all your hashes collide you may find that searching your table is no better than searching a list.

In terms of best practices your hash should return a random distribution of values for its input. This means that, for example, if you have a double, but the majority of your values tend to cluster between 0 and 100, you need to make sure that the hashes returned by those values are evenly distributed across the entire range of possible hash values. This will significantly improve your performance.

There are a number of hashing algorithms out there, including several listed here. I try to avoid creating new hash algorithms as it can have large performance implications, so using the existing hash methods and doing a bitwise combination of some sort as you do in your example is a good way to avoid it.


+1 Excellent answer, deserves more upvotes, especially since he actually talks about "best practices" and the theory behind why a good (unique) hash is important.
Y
Yariv Nissim

A simple XOR over the hash values of critical properties is sufficient 99% of the time.

For example:

- (NSUInteger)hash
{
    return [self.name hash] ^ [self.data hash];
}

Solution found at http://nshipster.com/equality/ by Mattt Thompson (which also referred to this question in his post :~)


The problem with this answer is that it doesn't take upon consideration primitive values at all. And primitive values may also important for hashing.
@Vive Most of these problems are solved in Swift, but these types usually represent their own hash since they're primitive.
While you're right for Swift, still there are many projects written with objc. Because your answer is dedicated for objc it's worth at least a mention.
XORing hash values together is bad advice, it leads to many hash collisions. Instead, multiply with a prime number and then add, as other answers state.
If your hash code "is sufficient 99% of the time" as you write, then you're also saying that it's causing collisions 1% of the time, which would be disastrous. In other words, that quote isn't doing you a favor.
L
LavaSlider

I found this thread extremely helpful supplying everything I needed to get my isEqual: and hash methods implemented with one catch. When testing object instance variables in isEqual: the example code uses:

if (![(id)[self name] isEqual:[aWidget name]])
    return NO;

This repeatedly failed (i.e., returned NO) without and error, when I knew the objects were identical in my unit testing. The reason was, one of the NSString instance variables was nil so the above statement was:

if (![nil isEqual: nil])
    return NO;

and since nil will respond to any method, this is perfectly legal but

[nil isEqual: nil]

returns nil, which is NO, so when both the object and the one being tested had a nil object they would be considered not equal (i.e., isEqual: would return NO).

This simple fix was to change the if statement to:

if ([self name] != [aWidget name] && ![(id)[self name] isEqual:[aWidget name]])
    return NO;

This way, if their addresses are the same it skips the method call no matter if they are both nil or both pointing to the same object but if either is not nil or they point to different objects then the comparator is appropriately called.

I hope this saves someone a few minutes of head scratching.


P
Paul Solt

The hash function should create a semi-unique value that is not likely to collide or match another object's hash value.

Here is the full hash function, which can be adapted to your classes instance variables. It uses NSUInteger's rather than int for compatibility on 64/32bit applications.

If the result becomes 0 for different objects, you run the risk of colliding hashes. Colliding hashes can result in unexpected program behavior when working with some of the collection classes that depend on the hash function. Make sure to test your hash function prior to use.

-(NSUInteger)hash {
    NSUInteger result = 1;
    NSUInteger prime = 31;
    NSUInteger yesPrime = 1231;
    NSUInteger noPrime = 1237;
    
    // Add any object that already has a hash function (NSString)
    result = prime * result + [self.myObject hash];
    
    // Add primitive variables (int)
    result = prime * result + self.primitiveVariable; 

    // Boolean values (BOOL)
    result = prime * result + (self.isSelected ? yesPrime : noPrime);
    
    return result;
}

One gotcha here: I prefer to avoid dot syntax, so I transformed your BOOL statement into (eg) result = prime * result + [self isSelected] ? yesPrime : noPrime;. I then found this was setting result to (eg) 1231, I assume due to the ? operator taking precedence. I fixed the issue by adding brackets: result = prime * result + ([self isSelected] ? yesPrime : noPrime);
J
Jens Ayton

The easy but inefficient way is to return the same -hash value for every instance. Otherwise, yes, you must implement hash based only on objects which affect equality. This is tricky if you use lax comparisons in -isEqual: (e.g. case-insensitive string comparisons). For ints, you can generally use the int itself, unless you’ll be comparing with NSNumbers.

Don’t use |=, though, it will saturate. Use ^= instead.

Random fun fact: [[NSNumber numberWithInt:0] isEqual:[NSNumber numberWithBool:NO]], but [[NSNumber numberWithInt:0] hash] != [[NSNumber numberWithBool:NO] hash]. (rdar://4538282, open since 05-May-2006)


You're exactly right on the |=. Didn't really mean that. :) += and ^= are fairly equivalent. How do you handle non-integer primitives like double and float?
Random fun fact: Test it on Snow Leopard... ;-)
He's correct about using XOR instead of OR for combining fields into a hash. However, don't use the advice of returning the same -hash value for every object — although it easy, it can severely degrade the performance of anything that uses the object's hash. The hash does not have to be distinct for objects that aren't equal, but if you can achieve that, there's nothing like it.
The open radar bug report is closed. openradar.me/4538282 What does that mean?
JJD, the bug was fixed in Mac OS X 10.6, as Quinn hinted. (Note that the comment is two years old.)
u
user4951

Remember that you only need to provide hash that's equal when isEqual is true. When isEqual is false, the hash doesn't have to be unequal though presumably it is. Hence:

Keep hash simple. Pick a member (or few members) variable that is the most distinctive.

For example, for CLPlacemark, the name only is enough. Yes there are 2 or 3 distincts CLPlacemark with the exact same name but those are rare. Use that hash.

@interface CLPlacemark (equal)
- (BOOL)isEqual:(CLPlacemark*)other;
@end

@implementation CLPlacemark (equal)

...

-(NSUInteger) hash
{
    return self.name.hash;
}


@end

Notice I do not bother specifying the city, the country, etc. The name is enough. Perhaps the name and CLLocation.

Hash should be evenly distributed. So you can combine several members variable using the caret ^ (xor sign)

So it's something like

hash = self.member1.hash ^ self.member2.hash ^ self.member3.hash

That way the hash will be evenly distributed.

Hash must be O(1), and not O(n)

So what to do in array?

Again, simple. You do not have to hash all members of the array. Enough to hash the first element, last element, the count, maybe some middle elements, and that's it.


XORing hash values does not give an even distribution.
j
jedwidz

The equals and hash contracts are well specified and thoroughly researched in the Java world (see @mipardi's answer), but all the same considerations should apply to Objective-C.

Eclipse does a reliable job of generating these methods in Java, so here's an Eclipse example ported by hand to Objective-C:

- (BOOL)isEqual:(id)object {
    if (self == object)
        return true;
    if ([self class] != [object class])
        return false;
    MyWidget *other = (MyWidget *)object;
    if (_name == nil) {
        if (other->_name != nil)
            return false;
    }
    else if (![_name isEqual:other->_name])
        return false;
    if (_data == nil) {
        if (other->_data != nil)
            return false;
    }
    else if (![_data isEqual:other->_data])
        return false;
    return true;
}

- (NSUInteger)hash {
    const NSUInteger prime = 31;
    NSUInteger result = 1;
    result = prime * result + [_name hash];
    result = prime * result + [_data hash];
    return result;
}

And for a subclass YourWidget which adds a property serialNo:

- (BOOL)isEqual:(id)object {
    if (self == object)
        return true;
    if (![super isEqual:object])
        return false;
    if ([self class] != [object class])
        return false;
    YourWidget *other = (YourWidget *)object;
    if (_serialNo == nil) {
        if (other->_serialNo != nil)
            return false;
    }
    else if (![_serialNo isEqual:other->_serialNo])
        return false;
    return true;
}

- (NSUInteger)hash {
    const NSUInteger prime = 31;
    NSUInteger result = [super hash];
    result = prime * result + [_serialNo hash];
    return result;
}

This implementation avoids some subclassing pitfalls in the sample isEqual: from Apple:

Apple's class test other isKindOfClass:[self class] is asymmetric for two different subclasses of MyWidget. Equality needs to be symmetric: a=b if and only if b=a. This could easily be fixed by changing the test to other isKindOfClass:[MyWidget class], then all MyWidget subclasses would be mutually comparable.

Using an isKindOfClass: subclass test prevents subclasses from overriding isEqual: with a refined equality test. This is because equality needs to be transitive: if a=b and a=c then b=c. If a MyWidget instance compares equal to two YourWidget instances, then those YourWidget instances must compare equal to each other, even if their serialNo differs.

The second issue can be fixed by only considering objects to be equal if they belong to the exact same class, hence the [self class] != [object class] test here. For typical application classes, this seems to be the best approach.

However, there certainly are cases where the isKindOfClass: test is preferable. This is more typical of framework classes than application classes. For example, any NSString should compare equal to any other other NSString with the same underlying character sequence, regardless of the NSString/NSMutableString distinction, and also regardless of what private classes in the NSString class cluster are involved.

In such cases, isEqual: should have well-defined, well-documented behaviour, and it should be made clear that subclasses can't override this. In Java, the 'no override' restriction can be enforced by flagging the equals and hashcode methods as final, but Objective-C has no equivalent.


It seems that isMemberOfClass would be more correct than [self class] != [object class] on account of NSProxy: stackoverflow.com/questions/3126476/…
J
Jonathan Ellis

Hold on, surely a far easier way to do this is to first override - (NSString )description and provide a string representation of your object state (you must represent the entire state of your object in this string).

Then, just provide the following implementation of hash:

- (NSUInteger)hash {
    return [[self description] hash];
}

This is based on the principle that "if two string objects are equal (as determined by the isEqualToString: method), they must have the same hash value."

Source: NSString Class Reference


This assumes that the description method will be unique. Using description's hash creates a dependency, that might not be obvious, and a higher risk of collisions.
+1 Upvoted. This is an awesome idea. If you fear that descriptions cause collisions, then you can override it.
Thanks Jim, I won't deny that this is a bit of a hack, but it would work in any case I can think of -- and as I said, providing that you override description, I don't see why this is inferior to any of the higher voted solutions. May not be the most mathematically elegant solution, but should do the trick. As Brian B. states (most upvoted answer at this point): "I try to avoid creating new hash algorithms" -- agreed! -- I just hash the NSString!
Upvoted because it's a neat idea. I'm not going to use it though because I fear the additional NSString allocations.
This is not a generic solution since for most of the classes description includes the pointer address. So this make two different instances of the same class that are equal with different hash, which violate the basic assumption that two equal objects have same hash!
N
NANNAV

This doesn't directly answer your question (at all) but I've used MurmurHash before to generate hashes: murmurhash

Guess I should explain why: murmurhash is bloody fast...


A C++ library that is focused on unique hashes for a void* key using random number (and also doesn't relate to Objective-C objects) really isn't a helpful suggestion here. The -hash method should return a consistent value each time, or it will be utterly useless. If the object is added to a collection that calls -hash, and returns a new value every time, duplicates will never be detected, and you can never retrieve the object from the collection either. In this case, the term "hash" is different from the meaning in security/cryptography.
murmurhash is isn't a cryptographic hash function. Please check your facts before posting incorrect information. Murmurhash could be useful for hashing custom objective-c classes (esp. if you have a lot of NSDatas involved) because it is extremely fast. However I do grant you that maybe suggesting it isn't the best advice to give to someone "just picking up objective-c", but please note my prefix on my original reply to the question.
S
Sergey Kalinichenko

I've found this page to be a helpful guide in override equals- and hash-type methods. It includes a decent algorithm for calculating hash codes. The page is geared towards Java, but it's pretty easy to adapt it to Objective-C/Cocoa.


c
ceperry

I'm an Objective C newbie too, but I found an excellent article about identity vs. equality in Objective C here. From my reading it looks like you might be able to just keep the default hash function (which should provide a unique identity) and implement the isEqual method so that it compares data values.


I'm a Cocoa/Objective C newbie, and this answer and link really helped me cut through all the more advanced stuff above to the bottom line - I don't need to worry about hashes - just implementing the isEqual: method. Thanks!
Do not miss @ceperry's link. The article Equality vs Identity by Karl Kraft is really good.
@John: I think you should re-read the article. It says very clearly that "instances that are equal must have equal hash values." If you override isEqual:, you must also override hash.
佚名

Quinn is just wrong that the reference to the murmur hash is useless here. Quinn is right that you want to understand the theory behind hashing. The murmur distills a lot of that theory into an implementation. Figuring out how to apply that implementation to this particular application is worth exploring.

Some of the key points here:

The example function from tcurdt suggests that '31' is a good multiplier because it is prime. One needs to show that being prime is a necessary and sufficient condition. In fact 31 (and 7) are probably not particularly good primes because 31 == -1 % 32. An odd multiplier with about half the bits set and half the bits clear is likely to be better. (The murmur hash multiplication constant has that property.)

This type of hash function would likely be stronger if, after multiplying, the result value were adjusted via a shift and xor. The multiplication tends to produce the results of lots of bit interactions at the high end of the register and low interaction results at the bottom end of the register. The shift and xor increases the interactions at the bottom end of the register.

Setting the initial result to a value where about half the bits are zero and about half the bits are one would also tend to be useful.

It may be useful to be careful about the order in which elements are combined. One should probably first process booleans and other elements where the values are not strongly distributed.

It may be useful to add a couple of extra bit scrambling stages at the end of the computation.

Whether or not the murmur hash is actually fast for this application is an open question. The murmur hash premixes the bits of each input word. Multiple input words can be processed in parallel which helps multiple-issue pipelined cpus.


C
Community

Combining @tcurdt's answer with @oscar-gomez's answer for getting property names, we can create an easy drop-in solution for both isEqual and hash:

NSArray *PropertyNamesFromObject(id object)
{
    unsigned int propertyCount = 0;
    objc_property_t * properties = class_copyPropertyList([object class], &propertyCount);
    NSMutableArray *propertyNames = [NSMutableArray arrayWithCapacity:propertyCount];

    for (unsigned int i = 0; i < propertyCount; ++i) {
        objc_property_t property = properties[i];
        const char * name = property_getName(property);
        NSString *propertyName = [NSString stringWithUTF8String:name];
        [propertyNames addObject:propertyName];
    }
    free(properties);
    return propertyNames;
}

BOOL IsEqualObjects(id object1, id object2)
{
    if (object1 == object2)
        return YES;
    if (!object1 || ![object2 isKindOfClass:[object1 class]])
        return NO;

    NSArray *propertyNames = PropertyNamesFromObject(object1);
    for (NSString *propertyName in propertyNames) {
        if (([object1 valueForKey:propertyName] != [object2 valueForKey:propertyName])
            && (![[object1 valueForKey:propertyName] isEqual:[object2 valueForKey:propertyName]])) return NO;
    }

    return YES;
}

NSUInteger MagicHash(id object)
{
    NSUInteger prime = 31;
    NSUInteger result = 1;

    NSArray *propertyNames = PropertyNamesFromObject(object);

    for (NSString *propertyName in propertyNames) {
        id value = [object valueForKey:propertyName];
        result = prime * result + [value hash];
    }

    return result;
}

Now, in your custom class you can easily implement isEqual: and hash:

- (NSUInteger)hash
{
    return MagicHash(self);
}

- (BOOL)isEqual:(id)other
{
    return IsEqualObjects(self, other);
}

u
user10345

Note that if you're creating a object that can be mutated after creation, the hash value must not change if the object is inserted into a collection. Practically speaking, this means that the hash value must be fixed from the point of the initial object creation. See Apple's documentation on the NSObject protocol's -hash method for more information:

If a mutable object is added to a collection that uses hash values to determine the object’s position in the collection, the value returned by the hash method of the object must not change while the object is in the collection. Therefore, either the hash method must not rely on any of the object’s internal state information or you must make sure the object’s internal state information does not change while the object is in the collection. Thus, for example, a mutable dictionary can be put in a hash table but you must not change it while it is in there. (Note that it can be difficult to know whether or not a given object is in a collection.)

This sounds like complete whackery to me since it potentially effectively renders hash lookups far less efficient, but I suppose it's better to err on the side of caution and follow what the documentation says.


You're reading the hash docs wrong — it's essentially an "either-or" situation. If the object changes, the hash generally also changes. This is really a warning to the programmer, that if the hash changes as a result of mutating an object, then changing the object while it resides in a collection that utilizes the hash will cause unexpected behavior. If the object must be "safely mutable" in such a situation, you have no choice but to make the hash unrelated to the mutable state. That particular situation does sound strange to me, but there are certainly infrequent situations where it applies.
T
Thibaud de Souza

Sorry if I risk sounding a complete boffin here but... ...nobody bothered mentioning that to follow 'best practices' you should definitely not specify an equals method that would NOT take into account all data owned by your target object, e.g whatever data is aggregated to your object, versus an associate of it, should be taken into account when implementing equals. If you don't want to take, say 'age' into account in a comparison, then you should write a comparator and use that to perform your comparisons instead of isEqual:.

If you define an isEqual: method that performs equality comparison arbitrarily, you incur the risk that this method is misused by another developer, or even yourself, once you've forgotten the 'twist' in your equals interpretation.

Ergo, although this is a great q&a about hashing, you don't normally need to redefine the hashing method, you should probably define an ad-hoc comparator instead.


关注公众号,不定期副业成功案例分享
Follow WeChat

Success story sharing

Want to stay one step ahead of the latest teleworks?

Subscribe Now