Why does string::compare return an int?

c++

Why does string::compare return an int instead of a smaller type like short or char? My understanding is that this method only returns -1, 0 or 1.

Second part, if I was to design a compare method that compares two objects of type Foo and I only wanted to return -1, 0 or 1, would using short or char generally be a good idea?

EDIT: I've been corrected, string::compare does not return -1, 0, or 1, it in fact returns a value >0, <0 or 0. Thanks for keeping me in line guys.

It seems the answer is roughly this: there is no reason to return a type smaller than int, because return values are "rvalues" and rvalues don't benefit from being smaller than int (typically 4 bytes). Many people also pointed out that the registers of most systems are at least as wide as an int; since a register is filled whether you give it a 1, 2 or 4 byte value, there is no real advantage to returning a smaller value.

EDIT 2: In fact it looks like there may be extra processing overhead (alignment, masking, etc.) when using smaller data types. The general consensus is that the smaller data types exist to conserve memory when working with a lot of data, as in the case of an array.

Learned something today, thanks again guys!

I think what would be better is if there was a more specific type that could be used for this. One which contains only -1, 0 and 1 in the style of Ada95.
The documentation for string::compare() you link to clearly states the return value is <0, 0, and >0 -not- -1, 0 and 1.
What would be the advantage of using short or char instead of int? Most architectures are going to store the return value of a function in a register, and an int will fit in a register just as well as a short or char. And using char for numeric types is always a bad idea, especially when you need to guarantee signed values are handled correctly.
Captain Obvlious, your name and comment... Just priceless.
Using char would be a bad idea, since code checking for the return value if it's less than zero will fail on platforms where char is unsigned.
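To make the unsigned-char pitfall concrete, here is a minimal sketch. The names compare_narrow and is_less are made up for illustration, and the return type is spelled unsigned char explicitly to mimic a platform where plain char happens to be unsigned:

```cpp
#include <climits>

// Hypothetical comparer, not part of any library. It returns unsigned char
// to mimic a platform where plain char is unsigned.
unsigned char compare_narrow(int a, int b)
{
    if (a < b) return static_cast<unsigned char>(-1); // wraps to UCHAR_MAX
    if (a > b) return 1;
    return 0;
}

// The usual caller-side test silently stops working: the returned value is
// promoted to int and is never negative, so this always returns false.
bool is_less(int a, int b)
{
    return compare_narrow(a, b) < 0;
}
```

With this sketch, is_less(1, 2) is false even though 1 is less than 2, which is exactly the failure described in the comment above.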

Tas

First, the specification is that it will return a value less than, equal to or greater than 0, not necessarily -1 or 1. Secondly, return values are rvalues, subject to integral promotion, so there's no point in returning anything smaller.

In C++ (as in C), every expression is either an rvalue or an lvalue. Historically, the terms refer to the fact that lvalues appear on the left of an assignment, whereas rvalues can only appear on the right. Today, a simple approximation for non-class types is that an lvalue has an address in memory and an rvalue doesn't. Thus, you cannot take the address of an rvalue, and cv-qualifiers (which condition "access") don't apply. In C++ terms, an rvalue which doesn't have class type is a pure value, not an object. The return value of a function is an rvalue, unless it has reference type. (Non-class types which fit in a register will almost always be returned in a register, for example, rather than in memory.)

For class types, the issues are a bit more complex, due to the fact that you can call member functions on an rvalue. This means that rvalues must in fact have addresses, for the this pointer, and can be cv-qualified, since the cv-qualification plays a role in overload resolution. Finally, C++11 introduces several new distinctions, in order to support rvalue references; these, too, are mainly applicable to class types.

Integral promotion refers to the fact that when integral types smaller than an int are used as rvalues in an expression, in most contexts, they will be promoted to int. So even if I have a variable declared short a, b;, in the expression a + b, both a and b are promoted to int before the addition occurs. Similarly, if I write a < 0, the comparison is done on the value of a, converted to an int. In practice, there are very few cases where this makes a difference, at least on 2's complement machines where integer arithmetic wraps (i.e. all but a very few exotics today; I think the Unisys mainframes are the only exceptions left). Still, even on the more common machines:

short a = 1;
std::cout << sizeof( a ) << std::endl;
std::cout << sizeof( a + 0 ) << std::endl;

should give different results: the first is the equivalent of sizeof( short ), the second sizeof( int ) (because of integral promotion).
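The same promotion can be checked at compile time. This is only a sketch: the variables a and b are illustrative, and std::is_same is used purely to inspect the types of the expressions:

```cpp
#include <type_traits>

short a = 1;
short b = 2;

// The operands are promoted before the addition, so the result type is int.
static_assert(std::is_same<decltype(a + b), int>::value, "short + short yields int");

// Unary plus also triggers integral promotion.
static_assert(std::is_same<decltype(+a), int>::value, "+short yields int");

// Naming the variable alone involves no promotion: the declared type is kept.
static_assert(std::is_same<decltype(a), short>::value, "a itself is still short");
```

If any of these assumptions were wrong for a given compiler, the translation unit would fail to compile rather than print a surprising size at run time.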

These two issues are formally orthogonal; rvalues and lvalues have nothing to do with integral promotion. Except... integral promotion only applies to rvalues, and most (but not all) of the cases where you would use an rvalue will result in integral promotion. For this reason, there is really no reason to return a numeric value in something smaller than int. There is even a very good reason not to return it as a character type. Overloaded operators, like <<, often behave differently for character types, so you only want to return characters as character types. You might compare the difference:

char f() { return 'a'; }
std::cout << f() << std::endl;      //  displays "a"
std::cout << f() + 0 << std::endl;  //  displays "97" on my machine

The difference is that in the second case, the addition has caused integral promotion to occur, which causes a different overload of << to be chosen.


It would be nice if you can explain more on return values are rvalues, subject to integral promotion in your answer.
"return values are rvalues ... so there's no point in returning anything smaller" LIKE IT
@AlvinWong: See the answers to Why are C character literals ints instead of chars? for some more background information.
I wish I could +1 this again, after the superb explanation your edit added.
What if it was signed char? Would it behave the same as a signed char, or would it be a different type?
leander

It is intentional that it doesn't return -1, 0 or 1.

It allows (note this is not for strings, but it applies equally to strings)

int compare(int *a, int *b)
{
   return *a - *b;
}

which is a lot less cumbersome than:

int compare(int *a, int *b)
{
   if (*a == *b) return 0;
   if (*a > *b) return 1;
   return -1;
}

which is what you'd have to do [or something along those lines] if you have to return -1, 0 or 1.

And it works for more complex types too:

struct Date   // struct, so that compare() below can read the members
{
    int year;
    int month;
    int day;
};

int compare(const Date &a, const Date &b)
{
   if (a.year != b.year) return a.year - b.year;
   if (a.month != b.month) return a.month - b.month;
   return a.day - b.day;
}

In the string case, we can do this:

int compare(const std::string& a, const std::string& b)
{
   int len = std::min(a.length(), b.length());   // std::min is declared in <algorithm>

   for(int i = 0; i < len; i++)
   {
      if (a[i] != b[i]) return a[i] - b[i];
   }
   // We only get here if the strings are equal up to the end of the
   // shorter one. If the lengths differ, the longer string wins.
   return a.length() - b.length();
}

Your first compare function has problems with overflow that (fortunately) don't apply equally if it takes char* and char is smaller than int. For example, if *a is INT_MAX and *b is -1 then *a - *b is UB, but if the implementation chooses to define its behavior then the result almost certainly is negative.
Problem with your last example: length() returns a size_t, which may be larger than int
Yeah, that may be a problem if your strings are more than 2GB long. I have done 1GB long strings as a test-case for storing things in a fifo once. But sure, someone dealing with a string containing a MPEG encoded as Base64 or some such may well run into that problem...
@MatsPetersson it's more of a fundamental problem, because the question is “why does it return an int?”
Well, I'm sure that's hysterical - I mean historical reasons - and probably so that it is compatible with strcmp/memcmp and other compare type operations.
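For what it's worth, both concerns raised in these comments (signed overflow in *a - *b, and narrowing a size_t difference to int) can be sidestepped by comparing instead of subtracting. A minimal sketch, with compare_ints and compare_lengths as made-up names:

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Each comparison yields 0 or 1, so the difference is always -1, 0 or 1
// and cannot overflow, even for *a == INT_MAX and *b == -1.
int compare_ints(const int* a, const int* b)
{
    return (*a > *b) - (*a < *b);
}

// Same idea for the final length check of the string example: the lengths
// are compared as size_t and never converted to int.
int compare_lengths(const std::string& a, const std::string& b)
{
    const std::size_t la = a.length();
    const std::size_t lb = b.length();
    return (la > lb) - (la < lb);
}
```

The price is that the result is restricted to -1, 0 or 1, which the interface permits but does not require.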
Tobia

int is usually (meaning on most modern hardware) an integer of the same size as the system bus and/or the cpu registers, what is called the machine word. Therefore int is usually passed along faster than smaller types, because it doesn't require alignment, masking and other operations.

The smaller types exist mainly to allow RAM usage optimization for arrays and structs. In most cases they trade a few CPU cycles (in the form of alignment operations) for better RAM usage.

Unless you need to force your return value to be a signed or unsigned number of a certain size (char, short…), you are better off using int, which is why the standard library does it.
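A small sketch of where the narrow types do pay off. The element count kCount and the function names are hypothetical; the point is only that the saving scales with the number of stored elements and does not apply to a single return value:

```cpp
#include <cstddef>

constexpr std::size_t kCount = 1000000; // hypothetical element count

// Storage needed for one million small values, by element type.
constexpr std::size_t bytes_as_char = sizeof(signed char[kCount]);
constexpr std::size_t bytes_as_int  = sizeof(int[kCount]);

static_assert(bytes_as_char == kCount, "sizeof(signed char) is 1 by definition");
static_assert(bytes_as_int == kCount * sizeof(int), "array elements are contiguous");

// A single return value, by contrast, typically travels in a register,
// so narrowing it saves no memory.
signed char narrow_result() { return -1; }
int         wide_result()   { return -1; }
```

On a typical platform with a 4-byte int, the int array is four times the size of the signed char array, while the two functions return the same value at no difference in storage.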


Great way of explaining the hardware-side of things in a way that makes sense.
Alex Chamberlain

It's a C-ism.

When C required compare-type functions, they always returned an int. C++ just carried that forward (unfortunately).

However, returning an int is realistically probably the fastest way, as it's generally the size of the registers of the system in use. (Deliberately vague.)


Actually short and char can impose performance penalties, e.g. 255+7 has a different value for a char and an int, so a correct implementation cannot necessarily simply store a char where an int can go without taking care of handling its semantics. Compilers won't necessarily optimise out the inefficiency this imposes.
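To illustrate the 255+7 point, here is a sketch with made-up function names. It uses unsigned char, whose wrap-around is well defined, and assumes the usual 8-bit char:

```cpp
#include <climits>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit chars");

// The sum is computed as int (integral promotion); converting back to
// unsigned char keeps only the low 8 bits, which is the extra step a
// narrow type can require.
unsigned char add_narrow(unsigned char x, unsigned char y)
{
    return static_cast<unsigned char>(x + y);
}

// With int, the full value is kept and no truncation step is needed.
int add_wide(int x, int y)
{
    return x + y;
}
```

Here add_narrow(255, 7) yields 6 (262 modulo 256), whereas add_wide(255, 7) yields 262.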
Jon

The method doesn't actually return an integer in the set { -1, 0, 1 }; it can actually be any integral value.

Why? The main reason I can think of is that int is supposed to be the "natural size" value for the architecture; operations on values of this size are typically at least as fast (and in many cases faster) than operations on smaller or larger values. So this is a case of allowing the implementation enough slack to use whatever is fastest.


BЈовић

if I was to design a compare method that compares two objects of type Foo and I only wanted to return -1, 0 or 1, would using short or char generally be a good idea?

It would be an OK idea. A better way would be to return a bool (if you only want to test for equality), or an enum (for more information):

enum class MyResult
{
  EQUAL,
  LESS,
  GREATER
};

MyResult AreEqual( const Foo &foo1, const Foo & foo2 )
{
  // calculate and return result
}
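A filled-in sketch of the enum approach might look like the following. Foo's single value member and the function name CompareFoo are assumptions made for illustration, since the question does not say what Foo contains:

```cpp
// Hypothetical stand-in for the Foo type from the question.
struct Foo
{
    int value;
};

enum class MyResult
{
    EQUAL,
    LESS,
    GREATER
};

// Three-way comparison that can only ever produce one of the three
// named outcomes.
MyResult CompareFoo(const Foo& foo1, const Foo& foo2)
{
    if (foo1.value < foo2.value) return MyResult::LESS;
    if (foo1.value > foo2.value) return MyResult::GREATER;
    return MyResult::EQUAL;
}
```

Because enum class does not convert implicitly to int, a caller cannot accidentally write CompareFoo(a, b) < 0; it has to compare against MyResult::LESS explicitly.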

"It would be ok idea". Do you have a rationale for that?
masoud

Suppose some people are porting code from C to C++ and decide to replace strcmp with string::compare.

Since strcmp returns int, it's easier for them if string::compare returns int as well, as a gift.


Shafik Yaghmour

Probably to make it work more like strcmp which also has this set of return values. If you wanted to port code it would probably be more intuitive to have replacements that cleave as close as possible.

Also, the return value is not just -1, 0 or 1 but <0, 0 or >0.

Also, as was mentioned, since the return value is subject to integral promotion, it does not make sense to make it smaller.
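A rough sketch of what that porting compatibility looks like in practice. The helper names sign_of, c_style and cpp_style are made up; the only library calls used are std::strcmp and std::string::compare:

```cpp
#include <cstring>
#include <string>

// Reduces any comparison result to its sign, the only part that is specified.
int sign_of(int result)
{
    return (result > 0) - (result < 0);
}

// C version: compare two NUL-terminated strings.
int c_style(const char* a, const char* b)
{
    return sign_of(std::strcmp(a, b));
}

// C++ version: same calling pattern, same int result type.
int cpp_style(const std::string& a, const std::string& b)
{
    return sign_of(a.compare(b));
}
```

Code that tests the result with < 0, == 0 or > 0 keeps working unchanged after the switch, because both functions return an int with the same sign convention.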


MDMoore313

Because a Boolean return value can take only two values (true, false), while a compare function has to distinguish three outcomes (less than, equal, greater than).

Update

While it is certainly possible to return a signed short, if you really wanted to implement your own compare function you could also return a nibble or a struct with two booleans.


Nowhere in the question does it say anything about returning a Boolean type. In fact, he specifically proposes short and char as alternatives to int.