
When to use std::size_t?

I'm just wondering should I use std::size_t for loops and stuff instead of int? For instance:

#include <cstddef>

int main()
{
    for (std::size_t i = 0; i < 10; ++i) {
        // std::size_t OK here? Or should I use, say, unsigned int instead?
    }
}

In general, what is the best practice regarding when to use std::size_t?


CB Bailey

A good rule of thumb is to use std::size_t for anything that you need to compare in the loop condition against something that is naturally a std::size_t itself.

std::size_t is the type of any sizeof expression and as such is guaranteed to be able to express the maximum size of any object (including any array) in C++. By extension, it is also guaranteed to be big enough for any array index, so it is a natural type for a loop by index over an array.

If you are just counting up to a number then it may be more natural to use either the type of the variable that holds that number or an int or unsigned int (if large enough) as these should be a natural size for the machine.
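
As a rough illustration of this rule of thumb (a sketch, not a definitive guideline): the index below is compared against values.size(), so it is a std::size_t, while the plain counter takes the type of its bound. The names sum_by_index and count_iterations are hypothetical, chosen only for this example.

```cpp
#include <cstddef>
#include <vector>

// Sums the elements by index. The loop condition compares against
// values.size(), which is a std::size_t with the default allocator,
// so the index uses the same type.
long long sum_by_index(const std::vector<int>& values) {
    long long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i];
    }
    return total;
}

// Counts up to a plain number. The bound is an int, so the counter is too.
int count_iterations(int times) {
    int performed = 0;
    for (int i = 0; i < times; ++i) {
        ++performed;
    }
    return performed;
}
```

Calling sum_by_index({1, 2, 3, 4}) yields 10, and count_iterations(10) yields 10.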


It's worth mentioning that not using size_t when you should can lead to security bugs.
Not only is int "natural", but mixing signed and unsigned type can lead to security bugs just as well. Unsigned indices are a pain to handle and a good reason to use a custom vector class.
@JoSo There is also ssize_t for signed values.
@EntangledLoops ssize_t does not have the full range of size_t. It just is the signed variant of whatever size_t would translate into. This means, that the full range of the memory is not usable with ssize_t and integer overflows could happen when depending on variables of type size_t.
@Thomas Yes, but I'm not sure what point you're making. I just meant as a drop-in replacement for int, it's a closer semantic fit. Your comment about the full range not being available with ssize_t is true, but it's also true of int. What really matters is using the appropriate type for the application.
Gregory Pakosz

size_t is the result type of the sizeof operator.

Use size_t for variables that model size or index in an array. size_t conveys semantics: you immediately know it represents a size in bytes or an index, rather than just another integer.

Also, using size_t to represent a size in bytes helps make the code portable.
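
A small sketch of these points, assuming a C++11 compiler: the static_assert checks that sizeof yields std::size_t, and the helpers element_count and byte_count (hypothetical names invented for this example) report an element count and a size in bytes as std::size_t.

```cpp
#include <cstddef>
#include <type_traits>

// sizeof expressions have type std::size_t, which is an unsigned type.
static_assert(std::is_same<decltype(sizeof(int)), std::size_t>::value,
              "sizeof yields std::size_t");
static_assert(std::is_unsigned<std::size_t>::value,
              "std::size_t is unsigned");

// Number of elements in a built-in array.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) {
    return N;
}

// Size of a built-in array in bytes.
template <typename T, std::size_t N>
constexpr std::size_t byte_count(const T (&array)[N]) {
    return sizeof(array);
}
```

For an array declared as int sample[5], element_count(sample) is 5 and byte_count(sample) is 5 * sizeof(int), whatever the width of int on the platform.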


paxdiablo

The size_t type is meant to specify the size of something so it's natural to use it, for example, getting the length of a string and then processing each character:

for (size_t i = 0, max = strlen (str); i < max; i++)
    doSomethingWith (str[i]);

You do have to watch out for boundary conditions of course, since it's an unsigned type. The boundary at the top end is not usually that important since the maximum is usually large (though it is possible to get there). Most people just use an int for that sort of thing because they rarely have structures or arrays that get big enough to exceed the capacity of that int.

But watch out for things like:

for (size_t i = strlen (str) - 1; i >= 0; i--)

which will cause an infinite loop due to the wrapping behaviour of unsigned values (although I've seen compilers warn against this). This can also be alleviated by the (slightly harder to understand but at least immune to wrapping problems):

for (size_t i = strlen (str); i-- > 0; )

By shifting the decrement into a post-check side effect of the continuation condition, this checks for continuation on the value before the decrement but still uses the decremented value inside the loop (which is why the condition sees the values len .. 1 while the loop body sees len-1 .. 0).
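
For illustration, here is the wrap-safe countdown applied to a std::string instead of a C string (an assumption made to keep the sketch self-contained; reversed is a hypothetical helper name):

```cpp
#include <cstddef>
#include <string>

// Visits the characters from last to first using the "i-- > 0" idiom.
// The condition tests the value before the decrement, so the body sees
// the indices size-1 down to 0 and an empty string never enters the loop.
std::string reversed(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = text.size(); i-- > 0; ) {
        out.push_back(text[i]);
    }
    return out;
}
```

reversed("abc") returns "cba", and reversed("") returns an empty string without looping.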


By the way, it's a bad practice to call strlen on each iteration of a loop. :) You can do something like this: for (size_t i = 0, len = strlen(str); i < len; i++) ...
Even if it were a signed type, you have to watch out for boundary conditions, perhaps even more so since signed integer overflow is undefined behavior.
Counting down correctly can be done in the following (infamous) way: for (size_t i = strlen (str); i --> 0;)
@JoSo, that's actually quite a neat trick though I'm not sure I like the introduction of the --> "goes to" operator (see stackoverflow.com/questions/1642028/…). Have incorporated your suggestion into the answer.
Can you do a simple if (i == 0) break; at the end of the for loop (e.g., for (size_t i = strlen(str) - 1; ; --i))? (I like yours better though, but just wondering if this would work just as well).
Daniel Daranas

By definition, size_t is the result of the sizeof operator. size_t was created to refer to sizes.

The number of times you do something (10, in your example) is not about sizes, so why use size_t? int, or unsigned int, should be ok.

Of course it is also relevant what you do with i inside the loop. If you pass it to a function which takes an unsigned int, for example, pick unsigned int.

In any case, I recommend avoiding implicit type conversions. Make all type conversions explicit.
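
One possible way to make such a conversion explicit is sketched below. The helper checked_int is hypothetical (it is not a standard function); it converts a std::size_t to int with a static_cast and refuses values that do not fit.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: converts a size to int explicitly and throws
// std::out_of_range when the value cannot be represented as an int.
int checked_int(std::size_t value) {
    const std::size_t max_int =
        static_cast<std::size_t>(std::numeric_limits<int>::max());
    if (value > max_int) {
        throw std::out_of_range("size does not fit in int");
    }
    return static_cast<int>(value);
}
```

A call such as checked_int(v.size()) then documents at the call site that a narrowing conversion happens and what is done when it fails.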


Arne

short answer:

almost never

long answer:

Whenever you need to have a vector of char bigger than 2 GB on a 32-bit system. In every other use case, using a signed type is much safer than using an unsigned type.

example:

std::vector<A> data;
[...]
// calculate the index that should be used;
size_t i = calc_index(param1, param2);
// doing calculations close to the underflow of an integer is already dangerous

// do some bounds checking
if( i - 1 < 0 ) {
    // always false, because 0-1 on unsigned creates an underflow
    return LEFT_BORDER;
} else if( i >= data.size() - 1 ) {
    // if i already had an underflow, this becomes true
    return RIGHT_BORDER;
}

// now you have a bug that is very hard to track, because you never 
// get an exception or anything anymore, to detect that you actually 
// return the false border case.

return calc_something(data[i-1], data[i], data[i+1]);

The signed equivalent of size_t is ptrdiff_t, not int. But using int is still much better in most cases than size_t. ptrdiff_t is typically as wide as size_t: 32 bits on 32-bit systems and 64 bits on 64-bit systems.

This means that you always have to convert to and from size_t whenever you interact with std:: containers, which is not very beautiful. But at a GoingNative conference the authors of C++ mentioned that designing std::vector with an unsigned size_t was a mistake.

If your compiler gives you warnings on implicit conversions from ptrdiff_t to size_t, you can make it explicit with constructor syntax:

calc_something(data[size_t(i-1)], data[size_t(i)], data[size_t(i+1)]);

If you just want to iterate over a collection, without bounds checking, use a range-based for:

for(const auto& d : data) {
    [...]
}

Here are some words from Bjarne Stroustrup (the author of C++) at GoingNative.

For some people this signed/unsigned design error in the STL is reason enough not to use std::vector, but their own implementation instead.


I understand where they're coming from, but I still think it's weird to write for(int i = 0; i < get_size_of_stuff(); i++). Now, sure, you might not want to do a lot of raw loops, but - come on, you use them too.
The only reason I use raw loops, is because the c++ algorithm library is designed pretty badly. There are languages, like Scala, that have a much better and more evolved library to operate on collections. Then the use case of raw loops is pretty much eliminated. There are also approaches to improve c++ with a new and better STL, but I doubt this will happen within the next decade.
I get that unsigned i = 0; assert(i-1, MAX_INT); but I don't understand why you say "if i already had an underflow, this becomes true" because the behaviour of arithmetic on unsigned ints is always defined, ie. the result is the result modulo the size of the largest representable integer. So if i==0, then i-- becomes MAX_INT and then i++ becomes 0 again.
@mabraham I looked carefully, and you are right, my code is not the best to show the problem. Normally x + 1 < y is equivalent to x < y - 1, but they are not equivalent with unsigned integers. That can easily introduce bugs when things are transformed that are assumed to be equivalent.
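
The non-equivalence of x + 1 < y and x < y - 1 for unsigned operands can be sketched as follows. The four function names are hypothetical and exist only for this comparison; std::ptrdiff_t stands in for "a signed type".

```cpp
#include <cstddef>

// Unsigned versions: when y == 0, y - 1 wraps to the largest std::size_t,
// so the two comparisons can disagree.
bool less_after_add(std::size_t x, std::size_t y) { return x + 1 < y; }
bool less_before_sub(std::size_t x, std::size_t y) { return x < y - 1; }

// Signed versions: for small values the two forms agree, because
// y - 1 simply becomes -1 when y == 0.
bool signed_less_after_add(std::ptrdiff_t x, std::ptrdiff_t y) { return x + 1 < y; }
bool signed_less_before_sub(std::ptrdiff_t x, std::ptrdiff_t y) { return x < y - 1; }
```

With x == 0 and y == 0, less_after_add returns false but less_before_sub returns true, while both signed versions return false.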
Ofir

size_t is a very readable way to specify the size dimension of an item: the length of a string, the number of bytes a pointer takes, etc. It's also portable across platforms: you'll find that 64-bit and 32-bit systems both behave nicely with system functions and size_t, something that unsigned int might not do (e.g. when should you use unsigned long


Peter Alexander

Use std::size_t for indexing/counting C-style arrays.

For STL containers, you'll have (for example) vector<int>::size_type, which should be used for indexing and counting vector elements.

In practice, they are usually both unsigned ints, but it isn't guaranteed, especially when using custom allocators.
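
A minimal sketch of indexing with the container's own size_type follows. The static_assert encodes an assumption, namely that with the default allocator std::vector<int>::size_type is std::size_t, which holds on the mainstream implementations; count_even is a hypothetical helper name.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Assumption: with the default allocator, size_type is std::size_t.
static_assert(std::is_same<std::vector<int>::size_type, std::size_t>::value,
              "vector<int>::size_type is std::size_t with std::allocator");

// Counts the even elements, indexing with the container's size_type.
std::vector<int>::size_type count_even(const std::vector<int>& values) {
    std::vector<int>::size_type count = 0;
    for (std::vector<int>::size_type i = 0; i < values.size(); ++i) {
        if (values[i] % 2 == 0) {
            ++count;
        }
    }
    return count;
}
```

If the static_assert ever fails (for example with an unusual allocator), the loop is still correct because it never mentions std::size_t directly.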


With gcc on Linux, std::size_t is usually unsigned long (8 bytes on 64-bit systems) rather than unsigned int (4 bytes).
C-style arrays are not indexed by size_t though, since the indexes can be negative. One could use size_t for one's own instance of such an array if one doesn't want to go negative, though.
Since C-style array indexing is equivalent to using operator + on pointers, it would seem that ptrdiff_t is the one to use for indices.
As for vector<T>::size_type (and ditto for all other containers), it's actually rather useless, because it is effectively guaranteed to be size_t - it's typedef'd to Allocator::size_type, and for restrictions on that with respect to containers see 20.1.5/4 - in particular, size_type must be size_t, and difference_type must be ptrdiff_t. Of course, the default std::allocator<T> satisfies those requirements. So just use the shorter size_t and don't bother with the rest of the lot :)
I have to comment about C-style arrays and negative indices. Yes you can, but you shouldn't. Accessing outside the array bounds is undefined. And if you are doing tricky things with pointers, doing it with an array index instead of pointer math (and lots of code comments) is a confusing, bad idea.
KittMedia

Soon most computers will be 64-bit architectures with 64-bit OSes running programs operating on containers of billions of elements. Then you must use size_t instead of int as loop index, otherwise your index will wrap around at the 2^32th element, on both 32- and 64-bit systems.

Prepare for the future!


Your argument only goes as far as meaning one needs a long int rather than an int. If size_t is relevant on a 64-bit OS it was just as relevant on a 32-bit OS.
ascotan

size_t is returned by various libraries to indicate that the size of a container is non-negative. You use it when you get one back :0

However, in your example above, looping on a size_t is a potential bug. Consider the following:

for (size_t i = thing.size(); i >= 0; --i) {
  // this will never terminate because size_t is an unsigned
  // integer type, which cannot be negative by definition;
  // therefore i will always be >= 0
  printf("the never ending story. la la la la");
}

The use of unsigned integers has the potential to create these types of subtle issues. Therefore I prefer to use size_t only when I interact with containers/types that require it.


Everyone seems to use size_t in loops without bothering about this bug, and I learned this the hard way
Kemin Zhou

When using size_t be careful with the following expression

size_t i = containner.find("mytoken");
size_t x = 99;
if (i-x>-1 && i+x < containner.size()) {
    cout << containner[i-x] << " " << containner[i+x] << endl;
}

You will get false in the if expression regardless of what value you have for x. It took me several days to realize this (the code is so simple that I did not write a unit test), although it only takes a few minutes to figure out the source of the problem. I am not sure whether it is better to do a cast or use zero.

if ((int)(i-x) > -1 or (i-x) >= 0)

Both ways should work. Here is my test run

size_t i = 5;
cerr << "i-7=" << i-7 << " (int)(i-7)=" << (int)(i-7) << endl;

The output: i-7=18446744073709551614 (int)(i-7)=-2

I would like others' comments.
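
One alternative, offered only as a sketch, is to avoid the subtraction entirely and compare the unsigned values directly. The helper window_in_range is a hypothetical name; it answers whether both index - offset and index + offset are valid positions in a container of size elements.

```cpp
#include <cstddef>

// True when index - offset and index + offset are both valid positions.
// Every subtraction is guarded by an earlier comparison, so no
// intermediate value can wrap around.
bool window_in_range(std::size_t index, std::size_t offset, std::size_t size) {
    return index < size              // index itself is valid
        && offset <= index           // index - offset does not wrap
        && offset < size - index;    // index + offset stays below size
}
```

Because a failed find on a std::string returns std::string::npos, which is the largest std::size_t, the first comparison also rejects the "not found" case.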


Please note that (int)(i - 7) is an underflow that is cast to int afterwards, while int(i) - 7 is not an underflow since you first convert i to an int, and then subtract 7. Additionally, I found your example confusing.
My point is that int is usually safer when you do subtractions.
Yann

It is often better not to use size_t in a loop. For example,

vector<int> a = {1,2,3,4};
for (size_t i=0; i<a.size(); i++) {
    std::cout << a[i] << std::endl;
}
size_t n = a.size();
for (size_t i=n-1; i>=0; i--) {
    std::cout << a[i] << std::endl;
}

The first loop is ok. But for the second loop: When i=0, the result of i-- will be ULLONG_MAX (assuming size_t = unsigned long long), which is not what you want in a loop. Moreover, if a is empty then n=0 and n-1=ULLONG_MAX which is not good either.
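
A sketch of one way around both problems is to iterate backwards with reverse iterators, so that no unsigned index is decremented at all. The function name in_reverse is hypothetical.

```cpp
#include <vector>

// Copies the elements in reverse order using reverse iterators.
// For an empty vector rbegin() == rend(), so the loop body never runs.
std::vector<int> in_reverse(const std::vector<int>& values) {
    std::vector<int> out;
    out.reserve(values.size());
    for (auto it = values.rbegin(); it != values.rend(); ++it) {
        out.push_back(*it);
    }
    return out;
}
```

For the vector {1, 2, 3, 4} this produces {4, 3, 2, 1}, and for an empty vector it produces an empty vector.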


Unknown

size_t is an unsigned type that can hold the size of the largest object on your architecture, so it avoids the overflow problems of a signed type (incrementing a signed int holding 0x7FFFFFFF is undefined behavior and typically wraps to a negative value) or of a short type (an unsigned short int holding 0xFFFF incremented by 1 wraps to 0).

It is mainly used in array indexing, loops, address arithmetic and so on. Functions like memset() take their size argument as a size_t, because theoretically you may have a block of memory of size 2^32-1 (on a 32-bit platform).

For such simple loops don't bother and use just int.
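
The unsigned wrap-around mentioned above can be sketched like this. Only the unsigned cases are shown, because signed overflow is undefined behavior in C++ and cannot be demonstrated portably. The names next_ushort and next_size are hypothetical.

```cpp
#include <cstddef>
#include <limits>

// Unsigned arithmetic is modular: adding 1 to the maximum value gives 0.
unsigned short next_ushort(unsigned short value) {
    // The addition is done in unsigned int; the cast reduces the result
    // modulo the range of unsigned short.
    return static_cast<unsigned short>(value + 1u);
}

std::size_t next_size(std::size_t value) {
    return value + 1;  // wraps to 0 at the largest std::size_t
}
```

Both functions return 0 when given the maximum value of their type, which std::numeric_limits<T>::max() provides.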


Hilario Nengare

I have been struggling myself with understanding what and when to use it. But size_t is just an unsigned integral data type which is defined in various header files such as <stddef.h>, <stdio.h>, <stdlib.h>, <string.h>, <time.h>, <wchar.h> etc.

It is used to represent the size of objects in bytes, hence it is the type returned by the sizeof operator. The maximum permissible size depends on the compiler: on a 32-bit target it is typically a typedef (alias) for unsigned int, while on a 64-bit target it is typically a typedef for unsigned long or unsigned long long. The size_t data type is never negative (unlike ssize_t). Therefore many C library functions like malloc, memcpy and strlen declare their arguments and return type as size_t.

// Declaration of various standard library functions.
  
// Here argument of 'n' refers to maximum blocks that can be
// allocated which is guaranteed to be non-negative.
void *malloc(size_t n);
  
// While copying 'n' bytes from 's2' to 's1'
// n must be non-negative integer.
void *memcpy(void *s1, void const *s2, size_t n);
  
// the size of any string or `std::vector<char> st;` will always be at least 0.
size_t strlen(char const *s);

size_t or any unsigned type might be seen used as a loop variable, as loop variables are typically greater than or equal to 0.


Your answer is all about C language, but the question is tagged C++ instead. In C++, we don't use malloc/free, even new/delete have very few valid use cases in C++. For dynamic memory management, we use smart pointers (such as std::unique_ptr) instead (if even needed, because regular stuff can often be done using standard containers, such as std::vector). Also, in C++, we don't #include <stddef.h> and don't #include <string.h>. Instead, we #include <string> and #include <cstddef>, and use std::string. C and C++ are different languages.
Oops. Sorry, I really didn't pay attention there, thanks.
monkeyking

size_t is an unsigned integral type that can represent the largest integer on your system. Only use it if you need very large arrays, matrices, etc.

Some functions return a size_t, and your compiler will warn you if you try to compare it against a signed value.

Avoid that by using the appropriate signed/unsigned data type, or simply typecast for a fast hack.


Only use it if you want to avoid bugs and security holes.
It may not actually be able to represent the largest integer on your system.
Ashish

size_t is unsigned int. So whenever you want unsigned int you can use it.

I use it when I want to specify the size of an array, a counter, etc.

void * operator new (size_t size); is a good use of it.

Actually it's not necessarily the same as unsigned int. It is unsigned, but it might be larger (or I guess smaller though I don't know of any platforms where this is true) than an int.
For example, on a 64 bit machine size_t might be an unsigned 64 bit integer, while on a 32 bit machine it is only a 32 bit unsigned integer.