
What is the maximum length in chars needed to represent any double value?

When I convert an unsigned 8-bit int to a string, I know the result will always be at most 3 chars (for 255), and for a signed 8-bit int we need 4 chars, e.g. for "-128".

Now what I'm actually wondering is the same thing for floating-point values. What is the maximum number of chars required to represent any "double" or "float" value as a string?

Assume a regular C/C++ double (IEEE 754) and normal decimal expansion (i.e. no %e printf-formatting).

I'm not even sure whether really small numbers (e.g. 0.234234) will be longer than really huge numbers (doubles representing integers).

Jalf, why would anybody mention that? Who said he's asking how big a fixed-size buffer would need to be? Maybe he wants to know how many character columns he needs to reserve on the console for a text-based table.
Without scientific notation it would be long for values at the extremes of the magnitude range, but what would be the point? Who would read such a number? A double (typically) has approximately 15 significant decimal digits; all the rest would be a large number of leading or trailing zeros.
No you can have much more than 15 significant digits for decimal digits but only 15 significant digits for integer. This is because while you can represent all integers you can't represent all decimal expansions so fewer bits can be used to cover a larger range.
I'm not printing numbers for people to read, I'm trying to find the required char buffer size needed in order to be sure that the reverse of strtod (i.e "dtoa(double d, char* output)") can finish safely with no risk of buffer overflows.
@matrin, I tried with for loop and multiplied the number as long as it gave 1.#INF00, largest number was 286 bytes long. So i guess you are safe with 512 bytes? (using printf).

Mike Seymour

The standard header <float.h> in C, or <cfloat> in C++, contains several constants to do with the range and other metrics of the floating point types. One of these is DBL_MAX_10_EXP, the largest power-of-10 exponent needed to represent all double values. Since 1eN needs N+1 digits to represent, and there might be a negative sign as well, then the answer is

int max_digits = DBL_MAX_10_EXP + 2;

This assumes that the exponent is larger than the number of digits needed to represent the largest possible mantissa value; otherwise, there will also be a decimal point followed by more digits.

CORRECTION

The longest number is actually the smallest representable negative number: it needs enough digits to cover both the exponent and the mantissa. This value is -pow(2, DBL_MIN_EXP - DBL_MANT_DIG), where DBL_MIN_EXP is negative. It's fairly easy to see (and prove by induction) that -pow(2,-N) needs 3+N characters for a non-scientific decimal representation ("-0.", followed by N digits). So the answer is

int max_digits = 3 + DBL_MANT_DIG - DBL_MIN_EXP;

For a 64-bit IEEE double, we have

DBL_MANT_DIG = 53
DBL_MIN_EXP = -1021
max_digits = 3 + 53 - (-1021) = 1077

Sorry, this is wrong - the longest numbers will be very small ones, not very large ones, and it's more complicated to work out their length. It should be possible to work it out from DBL_MIN_EXP and DBL_MANT_DIG; I'll update the answer if I can work it out.
Thanks for the great answer Mike. In case someone needs clarification: Basically, floating-point has a special rule that says if all bits in the exponent are zero then the number is called "denormalized" and the implicit leading "1." (which is otherwise always prefixed onto the fractional part) is not included. This means that by setting the exponent bits to all zero you can use the fractional part to get another 15-16 digits onto the length of the base10 decimal expansion (i.e. like 0.000...0001*2^-1023 where -1023 is the exponent which is encoded as all zeros due to the exponent bias).
-1023 as the DBL_MIN_EXP is bits and not digits, so the calculation of max_digits here does not seem right
You must be kidding me... 1079 digits? Could you make an example C program that generates that number in stdout?
This answer is (still) WRONG. The maximum number of characters that will be required to print any decimal double value (i.e. in "%f" format) will be for the value of -DBL_MIN (i.e. -0x1p-1022, assuming binary64 IEEE 754 is your double). For that you'll need exactly 325 characters. That's: DBL_DIG + abs(DBL_MIN_10_EXP) + strlen("-0."). This is of course because log10(fabs(DBL_MIN)) is 308, which is also abs(DBL_MIN_10_EXP)+1 (the +1 is because of the leading digit to the left of the decimal place), and that's the number of leading zeros to the left of the significant digits.
Francisco

According to IEEE 754-1985, the longest notation for value represented by double type, i.e.:

-2.2250738585072020E-308

has 24 chars.


Cool, can you explain why this is the longest one? Why can't for example a really small 0.033211233457645...234234 become longer?
If you don't want the scientific notation that would give you 308 more chars then...
Because the mantissa of a double, following IEEE 754-1985, can represent numbers with a maximum accuracy of 17 significant digits. Add both minus signs (one for the mantissa and one for the exponent), the decimal point, the 'E' character and 3 exponent digits, and you get exactly 24 chars.
0.00000000000000000 ... approx 308 ... 00002225073858507202 ==> approx 326 chars :)
@VitaliyUlantikov Hi, could you tell me why IEEE 754-1985, can represent numbers with maximum accuracy of 17 digits after point. Why 17 digits? Thanks.
Anonymous

A correct source of information that goes into more detail than the IEEE-754 specification is these lecture notes from UC Berkeley (page 4), plus a little bit of DIY calculation. These lecture slides are also good for engineering students.

Recommended Buffer Sizes

| Single| Double | Extended | Quad  |
|:-----:|:------:|:--------:|:-----:|
|   16  |  24    |    30    |  45   |

These numbers are based on the following calculations:

Maximum Decimal Count of the Integral Portion

| Single| Double | Extended | Quad  |
|:-----:|:------:|:--------:|:-----:|
|   9   |   17   |    21    |  36   |

* Quantities listed in decimals.

Decimal counts are based on the formula: at most Ceiling(1 + N * log10(2)) decimal digits, where N is the number of bits in the integral portion*.

Maximum Exponent Lengths

| Single| Double | Extended | Quad  |
|:-----:|:------:|:--------:|:-----:|
|   5   |   5    |     7    |   7   |
* Standard format is `e-123`.

Fastest Algorithm

The fastest algorithm for printing floating-point numbers is the Grisu2 algorithm detailed in the research paper Printing Floating-Point Numbers Quickly and Accurately with Integers. The best benchmark I could find can be found here.


pmg

You can use snprintf() to check how many chars you need. snprintf() returns the number of chars needed to print whatever is passed to it.

/* NOT TESTED */
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    char dummy[1];
    double value = 42.000042; /* or anything else */
    int siz;
    char *representation;
    siz = snprintf(dummy, sizeof dummy, "%f", value);
    printf("exact length needed to represent 'value' "
           "(without the '\\0' terminator) is %d.\n", siz);
    representation = malloc(siz + 1);
    if (representation) {
        sprintf(representation, "%f", value);
        /* use `representation` */
        free(representation);
    } else {
        /* no memory */
    }
    return 0;
}

Note: snprintf() is a C99 function. If a C89 compiler provides it as an extension, it may not do what the above program expects.

Edit: Changed the link to snprintf() to one that actually describes the functionality imposed by the C99 Standard; the description in the original link is wrong.
2013: Changed the link back to POSIX site which I prefer over the site of the first edit.


Now to answer the question one just needs to figure out which number to assign to double value = ??;
vtomazzi

"What is the maximum length in chars needed to represent any double value?"

The exact answer to this question is: 16 ASCII chars - in a hexadecimal format, excluding the '0x' prefix - 100% accuracy :) (but it's not just a joke)

The usable precision of IEEE-754 double is around 16 decimal digits - so excluding educational purposes, representations longer than that are just a waste of resources and computing power:

Users are not getting more informed when they see a 700-digit number on the screen.

Configuration variables stored in that "more accurate" form are useless - every single operation on such a number will destroy the accuracy. (excluding changing the sign bit)

If someone needs better real precision, then there's the 80-bit long double with around 18-digit accuracy or e.g. libquadmath.

Regards.


S.Lott

Depends on what you mean by "represent". Decimal fractions don't have exact floating-point representations. When you convert decimal fraction -> binary fraction -> decimal, you do not have exact decimal representations and will have noise bits at the end of the binary representation.

The question didn't involve starting from decimal, but all source code (and most user input) is decimal, and involves the possible truncation issue. What does "exact" mean under these circumstances?

Basically, it depends on your floating point representation.

If you have 48 bits of mantissa, this takes about 16 decimal digits. The exponent might be the remaining 14 bits (about 5 decimal digits).

The rule of thumb is that the number of bits is about 3x the number of decimal digits.


Actually they can't have infinite digits because any binary fraction can be expressed precisely in decimal notation.
I mean represent as an exact base 10 decimal expansion (e.g. no %e stuff). And I'm assuming we're dealing with standard C/C++ double using IEEE 754.
Of course. I mean, in the memory the double is stored in a binary notation like 1101010.101000110101. The number of digits after the binary point is pretty much finite and thus can be represented precisely in decimal. 1/2 is represented as 0.5; 1/4 is represented as 0.25; 1/8 is represented as 0.125; etc. Am I missing something?
The problem is the other way around. Every decimal fraction does not have an exact binary representation. So, if your "original" number was a decimal number, the float->decimal will be wrong to start with. The number of decimal digits doesn't matter if what you're comparing with was decimal.
Of course decimal fractions cannot be represented precisely in binary fractions. I never said otherwise. But binary fractions can be represented precisely in decimal fractions, so IMHO "Floating-point values do not have exact decimal representations and can have an infinite number of repeating decimal digits." isn't correct.
Charles Salvia

You can control the number of digits in the string representation when you convert the float/double to a string by setting the precision. The maximum number of digits would then be equal to the string representation of std::numeric_limits<double>::max() at the precision you specify.

#include <iostream>
#include <limits>
#include <sstream>
#include <iomanip>
#include <string>

int main()
{
 double x = std::numeric_limits<double>::max();

 std::stringstream ss;
 ss << std::setprecision(10) << std::fixed << x;

 std::string double_as_string = ss.str();
 std::cout << double_as_string.length() << std::endl;
}

So, the largest number of digits in a double with a precision of 10 is 320 digits.


Fred

1024 is not enough, the smallest negative double value has 1077 decimal digits. Here is some Java code.

double x = Double.longBitsToDouble(0x8000000000000001L);
BigDecimal bd = new BigDecimal(x);
String s = bd.toPlainString();
System.out.println(s.length());
System.out.println(s);

Here is the output of the program.

1077
-0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625

// prints one longer than 1077:
import java.math.BigDecimal;

public class Program {
    public static void print_bigdecimal(String name, BigDecimal bd) {
        String s = bd.toPlainString();
        System.out.println("NUM " + name + ": " + s + " (" + s.length() + " chars)");
    }

    public static void main(String[] args) {
        print_bigdecimal("-2^-1075==", new BigDecimal(-1).divide(new BigDecimal(2).pow(1075)));
        print_bigdecimal("0x80...01L", new BigDecimal(Double.longBitsToDouble(0x8000000000000001L)));
    }
}
2^-1075 is valid because it's 2^-52 + 2^-1023
What you are seeing is a bad printing algorithm. One would use the Grisu3 algorithm for this edge case.
@user2356685: "Edge case"? The (albeit) very small value is a valid "small value" - nothing more, nothing less, The tempreal 10-byte real type allows even smaller values. The gmplib libraries smaller still.
Greg A. Woods

The maximum number of characters that will be required to print any decimal double value (i.e. in "%f" format) will be for the value of -DBL_MIN (i.e. -0x1p-1022, assuming binary64 IEEE 754 is your double). For that you'll need exactly 325 characters. That's: DBL_DIG + abs(DBL_MIN_10_EXP) + strlen("-0."). This is of course because log10(fabs(DBL_MIN)) is 308, which is also abs(DBL_MIN_10_EXP)+1 (the +1 is because of the leading digit to the left of the decimal place), and that's the number of leading zeros to the left of the significant digits.

#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int lz;                 /* aka abs(DBL_MIN_10_EXP)+1 */
    int dplaces;
    int sigdig;             /* aka DBL_DECIMAL_DIG - 1 */
    double dbl = -DBL_MIN;

    lz = abs((int) lrint(floor(log10(fabs(dbl)))));
    sigdig = lrint(ceil(DBL_MANT_DIG * log10((double) FLT_RADIX)));
    dplaces = sigdig + lz - 1;
    printf("f = %.*f\n", dplaces, dbl);
    return 0;
}

My output is "f = -0.00000000(300 more zeros) 000000000000000" and printf("lz = %d, sigdig = %d, dplaces = %d, dbl = %g\n", lz, sigdig, dplaces, dbl); prints "lz = 308, sigdig = 16, dplaces = 323, dbl = -2.22507e-308". What output do you receive?
lrint() call provides little value versus just (int). Removing it would add clarity.
The sigdig formula is off by 1 with FLT_RADIX == 2. See definition of DBL_DECIMAL_DIG which is effectively what sigdig should be to determine significant digits for all double. Suggest sigdig = DBL_DECIMAL_DIG; (e.g. 17)
Note, DBL_MIN is the smallest normal. DBL_TRUE_MIN is the smallest non-zero double. Of course with printing significant digits, using DBL_MIN will suffice - as this answer does.
As I said in the comment sigdig is DBL_DECIMAL_DIG - 1
Zoe stands with Ukraine

"For that you'll need exactly 325 characters"

Apparently (and this is a very common case) you don't understand how the conversion between different numeric bases works.

No matter how accurate the definition of DBL_MIN is, it is limited by hardware accuracy, which is usually up to 80 bits or 18 decimal digits (x86 and similar architectures).

For that reason, specialized arbitrary-precision arithmetic libraries have been invented, e.g. gmp or mpfr.


John Cummings

As an improvement on the accepted answer, based on Greg A. Woods' accurate comment, a more conservative but still adequate number of characters needed is 3 + DBL_DIG + -DBL_MIN_10_EXP (total 325), with 3 being for the leading "-0." that may be needed. If using C-style strings, add one for null ('\0') termination, such that an adequately sized buffer (of size 326) may be created with:

#include <float.h>

char buffer[4 + DBL_DIG + -DBL_MIN_10_EXP];

For those who prefer the C++ numeric limits interface that would be:

#include <limits>

char buffer[4 + std::numeric_limits<double>::digits10 + -std::numeric_limits<double>::min_exponent10];