ChatGPT解决这个技术问题 Extra ChatGPT

How to efficiently concatenate strings in go

In Go, a string is a primitive type, which means it is read-only, and every manipulation of it will create a new string.

So if I want to concatenate strings many times without knowing the length of the resulting string, what's the best way to do it?

The naive way would be:

var s string
for i := 0; i < 1000; i++ {
    s += getShortStringFromSomewhere()
}
return s

but that does not seem very efficient.

One more bench
Note: This question and most answers seem to have been written before append() came into the language, which is a good solution for this. It will perform fast like copy() but will grow the slice first even if that means allocating a new backing array if the capacity isn't enough. bytes.Buffer still makes sense if you want its additional convenience methods or if the package you're using expects it.
It doesn't just "seem very inefficient"; it has a specific problem that every new non-CS hire we have ever gotten runs into in the first few weeks on the job. It's quadratic - O(n*n). Think of the number sequence: 1 + 2 + 3 + 4 + .... It's n*(n+1)/2, the area of a triangle of base n. You allocate size 1, then size 2, then size 3, etc when you append immutable strings in a loop. This quadratic resource consumption manifests itself in more ways than just this.

I
Inanc Gumus

New Way:

From Go 1.10 there is a strings.Builder type, please take a look at this answer for more detail.

Old Way:

Use the bytes package. It has a Buffer type which implements io.Writer.

package main

import (
    "bytes"
    "fmt"
)

func main() {
    var buffer bytes.Buffer

    for i := 0; i < 1000; i++ {
        buffer.WriteString("a")
    }

    fmt.Println(buffer.String())
}

This does it in O(n) time.


instead of println(string(buffer.Bytes())); use could just do println(buffer.String())
Instead of buffer := bytes.NewBufferString(""), you can do var buffer bytes.Buffer. You also don't need any of those semicolons :).
Incredibly fast. Made some naive "+" string concat in my program go from 3 minutes to 1.3 seconds.
+1 for "O(n) time"; I think it's important to make more remarks like this.
Go 1.10 adds strings.Builder, which is like bytes.Buffer but faster when your end goal is a string.
I
Inanc Gumus

In Go 1.10+ there is strings.Builder, here.

A Builder is used to efficiently build a string using Write methods. It minimizes memory copying. The zero value is ready to use.

Example

It's almost the same with bytes.Buffer.

package main

import (
    "strings"
    "fmt"
)

func main() {
    // ZERO-VALUE:
    //
    // It's ready to use from the get-go.
    // You don't need to initialize it.
    var sb strings.Builder

    for i := 0; i < 1000; i++ {
        sb.WriteString("a")
    }

    fmt.Println(sb.String())
}

Click to see this on the playground.

Supported Interfaces

StringBuilder's methods are being implemented with the existing interfaces in mind. So that you can switch to the new Builder type easily in your code.

Grow(int) -> bytes.Buffer#Grow

Len() int -> bytes.Buffer#Len

Reset() -> bytes.Buffer#Reset

String() string -> fmt.Stringer

Write([]byte) (int, error) -> io.Writer

WriteByte(byte) error -> io.ByteWriter

WriteRune(rune) (int, error) -> bufio.Writer#WriteRune - bytes.Buffer#WriteRune

WriteString(string) (int, error) -> io.stringWriter

Differences from bytes.Buffer

It can only grow or reset.

It has a copyCheck mechanism built-in that prevents accidentially copying it: func (b *Builder) copyCheck() { ... }

In bytes.Buffer, one can access the underlying bytes like this: (*Buffer).Bytes(). strings.Builder prevents this problem. Sometimes, this is not a problem though and desired instead. For example: For the peeking behavior when the bytes are passed to an io.Reader etc.

strings.Builder prevents this problem.

Sometimes, this is not a problem though and desired instead.

For example: For the peeking behavior when the bytes are passed to an io.Reader etc.

bytes.Buffer.Reset() rewinds and reuses the underlying buffer whereas the strings.Builder.Reset() does not, it detaches the buffer.

Note

Do not copy a StringBuilder value as it caches the underlying data.

If you want to share a StringBuilder value, use a pointer to it.

Check out its source code for more details, here.


What do you mean by 'escape'? Do you mean escapes in the string, or just that the underlying bytes can be exposed?
@makhdumi Yes, 2nd, exposure of underlying bytes.
Worth noting strings.Builder implements its methods using a pointer receiver, which threw me for a moment. As a result, I would probably create one using new.
@DuncanJones I've added a note however, as it's used mostly for caching data, it's normal to use a pointer to it when sharing it across funcs etc. In the same func, you can use it as a non-pointer as well.
Another difference, which might be important: strings.Builder.Reset() sets the underling slice to nil (no memory reuse). Where bytes.Buffer.Reset() sets the []bytes to zero length, keeping the underlying array allocated. This bit me when reusing strings.Builder in a sync.Pool, which appeared to be completely useless.
I
Inanc Gumus

If you know the total length of the string that you're going to preallocate then the most efficient way to concatenate strings may be using the builtin function copy. If you don't know the total length before hand, do not use copy, and read the other answers instead.

In my tests, that approach is ~3x faster than using bytes.Buffer and much much faster (~12,000x) than using the operator +. Also, it uses less memory.

I've created a test case to prove this and here are the results:

BenchmarkConcat  1000000    64497 ns/op   502018 B/op   0 allocs/op
BenchmarkBuffer  100000000  15.5  ns/op   2 B/op        0 allocs/op
BenchmarkCopy    500000000  5.39  ns/op   0 B/op        0 allocs/op

Below is code for testing:

package main

import (
    "bytes"
    "strings"
    "testing"
)

func BenchmarkConcat(b *testing.B) {
    var str string
    for n := 0; n < b.N; n++ {
        str += "x"
    }
    b.StopTimer()

    if s := strings.Repeat("x", b.N); str != s {
        b.Errorf("unexpected result; got=%s, want=%s", str, s)
    }
}

func BenchmarkBuffer(b *testing.B) {
    var buffer bytes.Buffer
    for n := 0; n < b.N; n++ {
        buffer.WriteString("x")
    }
    b.StopTimer()

    if s := strings.Repeat("x", b.N); buffer.String() != s {
        b.Errorf("unexpected result; got=%s, want=%s", buffer.String(), s)
    }
}

func BenchmarkCopy(b *testing.B) {
    bs := make([]byte, b.N)
    bl := 0

    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        bl += copy(bs[bl:], "x")
    }
    b.StopTimer()

    if s := strings.Repeat("x", b.N); string(bs) != s {
        b.Errorf("unexpected result; got=%s, want=%s", string(bs), s)
    }
}

// Go 1.10
func BenchmarkStringBuilder(b *testing.B) {
    var strBuilder strings.Builder

    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        strBuilder.WriteString("x")
    }
    b.StopTimer()

    if s := strings.Repeat("x", b.N); strBuilder.String() != s {
        b.Errorf("unexpected result; got=%s, want=%s", strBuilder.String(), s)
    }
}

The bytes.Buffer should do basically the same as the copy (with some extra bookkeeping I guess) and the speed isn't that different. So I'd use that :). The difference being that the buffer starts with 0 bytes so it has to reallocate (this make it seem a little slower I guess). Easier to use, though.
buffer.Write (bytes) is 30% faster than buffer.WriteString. [useful if you can get the data as []byte]
Note that the benchmark results are distorted and are not authentic. Different benchmark functions will be called with different values of b.N, and so you're not comparing the execution time of the same task to be carried out (e.g. one function might append 1,000 strings, another one might append 10,000 which can make a big difference in the average time of 1 append, in BenchmarkConcat() for example). You should use the same append count in each case (certainly not b.N), and do all the concatenation inside the body of the for ranging to b.N (that is, 2 for loops embedded).
Additionally, the copy benchmark is skewed by explicitly ignoring the time that the allocation takes, which is included in the other benchmarks.
Additionally, the copy benchmark relies on knowing the length of resulting string.
I
Inanc Gumus

If you have a string slice that you want to efficiently convert to a string then you can use this approach. Otherwise, take a look at the other answers.

There is a library function in the strings package called Join: http://golang.org/pkg/strings/#Join

A look at the code of Join shows a similar approach to Append function Kinopiko wrote: https://golang.org/src/strings/strings.go#L420

Usage:

import (
    "fmt";
    "strings";
)

func main() {
    s := []string{"this", "is", "a", "joined", "string\n"};
    fmt.Printf(strings.Join(s, " "));
}

$ ./test.bin
this is a joined string

Doesn't work when you have to loop over something that isn't a []string.
P
Pika Supports Ukraine

I just benchmarked the top answer posted above in my own code (a recursive tree walk) and the simple concat operator is actually faster than the BufferString.

func (r *record) String() string {
    buffer := bytes.NewBufferString("");
    fmt.Fprint(buffer,"(",r.name,"[")
    for i := 0; i < len(r.subs); i++ {
        fmt.Fprint(buffer,"\t",r.subs[i])
    }
    fmt.Fprint(buffer,"]",r.size,")\n")
    return buffer.String()
}

This took 0.81 seconds, whereas the following code:

func (r *record) String() string {
    s := "(\"" + r.name + "\" ["
    for i := 0; i < len(r.subs); i++ {
        s += r.subs[i].String()
    }
    s += "] " + strconv.FormatInt(r.size,10) + ")\n"
    return s
} 

only took 0.61 seconds. This is probably due to the overhead of creating the new BufferString.

Update: I also benchmarked the join function and it ran in 0.54 seconds.

func (r *record) String() string {
    var parts []string
    parts = append(parts, "(\"", r.name, "\" [" )
    for i := 0; i < len(r.subs); i++ {
        parts = append(parts, r.subs[i].String())
    }
    parts = append(parts, strconv.FormatInt(r.size,10), ")\n")
    return strings.Join(parts,"")
}

I believe the OP was more concerned about memory complexity rather than runtime complexity, given the fact that naive string concatenations result in new memory allocations each time.
The slow speed of this might well be related to using fmt.Fprint instead of buffer.WriteString("\t"); buffer.WriteString(subs[i]);
I am glad to know that my preferred method of (strings.Join) run as the fastest while from this saying that (bytes.Buffer) is the winner!
s
slfan
package main

import (
  "fmt"
)

func main() {
    var str1 = "string1"
    var str2 = "string2"
    out := fmt.Sprintf("%s %s ",str1, str2)
    fmt.Println(out)
}

Welcome to Stack Overflow! Take a moment to read through the editing help in the help center. Formatting on Stack Overflow is different than other sites.
While this code snippet may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. Please also try not to crowd your code with explanatory comments, this reduces the readability of both the code and the explanations!
This does not answer the question at all. fmt.Sprintf is the worst method in efficiency when concatinating simple strings. According to this bench, fmt.Sprintf turns out to be even slower than the add operator (+) OP mentioned very inefficient.
佚名

You could create a big slice of bytes and copy the bytes of the short strings into it using string slices. There is a function given in "Effective Go":

func Append(slice, data[]byte) []byte {
    l := len(slice);
    if l + len(data) > cap(slice) { // reallocate
        // Allocate double what's needed, for future growth.
        newSlice := make([]byte, (l+len(data))*2);
        // Copy data (could use bytes.Copy()).
        for i, c := range slice {
            newSlice[i] = c
        }
        slice = newSlice;
    }
    slice = slice[0:l+len(data)];
    for i, c := range data {
        slice[l+i] = c
    }
    return slice;
}

Then when the operations are finished, use string ( ) on the big slice of bytes to convert it into a string again.


It's interesting that there are so many ways to do this in Go.
In effective go, it also says that the idea is so useful it was captured in a builtin. So you can replace your function with append(slice, byte...), it seems.
r
rog

This is the fastest solution that does not require you to know or calculate the overall buffer size first:

var data []byte
for i := 0; i < 1000; i++ {
    data = append(data, getShortStringFromSomewhere()...)
}
return string(data)

By my benchmark, it's 20% slower than the copy solution (8.1ns per append rather than 6.72ns) but still 55% faster than using bytes.Buffer.


I
Inanc Gumus

Note added in 2018

From Go 1.10 there is a strings.Builder type, please take a look at this answer for more detail.

Pre-201x answer

The benchmark code of @cd1 and other answers are wrong. b.N is not supposed to be set in benchmark function. It's set by the go test tool dynamically to determine if the execution time of the test is stable.

A benchmark function should run the same test b.N times and the test inside the loop should be the same for each iteration. So I fix it by adding an inner loop. I also add benchmarks for some other solutions:

package main

import (
    "bytes"
    "strings"
    "testing"
)

const (
    sss = "xfoasneobfasieongasbg"
    cnt = 10000
)

var (
    bbb      = []byte(sss)
    expected = strings.Repeat(sss, cnt)
)

func BenchmarkCopyPreAllocate(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        bs := make([]byte, cnt*len(sss))
        bl := 0
        for i := 0; i < cnt; i++ {
            bl += copy(bs[bl:], sss)
        }
        result = string(bs)
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

func BenchmarkAppendPreAllocate(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        data := make([]byte, 0, cnt*len(sss))
        for i := 0; i < cnt; i++ {
            data = append(data, sss...)
        }
        result = string(data)
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

func BenchmarkBufferPreAllocate(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        buf := bytes.NewBuffer(make([]byte, 0, cnt*len(sss)))
        for i := 0; i < cnt; i++ {
            buf.WriteString(sss)
        }
        result = buf.String()
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

func BenchmarkCopy(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        data := make([]byte, 0, 64) // same size as bootstrap array of bytes.Buffer
        for i := 0; i < cnt; i++ {
            off := len(data)
            if off+len(sss) > cap(data) {
                temp := make([]byte, 2*cap(data)+len(sss))
                copy(temp, data)
                data = temp
            }
            data = data[0 : off+len(sss)]
            copy(data[off:], sss)
        }
        result = string(data)
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

func BenchmarkAppend(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        data := make([]byte, 0, 64)
        for i := 0; i < cnt; i++ {
            data = append(data, sss...)
        }
        result = string(data)
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

func BenchmarkBufferWrite(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        var buf bytes.Buffer
        for i := 0; i < cnt; i++ {
            buf.Write(bbb)
        }
        result = buf.String()
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

func BenchmarkBufferWriteString(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        var buf bytes.Buffer
        for i := 0; i < cnt; i++ {
            buf.WriteString(sss)
        }
        result = buf.String()
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

func BenchmarkConcat(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        var str string
        for i := 0; i < cnt; i++ {
            str += sss
        }
        result = str
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

Environment is OS X 10.11.6, 2.2 GHz Intel Core i7

Test results:

BenchmarkCopyPreAllocate-8         20000             84208 ns/op          425984 B/op          2 allocs/op
BenchmarkAppendPreAllocate-8       10000            102859 ns/op          425984 B/op          2 allocs/op
BenchmarkBufferPreAllocate-8       10000            166407 ns/op          426096 B/op          3 allocs/op
BenchmarkCopy-8                    10000            160923 ns/op          933152 B/op         13 allocs/op
BenchmarkAppend-8                  10000            175508 ns/op         1332096 B/op         24 allocs/op
BenchmarkBufferWrite-8             10000            239886 ns/op          933266 B/op         14 allocs/op
BenchmarkBufferWriteString-8       10000            236432 ns/op          933266 B/op         14 allocs/op
BenchmarkConcat-8                     10         105603419 ns/op        1086685168 B/op    10000 allocs/op

Conclusion:

CopyPreAllocate is the fastest way; AppendPreAllocate is pretty close to No.1, but it's easier to write the code. Concat has really bad performance both for speed and memory usage. Don't use it. Buffer#Write and Buffer#WriteString are basically the same in speed, contrary to what @Dani-Br said in the comment. Considering string is indeed []byte in Go, it makes sense. bytes.Buffer basically use the same solution as Copy with extra book keeping and other stuff. Copy and Append use a bootstrap size of 64, the same as bytes.Buffer Append use more memory and allocs, I think it's related to the grow algorithm it use. It's not growing memory as fast as bytes.Buffer

Suggestion:

For simple task such as what OP wants, I would use Append or AppendPreAllocate. It's fast enough and easy to use. If need to read and write the buffer at the same time, use bytes.Buffer of course. That's what it's designed for.


P
Peter Buchmann

My original suggestion was

s12 := fmt.Sprint(s1,s2)

But above answer using bytes.Buffer - WriteString() is the most efficient way.

My initial suggestion uses reflection and a type switch. See (p *pp) doPrint and (p *pp) printArg
There is no universal Stringer() interface for basic types, as I had naively thought.

At least though, Sprint() internally uses a bytes.Buffer. Thus

`s12 := fmt.Sprint(s1,s2,s3,s4,...,s1000)`

is acceptable in terms of memory allocations.

=> Sprint() concatenation can be used for quick debug output. => Otherwise use bytes.Buffer ... WriteString


It's not built in and it's not efficient.
Importing a package (like fmt) means it's not builtin. It's in the standard library.
It's slow only because it uses reflection on it's arguments. It's efficent. Otherwise it's not less efficient than joining with strings.Join
P
Peter Buchmann

Expanding on cd1's answer: You might use append() instead of copy(). append() makes ever bigger advance provisions, costing a little more memory, but saving time. I added two more benchmarks at the top of yours. Run locally with

go test -bench=. -benchtime=100ms

On my thinkpad T400s it yields:

BenchmarkAppendEmpty    50000000         5.0 ns/op
BenchmarkAppendPrealloc 50000000         3.5 ns/op
BenchmarkCopy           20000000        10.2 ns/op

V
Vitaly Isaev

This is actual version of benchmark provided by @cd1 (Go 1.8, linux x86_64) with the fixes of bugs mentioned by @icza and @PickBoy.

Bytes.Buffer is only 7 times faster than direct string concatenation via + operator.

package performance_test

import (
    "bytes"
    "fmt"
    "testing"
)

const (
    concatSteps = 100
)

func BenchmarkConcat(b *testing.B) {
    for n := 0; n < b.N; n++ {
        var str string
        for i := 0; i < concatSteps; i++ {
            str += "x"
        }
    }
}

func BenchmarkBuffer(b *testing.B) {
    for n := 0; n < b.N; n++ {
        var buffer bytes.Buffer
        for i := 0; i < concatSteps; i++ {
            buffer.WriteString("x")
        }
    }
}

Timings:

BenchmarkConcat-4                             300000          6869 ns/op
BenchmarkBuffer-4                            1000000          1186 ns/op

I don't think manually setting b.N is the right way to use benchmark functions of testing package
@PickBoy, please justify your point of view. Why do you think b.N is a public variable?
b.N is not supposed to be set in benchmark function. It's set by the go test tool dynamically. A benchmark function should run the same test b.N times, but in your code(as well as @cd1 's code), every test in the loop is a different test (because the length of the string is growing)
@PickBoy, if you let the go test tool set b.N dynamically, you'll wind up with a strings of a different length in different test-cases. See comment
That's why you should add an inner loop of a fixed number of iterations, like 10000, inside the b.N loop.
X
Xian Shu

goutils.JoinBetween

 func JoinBetween(in []string, separator string, startIndex, endIndex int) string {
    if in == nil {
        return ""
    }

    noOfItems := endIndex - startIndex

    if noOfItems <= 0 {
        return EMPTY
    }

    var builder strings.Builder

    for i := startIndex; i < endIndex; i++ {
        if i > startIndex {
            builder.WriteString(separator)
        }
        builder.WriteString(in[i])
    }
    return builder.String()
}

K
Krish Bhanushali

I do it using the following :-

package main

import (
    "fmt"
    "strings"
)

func main (){
    concatenation:= strings.Join([]string{"a","b","c"},"") //where second parameter is a separator. 
    fmt.Println(concatenation) //abc
}

This doesn't address OP's issue of building a string through a series of iterations, with a for loop.
r
rajni kant
package main

import (
"fmt"
)

func main() {
    var str1 = "string1"
    var str2 = "string2"
    result := make([]byte, 0)
    result = append(result, []byte(str1)...)
    result = append(result, []byte(str2)...)
    result = append(result, []byte(str1)...)
    result = append(result, []byte(str2)...)

    fmt.Println(string(result))
}

Please do not post code only answers. Please give an explanation what this code does and why it is the solution.
S
Sumer

Simple and easy to digest solution. Details in the comments. Copy overwrites the elements of slice. We are slicing single-single element and overwriting it.

package main

import (
    "fmt"
)

var N int = 100000

func main() {
    slice1 := make([]rune, N, N)
    //Efficient with fast performance, Need pre-allocated memory
    //We can add a check if we reached the limit then increase capacity
    //using append, but would be fined for data copying to new array. Also append happens after the length of current slice.
    for i := 0; i < N; i++ {
        copy(slice1[i:i+1], []rune{'N'})
    }
    fmt.Println(slice1)

    //Simple but fast solution, Every time the slice capacity is reached we get a fine of effort that goes
    //in copying data to new array
    slice2 := []rune{}
    for i := 0; i <= N; i++ {
        slice2 = append(slice2, 'N')
    }
    fmt.Println(slice2)

}

h
hechen0

benchmark result with memory allocation statistics. check benchmark code at github.

use strings.Builder to optimize performance.

go test -bench . -benchmem
goos: darwin
goarch: amd64
pkg: github.com/hechen0/goexp/exps
BenchmarkConcat-8                1000000             60213 ns/op          503992 B/op          1 allocs/op
BenchmarkBuffer-8               100000000               11.3 ns/op             2 B/op          0 allocs/op
BenchmarkCopy-8                 300000000                4.76 ns/op            0 B/op          0 allocs/op
BenchmarkStringBuilder-8        1000000000               4.14 ns/op            6 B/op          0 allocs/op
PASS
ok      github.com/hechen0/goexp/exps   70.071s

please give credit to @cd1 for the original test-cases you are building on here.
u
user2288856
s := fmt.Sprintf("%s%s", []byte(s1), []byte(s2))

This is solution very slow, because it uses reflection, it parses the format string, and it makes a copy of the data for the []byte(s1) conversion. Comparing it with other solutions posted, can you name a single advantage of your solution?
A
Anshu

strings.Join() from the "strings" package

If you have a type mismatch(like if you are trying to join an int and a string), you do RANDOMTYPE (thing you want to change)

EX:

package main

import (
    "fmt"
    "strings"
)

var intEX = 0
var stringEX = "hello all you "
var stringEX2 = "people in here"


func main() {
    s := []string{stringEX, stringEX2}
    fmt.Println(strings.Join(s, ""))
}

Output :

hello all you people in here

This code doesn't even compile: strings.Join() takes only 2 parameters: a slice and a separator string.
this can't help
add some changes here.