Why is StringBuilder faster in string concatenations?
Almost every developer who is new to C# eventually asks which is better for string concatenation: string.Concat, + (plus sign), string.Format or StringBuilder. The easiest way to find an answer is to search Google and read what the experts say. A few of the links that every developer stumbles upon are:
- How to: Concatenate Multiple Strings (C# Programming Guide) - Direct from MSDN/Microsoft
- What's the best string concatenation method using C#? – On StackOverflow
I don’t want to repeat what’s in the above articles, so I’ll just give the gist (from MSDN):
The performance of a concatenation operation for a String or StringBuilder object depends on how often a memory allocation occurs. A String concatenation operation always allocates memory, whereas a StringBuilder concatenation operation only allocates memory if the StringBuilder object buffer is too small to accommodate the new data. Consequently, the String class is preferable for a concatenation operation if a fixed number of String objects are concatenated. In that case, the individual concatenation operations might even be combined into a single operation by the compiler. A StringBuilder object is preferable for a concatenation operation if an arbitrary number of strings are concatenated; for example, if a loop concatenates a random number of strings of user input.
That leads to a few important conclusions:
- A string is immutable, so every operation that appears to modify it (concatenation, or any method that returns a changed value) allocates a new string object and stores the new value there. For repeated modifications, using a string object is therefore an overhead.
- When we have a fixed number of text concatenations, we can use either of the following forms:
string finalStringUsingPlus =
@"this is a new string"
+ "with a lot of words"
+ "together forming a sentence. This is used in demo"
+ "of string concatenation.";
string finalStringUsingStringConcat =
String.Concat(new[] {
@"this is a new string"
, "with a lot of words"
, "together forming a sentence. This is used in demo"
, "of string concatenation."
});
- For concatenations in a loop (where count > 2), prefer a StringBuilder. That much is well known. Let’s see why and how it is so.
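The conclusions above can be sketched in a few lines. This is only an illustration: the class name ConcatDemo, its two helper methods and the sample words are made up for this example and do not come from the articles quoted above.

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    // Fixed number of operands: the compiler emits a single String.Concat call.
    public static string JoinFixed(string a, string b, string c)
    {
        return a + b + c;
    }

    // Unknown number of operands: using += here would allocate a new string
    // on every iteration, so the pieces are collected in a StringBuilder.
    public static string JoinMany(string[] parts)
    {
        var sb = new StringBuilder();
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string[] words = { "alpha", "beta", "gamma" };
        Console.WriteLine(JoinFixed(words[0], words[1], words[2]));
        Console.WriteLine(JoinMany(words));
    }
}
```

Both methods return the same text; the difference is only in how many intermediate strings get allocated along the way.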
Step-Into StringBuilder class
When a StringBuilder is created, whether through the default constructor or with an initial string value, it internally allocates a char buffer (read: array) with a capacity of 0x10 (16) characters or the length of the string passed to the constructor, whichever is greater. The builder has a maximum capacity of 0x7fffffff (int.MaxValue) unless you specify one explicitly while constructing it.
If a string value has been passed to the constructor, its characters are copied into the buffer using the internal wstrcpy method of System.String. When you then call Append(string) in your code, the snippet below gets executed.
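The public Capacity and MaxCapacity properties make these numbers easy to inspect. The sketch below is just a quick probe, with a made-up class name CapacityDemo; the exact growth policy differs between framework versions, so it only checks lower bounds rather than exact sizes after growing.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    public static void Main()
    {
        var empty = new StringBuilder();
        var seeded = new StringBuilder(new string('x', 40));

        Console.WriteLine(empty.Capacity);   // the default capacity
        Console.WriteLine(seeded.Capacity);  // at least the length of the initial string
        Console.WriteLine(empty.MaxCapacity == int.MaxValue);

        // Appending more than the current capacity forces the builder to grow.
        empty.Append(new string('y', 100));
        Console.WriteLine(empty.Capacity >= 100);
    }
}
```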
if (value != null)
{
    char[] chunkChars = this.m_ChunkChars;
    int chunkLength = this.m_ChunkLength;
    int length = value.Length;
    int num3 = chunkLength + length;
    if (num3 < chunkChars.Length)
    {
        if (length <= 2)
        {
            if (length > 0)
            {
                chunkChars[chunkLength] = value[0];
            }
            if (length > 1)
            {
                chunkChars[chunkLength + 1] = value[1];
            }
        }
        else
        {
            fixed (char* str = ((char*)value))
            {
                char* smem = str;
                fixed (char* chRef = &(chunkChars[chunkLength]))
                {
                    string.wstrcpy(chRef, smem, length);
                }
            }
        }
        this.m_ChunkLength = num3;
    }
    else
    {
        this.AppendHelper(value);
    }
}
Look at line 9, the check if (length <= 2): when the appended string has at most two characters, they are assigned directly into the character array (the buffer). Otherwise, as lines 22-29 show, the code first pins the source string and the destination buffer (to understand this better, read about the fixed keyword) so that the GC does not relocate them, and then copies the characters using wstrcpy (an internal method of System.String). If the new data does not fit into the current buffer, AppendHelper takes over instead. So the performance and strategy of StringBuilder rely primarily on wstrcpy. Its core loop uses integer pointers to copy two characters at a time, eight characters per iteration, from the source (the string passed to Append, referred to as smem) to the destination (the character buffer, referred to as dmem):
while (charCount >= 8)
{
    *((int*)dmem) = *((uint*)smem);
    *((int*)(dmem + 2)) = *((uint*)(smem + 2));
    *((int*)(dmem + 4)) = *((uint*)(smem + 4));
    *((int*)(dmem + 6)) = *((uint*)(smem + 6));
    dmem += 8;
    smem += 8;
    charCount -= 8;
}
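The same idea can be sketched in safe, managed code. The class below is a hypothetical teaching aid called MiniBuilder, not the real StringBuilder: it keeps a single char array, bulk-copies each appended string into the free space with string.CopyTo (standing in for wstrcpy), and allocates only when the buffer is too small. The doubling growth rule is an assumption made for this sketch.

```csharp
using System;

// Hypothetical teaching class; the real StringBuilder is more elaborate.
public sealed class MiniBuilder
{
    private char[] buffer = new char[16]; // same starting size as the 0x10 default
    private int length;

    public int Capacity { get { return buffer.Length; } }

    public MiniBuilder Append(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return this;
        }

        int needed = length + value.Length;
        if (needed > buffer.Length)
        {
            // Slow path: allocate a bigger array and copy the old content once.
            int newSize = Math.Max(needed, buffer.Length * 2);
            Array.Resize(ref buffer, newSize);
        }

        // Fast path: bulk-copy the characters into the free space, no new string.
        value.CopyTo(0, buffer, length, value.Length);
        length = needed;
        return this;
    }

    public override string ToString()
    {
        return new string(buffer, 0, length);
    }
}

public static class MiniBuilderDemo
{
    public static void Main()
    {
        var sb = new MiniBuilder();
        sb.Append("Hello, ").Append("string ").Append("builder!");
        Console.WriteLine(sb.ToString());
        Console.WriteLine(sb.Capacity);
    }
}
```

Only the third Append in the demo exceeds the buffer, so there is a single reallocation for three appends, whereas plain string concatenation would allocate a new string each time.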
String.Format is another StringBuilder
Yes, String.Format uses a StringBuilder internally and requests an initial buffer of size format.Length + (args.Length * 8):
public static string Format(IFormatProvider provider, string format, params object[] args)
{
    if ((format == null) || (args == null))
    {
        throw new ArgumentNullException((format == null) ? "format" : "args");
    }
    StringBuilder sb = StringBuilderCache.Acquire(format.Length + (args.Length * 8));
    sb.AppendFormat(provider, format, args);
    return StringBuilderCache.GetStringAndRelease(sb);
}
This has two advantages over using a plain-vanilla StringBuilder:
- It usually starts with a buffer bigger than the default size of 0x10.
- It uses the StringBuilderCache class, which keeps a StringBuilder instance in a thread-static field. When Acquire is invoked and the cached instance is large enough, it clears that instance (without creating a new object) and returns it. This saves the time required to construct a new StringBuilder.
So for repeated concatenations my order of preference would be String.Format first, then StringBuilder, and then String.Concat or + (plus, operator overload).
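Since StringBuilderCache is internal to the framework, the caching idea can only be illustrated with a stand-in. The sketch below uses a hypothetical class named SimpleBuilderCache that mirrors the Acquire / GetStringAndRelease pair seen in the snippet above; it is a simplified approximation and leaves out details of the real class, such as any limit on the size of the cached builder.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the internal StringBuilderCache class.
public static class SimpleBuilderCache
{
    // One cached builder per thread, so no locking is needed.
    [ThreadStatic]
    private static StringBuilder cached;

    public static StringBuilder Acquire(int capacity)
    {
        StringBuilder sb = cached;
        if (sb != null && capacity <= sb.Capacity)
        {
            cached = null; // hand the instance out to one caller only
            sb.Clear();    // reset the length but keep the buffer
            return sb;
        }
        return new StringBuilder(capacity);
    }

    public static string GetStringAndRelease(StringBuilder sb)
    {
        string result = sb.ToString();
        cached = sb;       // make the instance available to the next caller
        return result;
    }
}

public static class CacheDemo
{
    public static void Main()
    {
        StringBuilder first = SimpleBuilderCache.Acquire(32);
        first.Append("first");
        Console.WriteLine(SimpleBuilderCache.GetStringAndRelease(first));

        // A smaller request on the same thread gets the released instance back.
        StringBuilder second = SimpleBuilderCache.Acquire(16);
        Console.WriteLine(ReferenceEquals(first, second));
    }
}
```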
Performance check
I ran a small performance check to verify this understanding. With 100,000 concatenations performed in a loop on a quad-processor machine, the results were:
Time taken using + : 93071.0034
Time taken using StringBuilder: 14.0182
Time taken using StringBuilder with Format: 24.0155
Time taken using String.Format and + : 24.0155
Time taken using StringBuilder with Format and clear: 38.0249
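The original test harness is not shown, so the following is only a sketch of how such a comparison could be set up with Stopwatch. The class ConcatBenchmark, its method names and the reduced iteration count are assumptions for this example. It covers just the + and StringBuilder cases, and its timings will not match the figures above.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatBenchmark
{
    // Repeated += : every iteration allocates a new, longer string.
    public static string WithPlus(int count)
    {
        string s = string.Empty;
        for (int i = 0; i < count; i++)
        {
            s += "x";
        }
        return s;
    }

    // StringBuilder: appends into a buffer that grows only occasionally.
    public static string WithBuilder(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append("x");
        }
        return sb.ToString();
    }

    // Runs the given method once and returns the elapsed time in milliseconds.
    public static double TimeMs(Func<int, string> action, int count)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action(count);
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        const int count = 20000; // smaller than the article's 100,000 to keep the run short
        Console.WriteLine("+ operator:    {0} ms", TimeMs(WithPlus, count));
        Console.WriteLine("StringBuilder: {0} ms", TimeMs(WithBuilder, count));
    }
}
```

A single timed run like this is noisy; repeating each measurement several times and discarding the first (warm-up) run gives more stable numbers.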