As we saw in a previous Under The Hood column, Delphi 2 has a new Long String type, which offers reference counted strings of up to 2 GB in size. Perfect, right? Well, let's investigate some unexpected potential efficiency problems when using Long Strings in your applications...
Reference Count
Delphi 2 Long Strings are reference counted. You need such a mechanism if you want to hold multiple copies of the same long string (each of which can be up to 2 GB) without too much overhead. Any long string that gets created is allocated on the heap with an initial reference count of one (1). If you make a copy (either explicitly, or by passing it as a value argument to a procedure or function), the reference count is increased. If the original or a copy goes out of scope, the reference count is decreased. This goes on until the reference count reaches zero (0), at which time the entire Long String is deallocated.
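The counting described above can be watched directly, as a minimal sketch under one loud assumption: it uses the System routine StringRefCount, which Delphi 2 itself does not offer, so it needs a newer compiler (Delphi 2009 or later, or Free Pascal 3.x). The program name and the variables Original and Duplicate are made up for the illustration. The string is built with SetLength rather than from a literal, because a string constant is not a counted heap string.

```pascal
program RefCountSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

{ Assumption: StringRefCount is available in System (Delphi 2009+, FPC 3.x). }
{ It is NOT part of Delphi 2.                                                }

var
  Original, Duplicate: string;
  i: Integer;
begin
  { Build the string at run time so it lives on the heap }
  SetLength(Original, 1000);
  for i := 1 to Length(Original) do
    Original[i] := 'x';
  Writeln('after creation:   ', StringRefCount(Original));

  { A copy only adds a second reference to the same heap block }
  Duplicate := Original;
  Writeln('after assignment: ', StringRefCount(Original));

  { Releasing the copy drops the count again }
  Duplicate := '';
  Writeln('after release:    ', StringRefCount(Original));
end.
```

On such a compiler the three lines should report a count of 1, then 2, then 1 again.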
Copy for Free
Well, this leads to the following conclusion: copying a long string in Delphi 2 is essentially free, since all you get is a new pointer to the (same) long string and an increased reference count. This is a big advantage over short strings, where copying could be a real performance problem (like passing a short string by value to a function that returns a short string, where both the argument and the function result need to be copied: a lengthy and costly process).
But what if we modify one of the copies of the long string? Won't we modify any other copies as well? Well, at that particular point in execution, Delphi will notice that the long string we're about to modify has a reference count greater than 1. So, it makes sure the variable we're modifying gets its own "version" of the long string (with a reference count of 1). This also decreases the reference count of the original long string by one (since the modified copy is no longer a copy of it).
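Both halves of this story (the free copy and the private "version" on modification) can be made visible by comparing the addresses behind two string variables. This is only a sketch: the helper SharesStorage and the variable names are hypothetical and not part of any library. It relies on nothing more than casting a long string to a Pointer.

```pascal
program CopyOnWriteSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

{ Hypothetical helper: True when both variables point at the same heap block }
function SharesStorage(const A, B: string): Boolean;
begin
  Result := Pointer(A) = Pointer(B);
end;

var
  Original, Duplicate: string;
  i: Integer;
begin
  { Build the string at run time so it lives on the heap }
  SetLength(Original, 1000);
  for i := 1 to Length(Original) do
    Original[i] := 'x';

  { The "free" copy: just a second pointer to the same characters }
  Duplicate := Original;
  Writeln('shared after assignment:   ', SharesStorage(Original, Duplicate));

  { Changing one character gives Duplicate its own block first }
  Duplicate[1] := '@';
  Writeln('shared after modification: ', SharesStorage(Original, Duplicate));
  Writeln('original left untouched:   ', Original[1] = 'x');
end.
```

The first line should report that the storage is shared, the second that it no longer is, and the third that the original still starts with its old character.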
Making an actual "deep" copy of a long string takes time, of course. Even more time than simply copying a short string, since a long string often needs more work (it can be a lot longer, right?). This is what I've called a delayed performance hit.
Delayed Performance Hit
A delayed performance hit is what you see when you modify a single character in a long string and get an unexpectedly long delay, because the entire long string first has to be copied to a newly allocated block before the modification can be made. To illustrate this point, I've written a little program that creates a long string (of about 360 KB) and makes 13 copies of it in no time flat. Then, I modify one character in each of these copies, which takes about half a second on a P133 machine with 32 MB. Why? Because 13 actual copies of a 360 KB long string have to be made. Almost 5 MB gets allocated just to make 13 character changes. That's a delayed performance hit, and you need to be aware of it when playing with long strings in your applications.
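The figures quoted above follow from simple arithmetic: a 44-character line doubled 13 times, then copied 13 times. The sketch below only redoes that calculation without allocating anything. The function DoubledLength is a hypothetical helper, and the byte totals assume one byte per character, as in Delphi 2 long strings.

```pascal
program CopyCostSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

const
  Line   = 'this is only one line of a multi-line string';
  Copies = 13;

{ Hypothetical helper: length of a string after doubling it a number of times }
function DoubledLength(StartLen, Doublings: Integer): LongInt;
var
  i: Integer;
begin
  Result := StartLen;
  for i := 1 to Doublings do
    Result := Result * 2;
end;

var
  StrLen, Total: LongInt;
begin
  StrLen := DoubledLength(Length(Line), Copies); { 44 * 2^13 }
  Total  := StrLen * Copies;                     { one real copy per modification }
  Writeln('length of one string: ', StrLen);     { 360448 }
  Writeln('copied for 13 edits:  ', Total);      { 4685824 }
end.
```

So each modified copy costs a fresh block of 360448 characters, and the 13 one-character changes together force roughly 4.7 million bytes to be allocated and copied.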
Don't get byte!
program Hood5;
{$APPTYPE CONSOLE}
uses
  MMSystem; { for timeGetTime }
const
  copies = 13;
var
  start: LongInt;
  Str: String;
  Copy: Array[1..copies] of String;
  modpos,i: Integer;
begin
  Str := 'this is only one line of a multi-line string';
  for i:=1 to copies do
    Str := Str + Str; { grow 2^copies times! }
  writeln(Length(Str));
  writeln;
  write('copy: ');
  start := timeGetTime;
  for i:=1 to copies do
    Copy[i] := Str; { only increases the reference count }
  writeln(timeGetTime-start);
  writeln;
  modpos := Length(Str) div 2;
  write('modify: ');
  start := timeGetTime;
  for i:=1 to copies do
    Copy[i,modpos] := '@'; { forces an actual copy of the whole string }
  writeln(timeGetTime-start);
end.