As a long-term Turbo Pascal user, I've known Strings to be limited to 255 characters, with the 0th 'character' being the actual length Byte (range 0..255). This led to dirty-but-fast code such as the following to trim the spaces on the right side of the String:
  procedure RightTrim16(var Str: String);
  var Len: Byte absolute Str; { Len is overlaid with the 0th length byte of Str }
  begin
    while Str[Len] = ' ' do Dec(Len)
  end {RightTrim16};

This code still works for the 16-bit version of Delphi. But with the release of Delphi 2.0, Borland has provided us with a new implementation of Strings: Long Strings, where all we've known and hacked before has become illegal.
A Long String is just what it says: a long String, of up to 2 GB characters in size (in practice a long String is limited by the amount of available memory on your machine). Since we now need a LongInt to store the size of the long String, the 0th character (or byte) is no longer sufficient to indicate the Length of the string. Hence, all legacy code that reads, writes or otherwise uses this 0th byte has become a likely cause of Access Violations. Using the old memory layout, we could state that the Length LongInt (4 bytes) now starts at offset -3. But that's not all. A long String also has a Reference Count LongInt at position -7, and an Allocation Size LongInt at position -11. Long String constants do not have the allocation size LongInt, but only the first two.
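To make the difference concrete, here is a minimal sketch, assuming a 32-bit-or-later Delphi or Free Pascal compiler. The program name and the variables LongS and ShortS are made up for this illustration. It shows a long String growing far past 255 characters, and a short String still keeping its length in the 0th byte.

```pascal
program LongVsShort;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$H+}

var
  LongS: AnsiString;    { hypothetical names, for illustration only }
  ShortS: ShortString;
begin
  { A long String is not bound by the old 255-character ceiling. }
  LongS := StringOfChar('x', 100000);
  WriteLn('Length(LongS)  = ', Length(LongS));

  { A short String still stores its length in the 0th byte. }
  ShortS := 'abc';
  WriteLn('Ord(ShortS[0]) = ', Ord(ShortS[0]));

  { Assigning a long String to a short one truncates it to 255 characters. }
  ShortS := LongS;
  WriteLn('Length(ShortS) = ', Length(ShortS));
end.
```

The three lines written should report 100000, 3 and 255. Note that the program only touches the 0th byte of the ShortString; for the long String it sticks to Length.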
Length
Now that we know we cannot use the 0th Byte of a string as length indicator, how can we remove the trailing spaces from a long String? Well, we can still use the Length function to get the length of a String.
We only need to realise that it's a 32-bit integer now.
To remove trailing spaces from a (long) String, we now need to walk back from the end of the long String and count the number of spaces that are at the end.
After having counted them, we can just Delete them from the String (you don't want to Delete the spaces one at a time, since this could involve copying the long String for every Delete operation, something you'd like to avoid as much as possible for performance's sake).
  procedure RightTrim32(var Str: String);
  var SPos,SLen: Integer;
  begin
    SPos := Length(Str);
    { SPos > 0 guards against indexing an empty long String }
    if (SPos > 0) and (Str[SPos] = ' ') then { are there trailing spaces to begin with? }
    begin
      SLen := 0;
      while (SPos > 0) and (Str[SPos] = ' ') do
      begin
        Inc(SLen);
        Dec(SPos)
      end;
      Delete(Str,SPos+1,SLen);
      { SetLength(Str,SPos); }
    end
  end {RightTrim32};

Note the commented call to SetLength. SetLength is used to set the length of a (long) String. Note that the length is something different from the allocated size: the length of a String should always be less than or equal to the allocated size. Fortunately, SetLength also makes sure that enough memory is allocated for the long String including a terminating #0 character (to ensure compatibility with PChars - read the ObjectPascal Language Guide page 22 for more about this). For short strings, SetLength only sets the length byte, and the length should be between 0 and 255 characters. So, if you want to prepare your 16-bit Delphi code for the use of SetLength, you can add the following code to your projects right now:
  {$IFNDEF WIN32} { to make sure it doesn't get added into Delphi 2 code }
  procedure SetLength(var Str: String; Len: Integer);
  begin
    Str[0] := Chr(Len);
  end {SetLength};
  {$ENDIF}

Actually, we might be able to get away without using Delete in our RightTrim routine, by using SetLength instead (just set the length of the string to the new value, like the previous 16-bit hack). This leads to the following 32-bit RightTrim procedure (that can also be used with Delphi 1.0 as long as you include the procedure SetLength from above):
  procedure RightTrim(var Str: String);
  var SPos: Integer;
  begin
    SPos := Length(Str);
    { SPos > 0 guards against indexing an empty long String }
    if (SPos > 0) and (Str[SPos] = ' ') then { are there trailing spaces to begin with? }
    begin
      while (SPos > 0) and (Str[SPos] = ' ') do Dec(SPos);
      SetLength(Str,SPos);
    end
  end {RightTrim};

This version suddenly looks a lot like the 16-bit fast version. And why shouldn't it? After all, we still use the same algorithm. If we compare this procedure with the TrimRight function Borland provides, we'll see that they use a similar algorithm, but one that copies the resulting string.
  function TrimRight(const S: string): string;
  { TrimRight trims trailing spaces and control characters from the given string. }
  var
    I: Integer;
  begin
    I := Length(S);
    while (I > 0) and (S[I] <= ' ') do Dec(I);
    Result := Copy(S, 1, I);
  end;

Copying a long String would seem to take longer than setting the length of one, right? Well, this is a bit of a paradox actually, and that's where Reference Counters come in...
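One behavioural difference is worth a quick look before moving on: Borland's TrimRight compares with <= ' ' and so also strips trailing control characters, while a routine that tests for = ' ' stops at the first non-space. The sketch below assumes Delphi 2 or later, or Free Pascal in Delphi mode. The RightTrim in it is my own compact variant of the in-place routine, and the names A and B are purely illustrative.

```pascal
program TrimCompare;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$H+}

uses
  SysUtils;  { provides TrimRight }

{ In-place trim of trailing spaces only; safe for empty strings. }
procedure RightTrim(var Str: string);
var
  SPos: Integer;
begin
  SPos := Length(Str);
  while (SPos > 0) and (Str[SPos] = ' ') do
    Dec(SPos);
  if SPos < Length(Str) then
    SetLength(Str, SPos);
end;

var
  A, B: string;
begin
  A := 'abc'#9'  ';    { "abc", a tab, then two spaces: 6 characters }
  B := A;
  RightTrim(A);        { stops at the tab }
  B := TrimRight(B);   { removes the tab as well }
  WriteLn('Length after RightTrim: ', Length(A));
  WriteLn('Length after TrimRight: ', Length(B));
end.
```

RightTrim leaves 4 characters (the tab survives), TrimRight leaves 3. If your data may end in tabs or line breaks, that difference decides which routine you want.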
Reference Count
When a long String is copied, the reference count of the original is incremented and only a pointer is copied.
This takes very little time, which explains why even copying long Strings of several megabytes takes no time at all.
Of course, once you change the copy (assign something new to the 12,345,678th element of a 20Mb long String) then the actual memory for a new long String must be allocated to hold the 20Mb long String copy with the new 12,345,678th element.
This is referred to as 'copy-on-write', but I would rather call it a 'delayed-performance-hit', since that's what your users will be experiencing.
There are ways around it, such as using the UniqueString function, but that's a story for another day...
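You can watch the reference counting at work by comparing the pointers behind two string variables. This is only a sketch, assuming Delphi 2 or later or Free Pascal in Delphi mode; the variable names are invented for the demonstration. The string is built at run time because string constants are handled differently.

```pascal
program CopyOnWrite;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$H+}

var
  Original, Dup1, Dup2: AnsiString;  { hypothetical names }
begin
  { Build the string at run time so it is a normal heap-allocated long String. }
  Original := StringOfChar('a', 1000000);

  Dup1 := Original;   { only the pointer is copied }
  WriteLn('Shared after assignment:   ', Pointer(Dup1) = Pointer(Original));

  Dup1[1] := 'b';     { first write: Dup1 now gets a buffer of its own }
  WriteLn('Shared after write:        ', Pointer(Dup1) = Pointer(Original));
  WriteLn('Original untouched:        ', Original[1] = 'a');

  Dup2 := Original;
  UniqueString(Dup2); { take the copy now instead of at the first write }
  WriteLn('Shared after UniqueString: ', Pointer(Dup2) = Pointer(Original));
end.
```

The four lines should read TRUE, FALSE, TRUE, FALSE. The assignment itself is cheap; the million-byte copy happens at the first write, or at the moment you call UniqueString, whichever comes first.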
To make a Long String Short...
Finally, something to take care of when writing or calling procedures that have var parameters and local variables of type String.
I sometimes get reports of people having Access Violations after using Copy to assign data to one of the var parameters.
It seems that Delphi treats these parameters as huge (long) strings even if you switch that option off ($H- or the Compiler | Options box).
This leads to my final recommendation for this month: except when you're writing general string routines, always explicitly state whether you're using a long or a short string!
Don't depend on the {$H} compiler directive (which works on a unit basis), but use the ShortString and AnsiString predefined types instead. The compiler may use a String as a Long String while you think it's a Short String or vice versa, but when you use ShortString or AnsiString types this will never happen.
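The effect of the directive is easy to see with SizeOf. The following is a small sketch, assuming a compiler that treats $H as a local switch (Delphi 2 or later, or Free Pascal); the two type names are invented for the example.

```pascal
program ExplicitStringTypes;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

{$H-}
type
  TStringWithHOff = string;  { generic String under H-: a short string }
{$H+}
type
  TStringWithHOn = string;   { generic String under H+: a long string }

begin
  { The same keyword yields two different types, depending on the switch. }
  WriteLn('String under H- takes 256 bytes:   ', SizeOf(TStringWithHOff) = 256);
  WriteLn('String under H+ is a pointer:      ', SizeOf(TStringWithHOn) = SizeOf(Pointer));

  { The explicit types mean the same thing wherever they are compiled. }
  WriteLn('ShortString takes 256 bytes:       ', SizeOf(ShortString) = 256);
  WriteLn('AnsiString is a pointer:           ', SizeOf(AnsiString) = SizeOf(Pointer));
end.
```

All four lines should print TRUE. A var parameter declared as ShortString or AnsiString therefore cannot silently change its meaning when the unit is compiled with a different setting.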
  unit Hood2;

  interface

  {$IFNDEF WIN32}
  procedure RightTrim16(var Str: String);
  procedure SetLength(var Str: String; Len: Integer);
  {$ELSE}
  procedure RightTrim32(var Str: String);
  {$ENDIF}
  procedure RightTrim(var Str: String);

  implementation

  {$IFNDEF WIN32} { to make sure it doesn't get added into Delphi 2 code }
  procedure RightTrim16(var Str: String);
  var Len: Byte absolute Str; { Len is overlaid with the 0th length byte of Str }
  begin
    while Str[Len] = ' ' do Dec(Len)
  end {RightTrim16};

  procedure SetLength(var Str: String; Len: Integer);
  begin
    Str[0] := Chr(Len);
  end {SetLength};
  {$ELSE}
  procedure RightTrim32(var Str: String);
  var SPos,SLen: Integer;
  begin
    SPos := Length(Str);
    { SPos > 0 guards against indexing an empty long String }
    if (SPos > 0) and (Str[SPos] = ' ') then { are there trailing spaces to begin with? }
    begin
      SLen := 0;
      while (SPos > 0) and (Str[SPos] = ' ') do
      begin
        Inc(SLen);
        Dec(SPos)
      end;
      Delete(Str,SPos+1,SLen); { SPos+1 is the first trailing space }
      { SetLength(Str,SPos); }
    end
  end {RightTrim32};
  {$ENDIF}

  procedure RightTrim(var Str: String);
  var SPos: Integer;
  begin
    SPos := Length(Str);
    { SPos > 0 guards against indexing an empty long String }
    if (SPos > 0) and (Str[SPos] = ' ') then { are there trailing spaces to begin with? }
    begin
      while (SPos > 0) and (Str[SPos] = ' ') do Dec(SPos);
      SetLength(Str,SPos);
    end
  end {RightTrim};

  end.