
Unicode tip #2 - UTF-16 Surrogate Characters

Author: Bob Swart
Posted: 11/23/2008 12:35:24 PM (GMT+1)

Although most UTF-16 characters are single WideChar values, there are also surrogate characters (also WideChar values) that are used in pairs to encode a Unicode code point in a plane other than the default plane 0. The plane is given by the bits of the code point above the low 16 bits, so every code point of $10000 or higher lies outside plane 0.

A nice example of a Unicode code point outside plane 0 is the G Clef, which is code point $1D11E, encoded in UTF-16 as the surrogate pair $D834 and $DD1E. There are a number of ways to define the Clef inside a string.
As an example, let’s put the Clef between [ and ] square brackets. We can produce the two UTF-16 surrogate characters by calling the ConvertFromUtf32($1D11E) function. But we can also declare a constant Clef by coding the opening and closing brackets as well as the two surrogate characters themselves. For that, we need to calculate the two surrogate characters, which is done as follows: first subtract $10000 from $1D11E, which leaves $D11E, or 00001101000100011110 in 20 bits. Split this into the upper 10 bits (0000110100, which is $34) and the lower 10 bits (0100011110, which is $11E). $34 is added to $D800 and $11E is added to $DC00, resulting in $D834 for the high (most significant) surrogate and $DD1E for the low (least significant) surrogate.
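The arithmetic above can be sketched as a small console program. This is only an illustration: the helper name ToSurrogates and its parameter names are made up for this example and are not part of the RTL, and it assumes the code point lies in the range $10000..$10FFFF.

```delphi
program SurrogateCalc;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not an RTL routine): splits a code point in the
// range $10000..$10FFFF into its UTF-16 high and low surrogate values.
procedure ToSurrogates(CodePoint: Cardinal; out HighSur, LowSur: Word);
var
  Offset: Cardinal;
begin
  Offset  := CodePoint - $10000;               // 20-bit value, $0D11E for the G Clef
  HighSur := Word($D800 + (Offset shr 10));    // upper 10 bits added to $D800
  LowSur  := Word($DC00 + (Offset and $3FF));  // lower 10 bits added to $DC00
end;

var
  H, L: Word;
begin
  ToSurrogates($1D11E, H, L);
  // Prints the two surrogates as four-digit hex values: D834 DD1E
  WriteLn(IntToHex(H, 4), ' ', IntToHex(L, 4));
end.
```

The shift by 10 and the mask with $3FF correspond to the split of the 20-bit value into $34 and $11E described above.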

As a result, the definition for Clef between square brackets can be as follows:

  const
    // '[' + high surrogate + low surrogate + ']'
    // (the surrogates are 16-bit WideChar values, not bytes)
    Clef = #$5B + #$D834 + #$DD1E + #$5D;
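The ConvertFromUtf32 route mentioned earlier can be checked against the hand-computed pair. The following is a minimal sketch, assuming the global ConvertFromUtf32 function from the Character unit as shipped with Delphi 2009 (later versions may flag it as deprecated in favour of other forms); the program and variable names are only illustrative.

```delphi
program ClefCheck;

{$APPTYPE CONSOLE}

uses
  SysUtils, Character;

var
  S: string;
begin
  // Build the bracketed Clef from the code point instead of by hand.
  S := '[' + ConvertFromUtf32($1D11E) + ']';

  // The string holds four WideChar elements:
  // '[', the high surrogate, the low surrogate, and ']'.
  WriteLn(Length(S));

  // Elements 2 and 3 should be the surrogate pair calculated earlier.
  WriteLn(IntToHex(Ord(S[2]), 4));  // high surrogate, $D834
  WriteLn(IntToHex(Ord(S[3]), 4));  // low surrogate, $DD1E
end.
```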
This brings another interesting topic to our attention: the number of elements vs. the number of printable characters, which will be covered tomorrow.

This tip is the second in a series of Unicode tips taken from my Delphi 2009 Development Essentials book available shortly on


1 Comment

ali asghar 12/08/20 08:06:18: error on this line "Incompatible Type"



This webpage © 2005-2014 by Bob Swart (aka Dr.Bob) - All Rights Reserved.