The System.String Type

One of the most used types in any application is System.String. A String represents an immutable sequence of characters. The String type is derived immediately from Object, making it a reference type, and therefore, String objects (its array of characters) always live in the heap, never on a thread’s stack. The String type also implements several interfaces (IComparable/IComparable<String>, ICloneable, IConvertible, IEnumerable/IEnumerable<Char>, and IEquatable<String>).

 

Constructing Strings

Many programming languages (including C#) consider String to be a primitive type—that is, the compiler lets you express literal strings directly in your source code. The compiler places these literal strings in the module’s metadata, and they are then loaded and referenced at run time.

In C#, you can’t use the new operator to construct a String object from a literal string.

 

using System;

public static class Program {
   public static void Main() {
      String s = new String("Hi there.");  // <-- Error
      Console.WriteLine(s);
   }
}


Instead, you must use the following simplified syntax.

 

using System;

public static class Program {
   public static void Main() {
      String s = "Hi there.";
      Console.WriteLine(s);
   }
}

 

If you compile this code and examine its IL (using ILDasm.exe), you’d see the following.

 

.method public hidebysig static void Main() cil managed
{
  .entrypoint
  // Code size 13 (0xd)
  .maxstack 1
  .locals init ([0] string s)
  IL_0000: ldstr "Hi there."
  IL_0005: stloc.0
  IL_0006: ldloc.0
  IL_0007: call void [mscorlib]System.Console::WriteLine(string)
  IL_000c: ret
} // end of method Program::Main

 

The newobj IL instruction constructs a new instance of an object. However, no newobj instruction appears in the IL code example. Instead, you see the special ldstr (load string) IL instruction, which constructs a String object by using a literal string obtained from metadata. This shows you that the common language runtime (CLR) does, in fact, have a special way of constructing literal String objects.

 

If you are using unsafe code, you can construct a String object from a Char* or SByte*. To accomplish this, you would use C#’s new operator and call one of the constructors provided by the String type that takes Char* or SByte* parameters. These constructors create a String object, initializing the string from an array of Char instances or signed bytes. The other constructors don’t have any pointer parameters and can be called using safe (verifiable) code written in any managed programming language.
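As a minimal sketch of the safe constructors mentioned above, the following code builds strings from a Char array, from a repeated character, and from a range within a Char array. The class and method names (StringConstructionSketch, FromChars, Repeat, FromRange) are hypothetical and exist only for this illustration.

```csharp
using System;

// Hypothetical helper class; the names are assumptions made for this sketch.
public static class StringConstructionSketch {
   // Constructs a String from every character in the array.
   public static String FromChars(Char[] chars) {
      return new String(chars);
   }

   // Constructs a String that repeats one character count times.
   public static String Repeat(Char c, Int32 count) {
      return new String(c, count);
   }

   // Constructs a String from length characters starting at index start.
   public static String FromRange(Char[] chars, Int32 start, Int32 length) {
      return new String(chars, start, length);
   }

   public static void Main() {
      Char[] chars = { 'H', 'i', ' ', 't', 'h', 'e', 'r', 'e', '.' };
      Console.WriteLine(FromChars(chars));        // Hi there.
      Console.WriteLine(Repeat('-', 5));          // -----
      Console.WriteLine(FromRange(chars, 3, 5));  // there
   }
}
```

None of these constructors takes a pointer parameter, so the code compiles without the unsafe switch.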

C# offers some special syntax to help you enter literal strings into the source code. For special characters such as new lines, carriage returns, and backspaces, C# uses the escape mechanism familiar to C/C++ developers.

 

// String containing carriage-return and newline characters
String s = "Hi\r\nthere.";




 

You can concatenate several strings to form a single string by using C#’s + operator as follows.

 

// Three literal strings concatenated to form a single literal string
String s = "Hi" + " " + "there.";

 

In this code, because all of the strings are literal strings, the C# compiler concatenates them at compile time and ends up placing just one string, "Hi there.", in the module’s metadata. Using the + operator on nonliteral strings causes the concatenation to be performed at run time. To concatenate several strings together at run time, avoid using the + operator because it creates multiple string objects on the garbage-collected heap. Instead, use the System.Text.StringBuilder type (which I’ll explain later in this chapter).
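The difference between the two run-time approaches can be sketched as follows. Both methods produce the same result; the first allocates a new String on every loop iteration, while the second appends into one StringBuilder and creates the final String once. The names ConcatSketch, JoinWithPlus, and JoinWithBuilder are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper class; the names are assumptions made for this sketch.
public static class ConcatSketch {
   // Each += creates a new String object on the heap.
   public static String JoinWithPlus(String[] parts) {
      String result = String.Empty;
      foreach (String part in parts) {
         result += part;
      }
      return result;
   }

   // Appends into a single buffer and builds the String once at the end.
   public static String JoinWithBuilder(String[] parts) {
      StringBuilder sb = new StringBuilder();
      foreach (String part in parts) {
         sb.Append(part);
      }
      return sb.ToString();
   }

   public static void Main() {
      String[] parts = { "Hi", " ", "there." };
      Console.WriteLine(JoinWithPlus(parts));     // Hi there.
      Console.WriteLine(JoinWithBuilder(parts));  // Hi there.
   }
}
```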

Finally, C# also offers a special way to declare a string in which all characters between quotes are considered part of the string. These special declarations are called verbatim strings and are typically used when specifying the path of a file or directory or when working with regular expressions. Here is some code showing how to declare the same string with and without using the verbatim string character (@).

 

// Specifying the pathname of an application

String file = "C:\\Windows\\System32\\Notepad.exe";

 

// Specifying the pathname of an application by using a verbatim string
String file = @"C:\Windows\System32\Notepad.exe";

 

You could use either one of the preceding code lines in a program because they produce identical strings in the assembly’s metadata. However, the @ symbol before the string on the second line tells the compiler that the string is a verbatim string. In effect, this tells the compiler to treat backslash characters as backslash characters instead of escape characters, making the path much more readable in your source code.
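A small check, offered as a sketch, confirms that the escaped and verbatim forms yield the same string and shows that escape sequences are not interpreted inside a verbatim string. The class name VerbatimSketch and its method are hypothetical.

```csharp
using System;

// Hypothetical helper class; the names are assumptions made for this sketch.
public static class VerbatimSketch {
   // Compares the escaped form of a path with its verbatim form.
   public static Boolean PathsMatch() {
      String escaped = "C:\\Windows\\System32\\Notepad.exe";
      String verbatim = @"C:\Windows\System32\Notepad.exe";
      return escaped == verbatim;
   }

   public static void Main() {
      Console.WriteLine(PathsMatch());     // True
      Console.WriteLine("\r\n".Length);    // 2 (carriage return and newline)
      Console.WriteLine(@"\r\n".Length);   // 4 (backslash, r, backslash, n)
   }
}
```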

Now that you’ve seen how to construct a string, let’s talk about some of the operations you can perform on String objects.


Strings Are Immutable

The most important thing to know about a String object is that it is immutable. That is, once created, a string can never get longer, get shorter, or have any of its characters changed. Having immutable strings offers several benefits. First, it allows you to perform operations on a string without actually changing the string.

 

if (s.ToUpperInvariant().Substring(10, 21).EndsWith("EXE")) {

...

}

 

Here, ToUpperInvariant returns a new string; it doesn’t modify the characters of the string s. Substring operates on the string returned by ToUpperInvariant and also returns a new string, which is then examined by EndsWith. The two temporary strings created by ToUpperInvariant and Substring are not referenced for long by the application code, and the garbage collector will reclaim their memory at the next collection. If you perform a lot of string manipulations, you end up creating a lot of String objects on the heap, which causes more frequent garbage collections, thus hurting your application’s performance.
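To make the point concrete, here is a minimal sketch showing that the original string is untouched after a call to ToUpperInvariant. The class name ImmutabilitySketch and the IsExe helper are hypothetical.

```csharp
using System;

// Hypothetical helper class; the names are assumptions made for this sketch.
public static class ImmutabilitySketch {
   // Uppercases a temporary copy and tests its suffix; path is never modified.
   public static Boolean IsExe(String path) {
      return path.ToUpperInvariant().EndsWith("EXE", StringComparison.Ordinal);
   }

   public static void Main() {
      String s = "notepad.exe";
      String upper = s.ToUpperInvariant();

      Console.WriteLine(s);                                 // notepad.exe
      Console.WriteLine(upper);                             // NOTEPAD.EXE
      Console.WriteLine(Object.ReferenceEquals(s, upper));  // False
      Console.WriteLine(IsExe(s));                          // True
   }
}
```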

Having immutable strings also means that there are no thread synchronization issues when manipulating or accessing a string. In addition, it’s possible for the CLR to share multiple identical String contents through a single String object. This can reduce the number of strings in the system, thereby conserving memory usage, and it is what string interning (discussed later in the chapter) is all about.

For performance reasons, the String type is tightly integrated with the CLR. Specifically, the CLR knows the exact layout of the fields defined within the String type, and the CLR accesses these fields directly. This performance and direct access come at a small development cost: the String class is sealed, which means that you cannot use it as a base class for your own type. If you were able to define your own type, using String as a base type, you could add your own fields, which would break the CLR’s assumptions. In addition, you could break some assumptions that the CLR team has made about String objects being immutable.

 

Comparing Strings

Comparing is probably the most common operation performed on strings. There are two reasons to compare two strings with each other. We compare two strings to determine equality or to sort them (usually for presentation to a user).

In determining string equality or when comparing strings for sorting, it is highly recommended that you call one of these methods (defined by the String class).

 

Boolean Equals(String value, StringComparison comparisonType)

static Boolean Equals(String a, String b, StringComparison comparisonType)

 

static Int32 Compare(String strA, String strB, StringComparison comparisonType)

static Int32 Compare(String strA, String strB, Boolean ignoreCase, CultureInfo culture)

static Int32 Compare(String strA, String strB, CultureInfo culture, CompareOptions options)


static Int32 Compare(String strA, Int32 indexA, String strB, Int32 indexB, Int32 length, StringComparison comparisonType)

static Int32 Compare(String strA, Int32 indexA, String strB, Int32 indexB, Int32 length, CultureInfo culture, CompareOptions options)

static Int32 Compare(String strA, Int32 indexA, String strB, Int32 indexB, Int32 length, Boolean ignoreCase, CultureInfo culture)

 

Boolean StartsWith(String value, StringComparison comparisonType)

Boolean StartsWith(String value, Boolean ignoreCase, CultureInfo culture)

 

Boolean EndsWith(String value, StringComparison comparisonType)

Boolean EndsWith(String value, Boolean ignoreCase, CultureInfo culture)

 

When sorting, you should always perform case-sensitive comparisons. The reason is that if two strings differing only by case are considered to be equal, they could be ordered differently each time you sort them; this would confuse the user.

The comparisonType argument (in most of the preceding methods) is one of the values defined by the StringComparison enumerated type, which is defined as follows.

 

public enum StringComparison {
   CurrentCulture = 0,
   CurrentCultureIgnoreCase = 1,
   InvariantCulture = 2,
   InvariantCultureIgnoreCase = 3,
   Ordinal = 4,
   OrdinalIgnoreCase = 5
}

 

The options argument (in two of the preceding methods) is one of the values defined by the CompareOptions enumerated type.

 

[Flags]
public enum CompareOptions {
   None = 0,
   IgnoreCase = 1,
   IgnoreNonSpace = 2,
   IgnoreSymbols = 4,
   IgnoreKanaType = 8,
   IgnoreWidth = 0x00000010,
   Ordinal = 0x40000000,
   OrdinalIgnoreCase = 0x10000000,
   StringSort = 0x20000000
}

 

Methods that accept a CompareOptions argument also force you to explicitly pass in a culture. When passing in the Ordinal or OrdinalIgnoreCase flag, these Compare methods ignore the specified culture.

Many programs use strings for internal programmatic purposes such as path names, file names, URLs, registry keys and values, environment variables, reflection, Extensible Markup Language (XML) tags, XML attributes, and so on. Often, these strings are not shown to a user and are used only within the program. When comparing programmatic strings, you should always use StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase. This is the fastest way to perform a comparison that is not to be affected in any linguistic way because culture information is not taken into account when performing the comparison.

On the other hand, when you want to compare strings in a linguistically correct manner (usually for display to an end user), you should use StringComparison.CurrentCulture or StringComparison.CurrentCultureIgnoreCase.
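The two recommendations above can be sketched side by side. The class ComparisonChoiceSketch and its methods are hypothetical names; the sign of the culture-sensitive result depends on the calling thread’s culture, so no output is shown for it.

```csharp
using System;

// Hypothetical helper class; the names are assumptions made for this sketch.
public static class ComparisonChoiceSketch {
   // Programmatic strings (file names here): ordinal, ignoring case.
   public static Boolean IsSameFileName(String a, String b) {
      return String.Equals(a, b, StringComparison.OrdinalIgnoreCase);
   }

   // Strings shown to a user: compare by using the calling thread's culture.
   public static Int32 CompareForDisplay(String a, String b) {
      return String.Compare(a, b, StringComparison.CurrentCulture);
   }

   public static void Main() {
      Console.WriteLine(IsSameFileName("Notepad.EXE", "notepad.exe"));  // True

      // The result depends on the thread's CurrentCulture.
      Console.WriteLine(Math.Sign(CompareForDisplay("apple", "banana")));
   }
}
```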

       
   
 
 

 

 

Sometimes, when you compare strings in a linguistically correct manner, you want to specify a specific culture rather than use a culture that is associated with the calling thread. In this case, you can use the overloads of the StartsWith, EndsWith, and Compare methods shown earlier, all of which take Boolean and CultureInfo arguments.

       
   
 
 


Now, let’s talk about how to perform linguistically correct comparisons. The .NET Framework uses the System.Globalization.CultureInfo type to represent a language/country pair (as described by the RFC 1766 standard). For example, “en-US” identifies English as written in the United States, “en-AU” identifies English as written in Australia, and “de-DE” identifies German as written in Germany.

In the CLR, every thread has two properties associated with it. Each of these properties refers to a CultureInfo object. The two properties are:

 

CurrentUICulture: This property is used to obtain resources that are shown to an end user. It is most useful for GUI or Web Forms applications because it indicates the language that should be used when displaying UI elements such as labels and buttons. By default, when you create a thread, this thread property is set to a CultureInfo object, which identifies the language of the Windows version the application is running on using the Win32 GetUserDefaultUILanguage function. If you’re running a Multilingual User Interface (MUI) version of Windows, you can set this via the Regional And Language Options Control Panel Settings dialog box. On a non-MUI version of Windows, the language is determined by the localized version of the operating system installed (or the installed language pack) and the language is not changeable.

CurrentCulture: This property is used for everything that CurrentUICulture isn’t used for, including number and date formatting, string casing, and string comparing. When formatting, both the language and country parts of the CultureInfo object are used. By default, when you create a thread, this thread property is set to a CultureInfo object, whose value is determined by calling the Win32 GetUserDefaultLCID function, whose value is set in the Regional And Language Control Panel applet.

For the two thread properties mentioned above, you can override the default value used by the system when a new thread gets created with AppDomain defaults by setting CultureInfo’s static DefaultThreadCurrentCulture and DefaultThreadCurrentUICulture properties.

On many computers, a thread’s CurrentUICulture and CurrentCulture properties will be set to the same CultureInfo object, which means that they both use the same language/country information. However, they can be set differently. For example, an application running in the United States could use Spanish for all of its menu items and other GUI elements while properly displaying all of the currency and date formatting for the United States. To do this, the thread’s CurrentUICulture property should be set to a CultureInfo object initialized with a language of “es” (for Spanish), while the thread’s CurrentCulture property should be set to a CultureInfo object initialized with a language/country pair of “en-US.”
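The Spanish UI with United States formatting scenario could be set up roughly as follows. This is a sketch that assumes the "es" and "en-US" cultures are available on the machine; the class and method names are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Threading;

// Hypothetical helper class; the names are assumptions made for this sketch.
public static class ThreadCultureSketch {
   // Sets the calling thread's two culture properties independently.
   public static void UseSpanishUiWithUsFormatting() {
      Thread.CurrentThread.CurrentUICulture = new CultureInfo("es");
      Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
   }

   public static void Main() {
      UseSpanishUiWithUsFormatting();

      // Resource lookups consult CurrentUICulture.
      Console.WriteLine(Thread.CurrentThread.CurrentUICulture.Name);  // es

      // Formatting, casing, and comparisons consult CurrentCulture.
      Console.WriteLine(Thread.CurrentThread.CurrentCulture.Name);    // en-US

      // The currency format follows CurrentCulture (output not shown).
      Console.WriteLine((1234.5).ToString("C"));
   }
}
```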

Internally, a CultureInfo object has a field that refers to a System.Globalization.CompareInfo object, which encapsulates the culture’s character-sorting table information as defined by the Unicode standard. The following code demonstrates the difference between performing an ordinal comparison and a culturally aware string comparison.

 

using System;
using System.Globalization;

public static class Program {
   public static void Main() {
      String s1 = "Strasse";
      String s2 = "Straße";
      Boolean eq;

      // The ordinal comparison returns nonzero.
      eq = String.Compare(s1, s2, StringComparison.Ordinal) == 0;
      Console.WriteLine("Ordinal comparison: '{0}' {2} '{1}'", s1, s2,
         eq ? "==" : "!=");

      // Compare Strings appropriately for people
      // who speak German (de) in Germany (DE)
      CultureInfo ci = new CultureInfo("de-DE");

      // Compare returns zero.
      eq = String.Compare(s1, s2, true, ci) == 0;
      Console.WriteLine("Cultural comparison: '{0}' {2} '{1}'", s1, s2,
         eq ? "==" : "!=");
   }
}

 

Building and running this code produces the following output.

 

Ordinal comparison: 'Strasse' != 'Straße'
Cultural comparison: 'Strasse' == 'Straße'

       
   
 
 

 

In some rare circumstances, you may need to have even more control when comparing strings for equality or for sorting. This could be necessary when comparing strings consisting of Japanese characters. This additional control can be accessed via the CultureInfo object’s CompareInfo property. As mentioned earlier, a CompareInfo object encapsulates a culture’s character comparison tables, and there is just one CompareInfo object per culture.

When you call String’s Compare method, if the caller specifies a culture, the specified culture is used, or if no culture is specified, the value in the calling thread’s CurrentCulture property is used. Internally, the Compare method obtains the reference to the CompareInfo object for the appropriate culture and calls the Compare method of the CompareInfo object, passing along the appropriate options (such as case insensitivity). Naturally, you could call the Compare method of a specific CompareInfo object yourself if you need the additional control.

The Compare method of the CompareInfo type takes as a parameter a value from the CompareOptions enumerated type (as shown earlier). You can OR these bit flags together to gain significantly greater control when performing string comparisons. For a complete description of these symbols, consult the .NET Framework documentation.


The following code demonstrates how important culture is to sorting strings and shows various ways of performing string comparisons.

using System;
using System.Text;
using System.Windows.Forms;
using System.Globalization;
using System.Threading;

public sealed class Program {
   public static void Main() {
      String output = String.Empty;
      String[] symbol = new String[] { "<", "=", ">" };
      Int32 x;
      CultureInfo ci;

      // The code below demonstrates how strings compare
      // differently for different cultures.
      String s1 = "coté";
      String s2 = "côte";

      // Sorting strings for French in France.
      ci = new CultureInfo("fr-FR");
      x = Math.Sign(ci.CompareInfo.Compare(s1, s2));
      output += String.Format("{0} Compare: {1} {3} {2}",
         ci.Name, s1, s2, symbol[x + 1]);
      output += Environment.NewLine;

      // Sorting strings for Japanese in Japan.
      ci = new CultureInfo("ja-JP");
      x = Math.Sign(ci.CompareInfo.Compare(s1, s2));
      output += String.Format("{0} Compare: {1} {3} {2}",
         ci.Name, s1, s2, symbol[x + 1]);
      output += Environment.NewLine;

      // Sorting strings for the thread's culture
      ci = Thread.CurrentThread.CurrentCulture;
      x = Math.Sign(ci.CompareInfo.Compare(s1, s2));
      output += String.Format("{0} Compare: {1} {3} {2}",
         ci.Name, s1, s2, symbol[x + 1]);
      output += Environment.NewLine + Environment.NewLine;

      // The code below demonstrates how to use CompareInfo.Compare's
      // advanced options with 2 Japanese strings. One string represents
      // the word "shinkansen" (the name for the Japanese high-speed
      // train) in hiragana (one subtype of Japanese writing), and the
      // other represents the same word in katakana (another subtype of
      // Japanese writing).
      s1 = "しんかんせん";  // ("\u3057\u3093\u304B\u3093\u305b\u3093")
      s2 = "シンカンセン";  // ("\u30b7\u30f3\u30ab\u30f3\u30bb\u30f3")

      // Here is the result of a default comparison
      ci = new CultureInfo("ja-JP");
      x = Math.Sign(String.Compare(s1, s2, true, ci));
      output += String.Format("Simple {0} Compare: {1} {3} {2}",
         ci.Name, s1, s2, symbol[x + 1]);
      output += Environment.NewLine;

      // Here is the result of a comparison that ignores
      // kana type (a type of Japanese writing)
      CompareInfo compareInfo = CompareInfo.GetCompareInfo("ja-JP");
      x = Math.Sign(compareInfo.Compare(s1, s2, CompareOptions.IgnoreKanaType));
      output += String.Format("Advanced {0} Compare: {1} {3} {2}",
         ci.Name, s1, s2, symbol[x + 1]);

      MessageBox.Show(output, "Comparing Strings For Sorting");
   }
}

       
   
 
 

 

Building and running this code produces the output shown in Figure 14-1.

 
 

FIGURE 14-1 String sorting results.

 

In addition to Compare, the CompareInfo class offers the IndexOf, LastIndexOf, IsPrefix, and IsSuffix methods. Because all of these methods offer overloads that take a CompareOptions enumeration value as a parameter, they give you more control than the Compare, IndexOf, LastIndexOf, StartsWith, and EndsWith methods defined by the String class. Also, you should be aware that the FCL includes a System.StringComparer class that you can also use for performing string comparisons. This class is useful when you want to perform the same kind of comparison repeatedly for many different strings.
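As a sketch of how StringComparer lets you pick a comparison once and reuse it, the code below passes a comparer to a dictionary and to Array.Sort. The class StringComparerSketch and its methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class; the names are assumptions made for this sketch.
public static class StringComparerSketch {
   // Counts words, treating keys that differ only by case as the same key.
   public static Dictionary<String, Int32> CountWords(IEnumerable<String> words) {
      var counts = new Dictionary<String, Int32>(StringComparer.OrdinalIgnoreCase);
      foreach (String word in words) {
         Int32 n;
         counts.TryGetValue(word, out n);   // n is 0 when the key is absent
         counts[word] = n + 1;
      }
      return counts;
   }

   // Returns a copy of items sorted by ordinal (code value) order.
   public static String[] SortOrdinal(String[] items) {
      String[] copy = (String[]) items.Clone();
      Array.Sort(copy, StringComparer.Ordinal);
      return copy;
   }

   public static void Main() {
      var counts = CountWords(new[] { "Exe", "EXE", "dll" });
      Console.WriteLine(counts["exe"]);   // 2
      Console.WriteLine(counts.Count);    // 2

      String[] sorted = SortOrdinal(new[] { "b", "A", "a", "B" });
      Console.WriteLine(String.Join(",", sorted));  // A,B,a,b
   }
}
```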

 

String Interning

As I said in the preceding section, checking strings for equality is a common operation for many applications; this task can hurt performance significantly. When performing an ordinal equality check, the CLR quickly tests to see if both strings have the same number of characters. If they don’t, the strings are definitely not equal; if they do, the strings might be equal, and the CLR must then compare each individual character to determine for sure. When performing a culturally aware comparison, the CLR must always compare all of the individual characters because strings of different lengths might be considered equal.


In addition, if you have several instances of the same string duplicated in memory, you’re wasting memory because strings are immutable. You’ll use memory much more efficiently if there is just one instance of the string in memory and all variables needing to refer to the string can just point to the single string object.

If your application frequently compares strings for equality by using case-sensitive, ordinal comparisons, or if you expect to have many string objects with the same value, you can enhance performance substantially if you take advantage of the string interning mechanism in the CLR. When the CLR initializes, it creates an internal hash table in which the keys are strings and the values are references to String objects in the managed heap. Initially, the table is empty (of course). The String class offers two methods that allow you to access this internal hash table.

 

public static String Intern(String str);
public static String IsInterned(String str);

 

The first method, Intern, takes a String, obtains a hash code for it, and checks the internal hash table for a match. If an identical string already exists, a reference to the already existing String object is returned. If an identical string doesn’t exist, a copy of the string is made, the copy is added to the internal hash table, and a reference to this copy is returned. If the application no longer holds a reference to the original String object, the garbage collector is able to free the memory of that string. Note that the garbage collector can’t free the strings that the internal hash table refers to because the hash table holds the reference to those String objects. String objects referred to by the internal hash table can’t be freed until the AppDomain is unloaded or the process terminates.

As does the Intern method, the IsInterned method takes a String and looks it up in the internal hash table. If a matching string is in the hash table, IsInterned returns a reference to the interned string object. If a matching string isn’t in the hash table, however, IsInterned returns null; it doesn’t add the string to the hash table.
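The behavior of the two methods can be sketched with a string that is built at run time, so that it cannot already be in the table as a literal. The class InternSketch and the MakeUniqueString helper are hypothetical; using a GUID is just an assumption that makes the value unique.

```csharp
using System;

// Hypothetical helper class; the names are assumptions made for this sketch.
public static class InternSketch {
   // Builds a value at run time that no literal in metadata can match.
   public static String MakeUniqueString() {
      return Guid.NewGuid().ToString();
   }

   public static void Main() {
      String original = MakeUniqueString();

      // Not in the table yet: IsInterned returns null and adds nothing.
      Console.WriteLine(String.IsInterned(original) == null);        // True

      // Intern adds the value and returns the table's reference.
      String interned = String.Intern(original);

      // A separate object that contains the same characters.
      String copy = new String(original.ToCharArray());
      Console.WriteLine(Object.ReferenceEquals(copy, interned));     // False

      // Interning the copy returns the reference already in the table.
      Console.WriteLine(
         Object.ReferenceEquals(String.Intern(copy), interned));     // True
      Console.WriteLine(String.IsInterned(copy) != null);            // True
   }
}
```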

By default, when an assembly is loaded, the CLR interns all of the literal strings described in the assembly’s metadata. Microsoft learned that this hurts performance significantly due to the additional hash table lookups, so it is now possible to turn this “feature” off. If an assembly is marked with a System.Runtime.CompilerServices.CompilationRelaxationsAttribute specifying the System.Runtime.CompilerServices.CompilationRelaxations.NoStringInterning flag value, the CLR may, according to the ECMA specification, choose not to intern all of the strings defined in that assembly’s metadata. Note that, in an attempt to improve your application’s performance, the C# compiler always specifies this attribute/flag whenever you compile an assembly.

Even if an assembly has this attribute/flag specified, the CLR may choose to intern the strings, but you should not count on this. In fact, you really should never write code that relies on strings being interned unless you have written code that explicitly calls the String’s Intern method yourself.


The following code demonstrates string interning.

 

String s1 = "Hello";
String s2 = "Hello";
Console.WriteLine(Object.ReferenceEquals(s1, s2));  // Should be 'False'

s1 = String.Intern(s1);
s2 = String.Intern(s2);
Console.WriteLine(Object.ReferenceEquals(s1, s2));  // 'True'

 

In the first call to the ReferenceEquals method, s1 refers to a "Hello" string object in the heap, and s2 refers to a different "Hello" string object in the heap. Because the references are different, False should be displayed. However, if you run this on version 4.5 of the CLR, you’ll see that True is displayed. The reason is that this version of the CLR chooses to ignore the attribute/flag emitted by the C# compiler, and the CLR interns the literal "Hello" string when the assembly is loaded into the AppDomain. This means that s1 and s2 refer to the single "Hello" string in the heap. However, as mentioned previously, you should never write code that relies on this behavior because a future version of the CLR might honor the attribute/flag and not intern the "Hello" string. In fact, version 4.5 of the CLR does honor the attribute/flag when this assembly’s code has been compiled using the NGen.exe utility.

 

Before the second call to the ReferenceEquals method, the "Hello" string has been explicitly interned, and s1 now refers to an interned "Hello". Then by calling Intern again, s2 is set to refer to the same "Hello" string as s1. Now, when ReferenceEquals is called the second time, we are guaranteed to get a result of True regardless of whether the assembly was compiled with the attribute/flag.

So now, let’s look at an example to see how you can use string interning to improve performance and reduce memory usage. The NumTimesWordAppearsEquals method below takes two arguments: a word and an array of strings in which each array element refers to a single word. This method then determines how many times the specified word appears in the wordlist and returns this count.

 

private static Int32 NumTimesWordAppearsEquals(String word, String[] wordlist) {
   Int32 count = 0;
   for (Int32 wordnum = 0; wordnum < wordlist.Length; wordnum++) {
      if (word.Equals(wordlist[wordnum], StringComparison.Ordinal))
         count++;
   }
   return count;
}


As you can see, this method calls String’s Equals method, which internally compares the strings’ individual characters and checks to ensure that all characters match. This comparison can be slow. In addition, the wordlist array might have multiple entries that refer to multiple String objects containing the same set of characters. This means that multiple identical strings might exist in the heap and are surviving ongoing garbage collections.

Now, let’s look at a version of this method that was written to take advantage of string interning.

 

private static Int32 NumTimesWordAppearsIntern(String word, String[] wordlist) {
   // This method assumes that all entries in wordlist refer to interned strings.
   word = String.Intern(word);
   Int32 count = 0;
   for (Int32 wordnum = 0; wordnum < wordlist.Length; wordnum++) {
      if (Object.ReferenceEquals(word, wordlist[wordnum]))
         count++;
   }
   return count;
}

 

This method interns the word and assumes that the wordlist contains references to interned strings. First, this version might be saving memory if a word appears in the wordlist multiple times because, in this version, wordlist would now contain multiple references to the same single String object in the heap. Second, this version will be faster because determining if the specified word is in the array is simply a matter of comparing pointers.

Although the NumTimesWordAppearsIntern method is faster than the NumTimesWordAppearsEquals method, the overall performance of the application might be slower when using the NumTimesWordAppearsIntern method because of the time it takes to intern all of the strings when they were added to the wordlist array (code not shown). The NumTimesWordAppearsIntern method will really show its performance and memory improvement if the application needs to call the method multiple times using the same wordlist. The point of this discussion is to make it clear that string interning is useful, but it should be used with care and caution. In fact, this is why the C# compiler indicates that it doesn’t want string interning to be enabled.

 

String Pooling

When compiling source code, your compiler must process each literal string and emit the string into the managed module’s metadata. If the same literal string appears several times in your source code, emitting all of these strings into the metadata will bloat the size of the resulting file.

To remove this bloat, many compilers (including the C# compiler) write the literal string into the module’s metadata only once. All code that references the string will be modified to refer to the one string in the metadata. This ability of a compiler to merge multiple occurrences of a single string into a single instance can reduce the size of a module substantially. This process is nothing new—C/C++ compilers have been doing it for years. (Microsoft’s C/C++ compiler calls this string pooling.) Even so, string pooling is another way to improve the performance of strings and just one more piece of knowledge that you should have in your repertoire.


Examining a String’s Characters and Text Elements

Although comparing strings is useful for sorting them or for detecting equality, sometimes you need just to examine the characters within a string. The String type offers several properties and methods to help you do this, including Length, Chars (an indexer in C#), GetEnumerator, ToCharArray, Contains, IndexOf, LastIndexOf, IndexOfAny, and LastIndexOfAny.

In reality, a System.Char represents a single 16-bit Unicode code value that doesn’t necessarily equate to an abstract Unicode character. For example, some abstract Unicode characters are a combination of two code values. When combined, the U+0625 (the Arabic letter Alef with Hamza below) and U+0650 (the Arabic Kasra) characters form a single abstract character or text element.

In addition, some Unicode text elements require more than a 16-bit value to represent them. These text elements are represented using two 16-bit code values. The first code value is called the high surrogate, and the second code value is called the low surrogate. High surrogates have a value between U+D800 and U+DBFF, and low surrogates have a value between U+DC00 and U+DFFF. The use of surrogates allows Unicode to express more than a million different characters.

Surrogates are rarely used in the United States and Europe but are more commonly used in East Asia. To properly work with text elements, you should use the System.Globalization.StringInfo type. The easiest way to use this type is to construct an instance of it, passing its constructor a string. Then you can see how many text elements are in the string by querying the StringInfo’s LengthInTextElements property. You can then call StringInfo’s SubstringByTextElements method to extract the text element or the number of consecutive text elements that you desire.
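To illustrate the gap between Char counts and text element counts, the sketch below uses U+20000, a CJK ideograph chosen here only as an example of a character that needs a surrogate pair. The class SurrogateSketch and its method are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class; the names are assumptions made for this sketch.
public static class SurrogateSketch {
   // Counts text elements rather than 16-bit code values.
   public static Int32 CountTextElements(String s) {
      return new StringInfo(s).LengthInTextElements;
   }

   public static void Main() {
      // U+20000 lies outside the 16-bit range and is stored as two Chars.
      String s = "a\U00020000b";

      Console.WriteLine(s.Length);                      // 4
      Console.WriteLine(CountTextElements(s));          // 3
      Console.WriteLine(Char.IsHighSurrogate(s[1]));    // True
      Console.WriteLine(Char.IsLowSurrogate(s[2]));     // True

      // Combines the surrogate pair at index 1 back into one code point.
      Console.WriteLine(Char.ConvertToUtf32(s, 1).ToString("X"));  // 20000
   }
}
```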

In addition, the StringInfo class offers a static GetTextElementEnumerator method, which acquires a System.Globalization.TextElementEnumerator object that allows you to enumerate through all of the abstract Unicode characters contained in the string. Finally, you could call StringInfo’s static ParseCombiningCharacters method to obtain an array of Int32 values. The length of the array indicates how many text elements are contained in the string. Each element of the array identifies an index into the string where the first code value for a new text element can be found.

The following code demonstrates the various ways of using the StringInfo class to manipulate a string’s text elements.

 

using System;
using System.Text;
using System.Globalization;
using System.Windows.Forms;

public sealed class Program {
   public static void Main() {
      // The string below contains combining characters
      String s = "a\u0304\u0308bc\u0327";
      SubstringByTextElements(s);
      EnumTextElements(s);
      EnumTextElementIndexes(s);
   }

   private static void SubstringByTextElements(String s) {
      String output = String.Empty;

      StringInfo si = new StringInfo(s);
      for (Int32 element = 0; element < si.LengthInTextElements; element++) {
         output += String.Format(
            "Text element {0} is '{1}'{2}",
            element, si.SubstringByTextElements(element, 1),
            Environment.NewLine);
      }
      MessageBox.Show(output, "Result of SubstringByTextElements");
   }

   private static void EnumTextElements(String s) {
      String output = String.Empty;

      TextElementEnumerator charEnum = StringInfo.GetTextElementEnumerator(s);
      while (charEnum.MoveNext()) {
         output += String.Format(
            "Character at index {0} is '{1}'{2}",
            charEnum.ElementIndex, charEnum.GetTextElement(),
            Environment.NewLine);
      }
      MessageBox.Show(output, "Result of GetTextElementEnumerator");
   }

   private static void EnumTextElementIndexes(String s) {
      String output = String.Empty;

      Int32[] textElemIndex = StringInfo.ParseCombiningCharacters(s);
      for (Int32 i = 0; i < textElemIndex.Length; i++) {
         output += String.Format(
            "Character {0} starts at index {1}{2}",
            i, textElemIndex[i], Environment.NewLine);
      }
      MessageBox.Show(output, "Result of ParseCombiningCharacters");
   }
}

 

Building and running this code produces the message boxes shown in Figures 14-2, 14-3, and 14-4.

 
 

FIGURE 14-2 Result of SubstringByTextElements.

FIGURE 14-3 Result of GetTextElementEnumerator.

FIGURE 14-4 Result of ParseCombiningCharacters.

 

 

Other String Operations

The String type also offers methods that allow you to copy a string or parts of it. Table 14-1 summarizes these methods.

 

TABLE 14-1 Methods for Copying Strings

| Member | Method Type | Description |
| Clone | Instance | Returns a reference to the same object (this). This is OK because String objects are immutable. This method implements String’s ICloneable interface. |
| Copy | Static | Returns a new duplicate string of the specified string. This method is rarely used and exists to help applications that treat strings as tokens. Normally, strings with the same set of characters are interned to a single string. This method creates a new string object so that the references (pointers) are different even though the strings contain the same characters. |
| CopyTo | Instance | Copies a portion of the string’s characters to an array of characters. |
| Substring | Instance | Returns a new string that represents a portion of the original string. |
| ToString | Instance | Returns a reference to the same object (this). |

 

In addition to these methods, String offers many static and instance methods that manipulate a string, such as Insert, Remove, PadLeft, Replace, Split, Join, ToLower, ToUpper, Trim, Concat, Format, and so on. Again, the important thing to remember about all of these methods is that they return new string objects; because strings are immutable, after they’re created, they can’t be modified (using safe code).
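A few of these members can be exercised in a short sketch. It shows Clone and ToString returning the same reference, CopyTo filling a Char array, and Substring and Insert returning new strings while the original stays as it was. The class CopyMethodsSketch and the FirstChars helper are hypothetical.

```csharp
using System;

// Hypothetical helper class; the names are assumptions made for this sketch.
public static class CopyMethodsSketch {
   // Copies the first count characters of s into a new Char array.
   public static Char[] FirstChars(String s, Int32 count) {
      Char[] buffer = new Char[count];
      s.CopyTo(0, buffer, 0, count);
      return buffer;
   }

   public static void Main() {
      String s = "Hi there.";

      // Clone and ToString return the same object, not a duplicate.
      Console.WriteLine(Object.ReferenceEquals(s, s.Clone()));     // True
      Console.WriteLine(Object.ReferenceEquals(s, s.ToString()));  // True

      Console.WriteLine(new String(FirstChars(s, 2)));  // Hi
      Console.WriteLine(s.Substring(3, 5));             // there

      // Insert returns a new string; s itself is unchanged.
      Console.WriteLine(s.Insert(2, ","));              // Hi, there.
      Console.WriteLine(s);                             // Hi there.
   }
}
```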



Date: 2016-03-03