OK, so you’ve decided to use the .NET Framework as your development platform. Great! Your first step is to determine what type of application or component you intend to build. Let’s just assume that you’ve completed this minor detail; everything is designed, the specifications are written, and you’re ready to start development.
Now you must decide which programming language to use. This task is usually difficult because different languages offer different capabilities. For example, in unmanaged C/C++, you have pretty low-level control of the system. You can manage memory exactly the way you want to, create threads easily if you need to, and so on. Microsoft Visual Basic 6.0, on the other hand, allows you to build UI applications very rapidly and makes it easy for you to control COM objects and databases.
The common language runtime (CLR) is just what its name says it is: a runtime that is usable by different and varied programming languages. The core features of the CLR (such as memory management, assembly loading, security, exception handling, and thread synchronization) are available to any and all programming languages that target it, period. For example, the runtime uses exceptions to report errors, so all languages that target the runtime also get errors reported via exceptions. Another example is that the runtime also allows you to create a thread, so any language that targets the runtime can create a thread.
In fact, at runtime, the CLR has no idea which programming language the developer used for the source code. This means that you should choose whatever programming language allows you to express your intentions most easily. You can develop your code in any programming language you desire as long as the compiler you use to compile your code targets the CLR.
So, if what I say is true, what is the advantage of using one programming language over another? Well, I think of compilers as syntax checkers and “correct code” analyzers. They examine your source code, ensure that whatever you’ve written makes some sense, and then output code that describes your intention. Different programming languages allow you to develop using different syntax. Don’t underestimate the value of this choice. For mathematical or financial applications, expressing your intentions by using APL syntax can save many days of development time when compared to expressing the same intention by using Perl syntax, for example.
Microsoft has created several language compilers that target the runtime: C++/CLI, C# (pronounced “C sharp”), Visual Basic, F# (pronounced “F sharp”), IronPython, IronRuby, and an Intermediate Language (IL) Assembler. In addition to Microsoft, several other companies, colleges, and universities have created compilers that produce code to target the CLR. I’m aware of compilers for Ada, APL, Caml, COBOL, Eiffel, Forth, Fortran, Haskell, Lexico, LISP, LOGO, Lua, Mercury, ML, Mondrian, Oberon, Pascal, Perl, PHP, Prolog, RPG, Scheme, Smalltalk, and Tcl/Tk.
Figure 1-1 shows the process of compiling source code files. As the figure shows, you can create source code files written in any programming language that supports the CLR. Then you use the corresponding compiler to check the syntax and analyze the source code. Regardless of which compiler you use, the result is a managed module. A managed module is a standard 32-bit Windows portable executable (PE32) file or a standard 64-bit Windows portable executable (PE32+) file that requires the CLR to execute. By the way, managed assemblies always take advantage of Data Execution Prevention (DEP) and Address Space Layout Randomization (ASLR) in Windows; these two features improve the security of your whole system.
FIGURE 1-1 Compiling source code into managed modules.
Table 1-1 describes the parts of a managed module.
TABLE 1-1 Parts of a Managed Module

| Part | Description |
| --- | --- |
| PE32 or PE32+ header | The standard Windows PE file header, which is similar to the Common Object File Format (COFF) header. If the header uses the PE32 format, the file can run on a 32-bit or 64-bit version of Windows. If the header uses the PE32+ format, the file requires a 64-bit version of Windows to run. This header also indicates the type of file: GUI, CUI, or DLL, and contains a time stamp indicating when the file was built. For modules that contain only IL code, the bulk of the information in the PE32(+) header is ignored. For modules that contain native CPU code, this header contains information about the native CPU code. |
| CLR header | Contains the information (interpreted by the CLR and utilities) that makes this a managed module. The header includes the version of the CLR required, some flags, the MethodDef metadata token of the managed module’s entry point method (Main method), and the location/size of the module’s metadata, resources, strong name, some flags, and other less interesting stuff. |
| Metadata | Every managed module contains metadata tables. There are two main types of tables: tables that describe the types and members defined in your source code and tables that describe the types and members referenced by your source code. |
| IL code | Code the compiler produced as it compiled the source code. At run time, the CLR compiles the IL into native CPU instructions. |
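The header fields that Table 1-1 describes can be read straight from a file's bytes. The following is a minimal Python sketch, not a complete PE parser: the function name describe_pe and the shape of the dictionary it returns are my own inventions for illustration, while the offsets and constant values are taken from Microsoft's published PE/COFF format. It assumes that a nonzero entry in data directory slot 14 means a CLR header is present, and it does not parse the CLR header itself.

```python
import struct

PE32_MAGIC = 0x10B           # optional-header magic for PE32
PE32_PLUS_MAGIC = 0x20B      # optional-header magic for PE32+
IMAGE_FILE_DLL = 0x2000      # COFF Characteristics flag: file is a DLL
DYNAMIC_BASE = 0x0040        # DllCharacteristics flag: ASLR-compatible
NX_COMPAT = 0x0100           # DllCharacteristics flag: DEP-compatible
CLR_DIRECTORY_INDEX = 14     # data directory slot for the CLR header
SUBSYSTEMS = {2: "GUI", 3: "CUI"}


def describe_pe(data: bytes) -> dict:
    """Summarize the PE header fields discussed in Table 1-1 (illustrative)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The DOS header stores the file offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    # The 20-byte COFF file header follows the 4-byte signature.
    coff = pe_offset + 4
    (machine, _sections, timestamp, _symtab, _nsyms,
     _opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, coff)

    # The optional header starts with a magic number that selects the format.
    opt = coff + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == PE32_MAGIC:
        pe_format, directories = "PE32", opt + 96
    elif magic == PE32_PLUS_MAGIC:
        pe_format, directories = "PE32+", opt + 112
    else:
        raise ValueError(f"unexpected optional-header magic {magic:#x}")

    # Subsystem and DllCharacteristics sit at the same offsets in both formats.
    subsystem, dll_flags = struct.unpack_from("<HH", data, opt + 68)

    # NumberOfRvaAndSizes immediately precedes the data directory array.
    (directory_count,) = struct.unpack_from("<I", data, directories - 4)
    clr_rva = clr_size = 0
    if directory_count > CLR_DIRECTORY_INDEX:
        clr_rva, clr_size = struct.unpack_from(
            "<II", data, directories + 8 * CLR_DIRECTORY_INDEX)

    if characteristics & IMAGE_FILE_DLL:
        kind = "DLL"
    else:
        kind = SUBSYSTEMS.get(subsystem, "other")
    return {
        "format": pe_format,
        "machine": machine,
        "timestamp": timestamp,
        "kind": kind,
        "aslr": bool(dll_flags & DYNAMIC_BASE),
        "dep": bool(dll_flags & NX_COMPAT),
        "has_clr_header": clr_rva != 0 and clr_size != 0,
    }
```

Passing the bytes of any EXE or DLL to describe_pe, for example the result of reading the file in binary mode, returns a small dictionary. A managed module should report has_clr_header as true, and a purely native one should not.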
Native code compilers produce code targeted to a specific CPU architecture, such as x86, x64, or ARM. All CLR-compliant compilers produce IL code instead. (I’ll go into more detail about IL code later in this chapter.) IL code is sometimes referred to as managed code because the CLR manages its execution.
In addition to emitting IL, every compiler targeting the CLR is required to emit full metadata into every managed module. In brief, metadata is a set of data tables that describe what is defined in the module, such as types and their members. In addition, metadata also has tables indicating what the managed module references, such as imported types and their members. Metadata is a superset of older technologies such as COM’s Type Libraries and Interface Definition Language (IDL) files. The important thing to note is that CLR metadata is far more complete. And, unlike Type Libraries and IDL, metadata is always associated with the file that contains the IL code. In fact, the metadata is always embedded in the same EXE/DLL as the code, making it impossible to separate the two. Because the compiler produces the metadata and the code at the same time and binds them into the resulting managed module, the metadata and the IL code it describes are never out of sync with one another.
Metadata has many uses. Here are some of them:
■ Metadata removes the need for native C/C++ header and library files when compiling because all the information about the referenced types/members is contained in the file that has the IL that implements the type/members. Compilers can read metadata directly from managed modules.
■ Microsoft Visual Studio uses metadata to help you write code. Its IntelliSense feature parses metadata to tell you what methods, properties, events, and fields a type offers, and in the case of a method, what parameters the method expects.
■ The CLR’s code verification process uses metadata to ensure that your code performs only “type-safe” operations. (I’ll discuss verification shortly.)
■ Metadata allows an object’s fields to be serialized into a memory block, sent to another machine, and then deserialized, re-creating the object’s state on the remote machine.
■ Metadata allows the garbage collector to track the lifetime of objects. For any object, the garbage collector can determine the type of the object and, from the metadata, know which fields within that object refer to other objects.
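To make the last point concrete, here is a small Python sketch of how a collector could use per-type metadata to find live objects. It is a conceptual model only and not how the CLR's garbage collector is implemented: TypeInfo, HeapObject, and reachable are hypothetical names, and the "metadata" is reduced to a list of which fields hold references.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TypeInfo:
    """Hypothetical stand-in for a type's metadata."""
    name: str
    reference_fields: tuple  # names of fields that refer to other objects


@dataclass
class HeapObject:
    """Hypothetical heap object: a type descriptor plus field values."""
    type_info: TypeInfo
    fields: dict = field(default_factory=dict)


def reachable(roots):
    """Return every object reachable from the roots, following only the
    fields that each object's type metadata marks as references."""
    seen_ids = set()
    live = []
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if id(obj) in seen_ids:
            continue  # already visited; this also handles cycles
        seen_ids.add(id(obj))
        live.append(obj)
        for field_name in obj.type_info.reference_fields:
            target = obj.fields.get(field_name)
            if target is not None:
                pending.append(target)
    return live
```

The collector never inspects a field such as an integer total, because the type's metadata does not list it as a reference. Anything that is not in the returned list is unreachable from the roots and could be reclaimed.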
In Chapter 2, “Building, Packaging, Deploying, and Administering Applications and Types,” I’ll describe metadata in much more detail.
Microsoft’s C#, Visual Basic, F#, and the IL Assembler always produce modules that contain managed code (IL) and managed data (garbage-collected data types). End users must have the CLR (presently shipping as part of the .NET Framework) installed on their machine in order to execute any modules that contain managed code and/or managed data in the same way that they must have the Microsoft Foundation Class (MFC) library or Visual Basic DLLs installed to run MFC or Visual Basic 6.0 applications.
By default, Microsoft’s C++ compiler builds EXE/DLL modules that contain unmanaged (native) code and manipulate unmanaged data (native memory) at run time. These modules don’t require the CLR to execute. However, by specifying the /clr command-line switch, the C++ compiler produces modules that contain managed code, and of course, the CLR must then be installed to execute this code. Of all of the Microsoft compilers mentioned, C++ is unique in that it is the only compiler that allows the developer to write both managed and unmanaged code and have it emitted into a single module. It is also the only Microsoft compiler that allows developers to define both managed and unmanaged data types in their source code. The flexibility provided by Microsoft’s C++ compiler is unparalleled by other compilers because it allows developers to use their existing native C/C++ code from managed code and to start integrating the use of managed types as they see fit.