Now we know what kind of PE file we've created. But what exactly is in the Program.exe file? A managed PE file has four main parts: the PE32(+) header, the CLR header, the metadata, and the IL. The PE32(+) header is the standard information that Windows expects. The CLR header is a small block of information that is specific to modules that require the CLR (managed modules). The header includes the major and minor version number of the CLR that the module was built for, some flags, a MethodDef token (described later) indicating the module's entry point method if this module is a CUI, GUI, or Windows Store executable, and an optional strong-name digital signature (discussed in Chapter 3). Finally, the header contains the size and offsets of certain metadata tables contained within the module. You can see the exact format of the CLR header by examining the IMAGE_COR20_HEADER defined in the CorHdr.h header file.
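To make the header layout a little more concrete, here is a minimal Python sketch that unpacks the fields of IMAGE_COR20_HEADER mentioned above. It assumes you have already located the 72 header bytes (finding them through the PE32(+) header's data directories is not shown), and the names ClrHeader, parse_clr_header, and split_token are made up for this illustration; they are not part of any real API.

```python
import struct
from typing import NamedTuple, Tuple

# IMAGE_COR20_HEADER: one DWORD size, two WORD version fields, then
# DWORD fields and (RVA, size) data-directory pairs. 72 bytes in total.
_COR20 = struct.Struct("<IHH16I")


class ClrHeader(NamedTuple):  # hypothetical name for this sketch
    size: int
    major_runtime_version: int
    minor_runtime_version: int
    metadata_rva: int
    metadata_size: int
    flags: int
    entry_point_token: int
    strong_name_rva: int
    strong_name_size: int


def parse_clr_header(raw: bytes) -> ClrHeader:
    """Unpack the header fields this chapter talks about."""
    f = _COR20.unpack(raw[:_COR20.size])
    # f[7] and f[8] are the Resources directory, skipped here;
    # f[9] and f[10] locate the optional strong-name signature.
    return ClrHeader(f[0], f[1], f[2], f[3], f[4], f[5], f[6], f[9], f[10])


def split_token(token: int) -> Tuple[int, int]:
    """Split a metadata token into (table number, row number)."""
    return token >> 24, token & 0x00FFFFFF
```

A token whose high byte is 0x06 refers to a row in the MethodDef table, so `split_token(0x06000001)` yields table 6, row 1. That is the shape the entry point token takes in an executable module.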
The metadata is a block of binary data that consists of several tables. There are three categories of tables: definition tables, reference tables, and manifest tables. Table 2-1 describes some of the more common definition tables that exist in a module's metadata block.
TABLE 2-1 Common Definition Metadata Tables
Metadata Definition Table Name
Description
ModuleDef
Always contains one entry that identifies the module. The entry includes the module's file name and extension (without path) and a module version ID (in the form of a GUID created by the compiler). This allows the file to be renamed while keeping a record of its original name. However, renaming a file is strongly discouraged and can prevent the CLR from locating an assembly at run time, so don't do this.
TypeDef
Contains one entry for each type defined in the module. Each entry includes the type's name, base type, and flags (public, private, etc.) and contains indexes to the methods it owns in the MethodDef table, the fields it owns in the FieldDef table, the properties it owns in the PropertyDef table, and the events it owns in the EventDef table.
MethodDef
Contains one entry for each method defined in the module. Each entry includes the method's name, flags (private, public, virtual, abstract, static, final, etc.), signature, and offset within the module where its IL code can be found. Each entry can also refer to a ParamDef table entry in which more information about the method's parameters can be found.
FieldDef
Contains one entry for every field defined in the module. Each entry includes flags (private, public, etc.), type, and name.
ParamDef
Contains one entry for each parameter defined in the module. Each entry includes flags (in, out, retval, etc.), type, and name.
PropertyDef
Contains one entry for each property defined in the module. Each entry includes flags, type, and name.
EventDef
Contains one entry for each event defined in the module. Each entry includes flags and name.
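One way to picture how a TypeDef entry "contains indexes to the methods it owns" is the following Python sketch. It assumes, as the ECMA-335 specification describes, that each TypeDef row stores only the index of its first MethodDef row and that the run of owned methods ends where the next type's run begins. The row classes are simplified stand-ins, the indexes are 0-based for convenience, and the Helper type with its Run method is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MethodDefRow:
    name: str
    flags: str


@dataclass
class TypeDefRow:
    name: str
    base_type: str
    flags: str
    first_method: int  # index of this type's first row in the MethodDef table


def methods_of(type_index: int,
               type_defs: List[TypeDefRow],
               method_defs: List[MethodDefRow]) -> List[MethodDefRow]:
    """Return the MethodDef rows owned by the type at type_index."""
    start = type_defs[type_index].first_method
    if type_index + 1 < len(type_defs):
        # The run ends where the next type's methods begin.
        end = type_defs[type_index + 1].first_method
    else:
        # The last type owns every remaining method row.
        end = len(method_defs)
    return method_defs[start:end]


method_defs = [
    MethodDefRow("Main", "public static"),
    MethodDefRow(".ctor", "public"),
    MethodDefRow("Run", "public"),  # belongs to the hypothetical Helper type
]
type_defs = [
    TypeDefRow("Program", "System.Object", "public sealed", first_method=0),
    TypeDefRow("Helper", "System.Object", "public", first_method=2),
]

for i, t in enumerate(type_defs):
    print(t.name, [m.name for m in methods_of(i, type_defs, method_defs)])
```

Storing a single starting index per type keeps each TypeDef row small, which is one reason the rows of the member tables are grouped by owning type.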
As the compiler compiles your source code, everything your code defines causes an entry to be created in one of the tables described in Table 2-1. Metadata table entries are also created as the compiler detects the types, fields, methods, properties, and events that the source code references. The metadata created includes a set of reference tables that keep a record of the referenced items. Table 2-2 shows some of the more common reference metadata tables.
TABLE 2-2 Common Reference Metadata Tables
Metadata Reference Table Name
Description
AssemblyRef
Contains one entry for each assembly referenced by the module. Each entry includes the information necessary to bind to the assembly: the assembly's name (without path and extension), version number, culture, and public key token (normally a small hash value generated from the publisher's public key, identifying the referenced assembly's publisher). Each entry also contains some flags and a hash value. This hash value was intended to be a checksum of the referenced assembly's bits. The CLR completely ignores this hash value and will probably continue to do so in the future.
ModuleRef
Contains one entry for each PE module that implements types referenced by this module. Each entry includes the module's file name and extension (without path). This table is used to bind to types that are implemented in a different module of the same assembly as the calling module.
TypeRef
Contains one entry for each type referenced by the module. Each entry includes the type's name and a reference to where the type can be found. If the type is implemented within another type, the reference will indicate a TypeRef entry. If the type is implemented in the same module, the reference will indicate a ModuleDef entry. If the type is implemented in another module within the calling assembly, the reference will indicate a ModuleRef entry. If the type is implemented in a different assembly, the reference will indicate an AssemblyRef entry.
MemberRef
Contains one entry for each member (fields and methods, as well as property and event methods) referenced by the module. Each entry includes the member's name and signature and points to the TypeRef entry for the type that defines the member.
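The four cases listed in the TypeRef description amount to a small decision procedure. The Python sketch below restates them in code; ScopeKind and resolution_scope are illustrative names only, and the three boolean inputs stand in for facts that a compiler would work out on its own.

```python
from enum import Enum


class ScopeKind(Enum):
    TYPE_REF = "TypeRef"
    MODULE_DEF = "ModuleDef"
    MODULE_REF = "ModuleRef"
    ASSEMBLY_REF = "AssemblyRef"


def resolution_scope(nested: bool, same_module: bool, same_assembly: bool) -> ScopeKind:
    """Pick the kind of table entry a TypeRef points to."""
    if nested:
        # Implemented within another type: point at the enclosing TypeRef.
        return ScopeKind.TYPE_REF
    if same_module:
        return ScopeKind.MODULE_DEF
    if same_assembly:
        # Same assembly, but a different module file.
        return ScopeKind.MODULE_REF
    return ScopeKind.ASSEMBLY_REF


# A type that lives in a different assembly from the referencing module.
print(resolution_scope(nested=False, same_module=False, same_assembly=False).value)
```

The order of the checks matters in this sketch: a nested type is always reached through its enclosing type, wherever that enclosing type lives.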
There are many more tables than what I listed in Tables 2-1 and 2-2, but I just wanted to give you a sense of the kind of information that the compiler emits to produce the metadata information. Earlier I mentioned that there is also a set of manifest metadata tables; I'll discuss these a little later in the chapter.
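As one example of the kind of information involved, consider the public key token stored in an AssemblyRef entry. The sketch below assumes the commonly documented algorithm: take the SHA-1 hash of the publisher's full public key and keep its last 8 bytes in reverse order. The function name and the sample key bytes are invented for illustration; a real public key is much longer.

```python
import hashlib


def public_key_token(public_key: bytes) -> str:
    """Derive an 8-byte token (as 16 hex digits) from a full public key."""
    digest = hashlib.sha1(public_key).digest()
    # Keep the last 8 bytes of the 20-byte hash, in reverse order.
    return digest[-8:][::-1].hex()


# Made-up key bytes, used only to show the shape of the result.
sample_key = bytes(range(32))
print(public_key_token(sample_key))
```

Because the token is only 8 bytes long, it is far cheaper to store in every AssemblyRef entry than the full key would be.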
Various tools allow you to examine the metadata within a managed PE file. One that I still use frequently is ILDasm.exe, the IL Disassembler. To see the metadata tables, execute the following command line.
ILDasm Program.exe
This causes ILDasm.exe to run, loading the Program.exe assembly. To see the metadata in a nice, human-readable form, select the View/MetaInfo/Show! menu item (or press Ctrl+M). This causes the following information to appear.
Fortunately, ILDasm processes the metadata tables and combines information where appropriate so that you don't have to parse the raw table information. For example, in the preceding dump, you see that when ILDasm shows a TypeDef entry, the corresponding member definition information is shown with it before the first TypeRef entry is displayed.
You don't need to fully understand everything you see here. The important thing to remember is that Program.exe contains a TypeDef whose name is Program. This type identifies a public sealed class that is derived from System.Object (a type referenced from another assembly). The Program type also defines two methods: Main and .ctor (a constructor). Main is a public, static method whose code is IL (as opposed to native CPU code, such as x86). Main has a void return type and takes no arguments. The constructor method (always shown with a name of .ctor) is public, and its code is also IL. The constructor has a void return type, has no arguments, and has a this pointer, which refers to the object's memory that is to be constructed when the method is called.
I strongly encourage you to experiment with using ILDasm. It can show you a wealth of information, and the more you understand what you're seeing, the better you'll understand the CLR and its capabilities. As you'll see, I'll use ILDasm quite a bit more in this book.
Just for fun, let's look at some statistics about the Program.exe assembly. When you select ILDasm's View/Statistics menu item, the following information is displayed.
File size            : 3584
PE header size       : 512 (496 used)    (14.29%)
PE additional info
Num.of PE sections
CLR header size
Here you can see the size (in bytes) of the file and the size (in bytes and percentages) of the various parts that make up the file. For this very small Program.cs application, the PE header and the metadata occupy the bulk of the file's size. In fact, the IL code occupies just 20 bytes. Of course, as an application grows, it will reuse most of its types and references to other types and assemblies, causing the metadata and header information to shrink considerably as compared to the overall size of the file.
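The percentages in the statistics are simply each part's size divided by the file size. As a quick check of the numbers quoted above, here is a small Python sketch; the function name is illustrative, and the three sizes are the ones given in the text.

```python
def percent_of_file(part_size: int, file_size: int) -> float:
    """Return a part's share of the file, as a percentage rounded to 2 places."""
    return round(100.0 * part_size / file_size, 2)


FILE_SIZE = 3584       # bytes, from the statistics
PE_HEADER_SIZE = 512   # bytes, from the statistics
IL_CODE_SIZE = 20      # bytes, as stated in the text

print(percent_of_file(PE_HEADER_SIZE, FILE_SIZE))  # 14.29
print(percent_of_file(IL_CODE_SIZE, FILE_SIZE))    # 0.56
```

The 20 bytes of IL come to roughly half a percent of the file, which shows how heavily fixed overhead dominates a program this small.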