This is one of my initial blogs on CLR Overview and Basics, which I believe every .NET developer must know. I believe this topic is one of the prerequisite for starting anything related to .NET, it may be Console Application, Web page or an Application on Windows Phone. To start with I will tried to give you a broad overview of Common Language Runtime(CLR).
The Common Language Runtime (CLR)
is a runtime and provides an environment for a programming language that targets it. CLR has no idea which programming language the developer used for the source code. A developer can write code in any .NET language that target the CLR, it may be C# or VB or F# or C++/CLI etc. Compiler acts as syntax verifiers and does code analysis, this allows developers to code in their desired .NET languages and makes it easier to express one’s idea and develop software easily.
Regardless of which compiler is used the result is a managed module. A managed module is a standard 32 bit Windows PE32 file or a standard 64 bit Windows (PE32+) file that require CLR to execute. Managed Assemblies always take advantage of Data Execution Prevention (DEP) and Address Space Layout Randomization(ASLR) in Windows, These two are security features of .NET Framework.
Table 1-1 Parts of Managed Module
All CLR compilers generate IL Code, every compiler emits full metadata into every managed module. Metadata is superset of COM TypeLib and Intermediate Definition Language (IDL). CLR metadata is far more complete and associated with the file containing the IL code. The metadata and IL code are embedded in the same EXE/Dll as the code making it impossible to separate the two. Because metadata and managed code are built at the same time and binds them together into resulting managed module. They are never out of sync with one another.
Metadata has many applications or benefits v.i.z:,
- Metadata removes the need for native header/library files during compilation, since all the information is available in the Assembly (PE32+) file. It also has the IL code that which implements the type and members. Compiler can comprehend the metadata directly from the managed module.
- Visual Studio uses metadata to assist the developer in writing the code, Intellisense of Visual Studio parses the metadata table to inform coder what is the property, method, events and fields or a type offers and in the case of methods, what parameters the method expects.
- CLR code verification process uses metadata to ensure that you code performs only type-safe operations.
- Metadata allows serialization of object on local machine and deserialization of the same object state on a remote machine.
- Metadata allows the garbage collector to track the lifetime of objects.
C# and the IL Assembler always produce modules that contain managed code and data. So end users must have CLR installed on their devices to execute these managed code.
C++/CLI compiler is an exception to this it builds EXE/DLL modules that contain unmanaged code and manipulate unmanaged data at runtime, by adding the /CLR switch to the compiler options the C++ compiler can produce modules that contain hybrid of managed and unmanaged code, for these modules CLR is a must for execution. C++ compiler allows developer to write both managed and unmanaged code but still emit a single module.
Merging managed Modules to an Assembly:
Fig 1.2 Integrating managed modules into single assembly
CLR works with assemblies which is logical grouping of one or more modules or resource objects. An assembly is the smallest unit of versioning, reuse and security. You can produce a single file or a multi-file assembly. An assembly is similar to what we would say Component in COM World.
Single PE32(+) is a logical grouping of files which has manifest embedded is set to metadata tables. These tables describe the files that make up the assembly with public types implementation and the resource or data files that are associated with the assembly.
If you want group of files into an assembly you will have to be aware of more tools and their command-line arguments. An assembly allows you to decompose the deployment of the files while still treating all of the files as a single collection. An assembly modules have information about referenced assemblies which makes them “self describing”. It means assembly’s immediate dependencies can be identified and verified by CLR.
How Common Language Runtime Loads:
An Assembly execution is managed by CLR, so CLR needs to be loaded first into the process. You can determine if the .NET Framework is installed on a particular machine by looking for MSCorEE.dll in the %SystemRoot%System32 directory. The existence of this file confirms that .NET framework is installed. The different versions of NET can be installed on a machine and this can be identified by looking at the following Register Key
HKEY_LOCAL_MACHINESOFTWAREMicrosoftNET Framework SetupNDP .
The .NET Framework SDK includes a command-line tool CLRViewer to view the version of the installed Runtime. If assemblies contain only type safe managed code then it should work on both 32-bit and 64-bit versions of Windows without making any source code changes. The executable will run on any machine with a version of .NET Framework installed on it. If .NET developer want to develop an assembly that works on a specific version of Windows then developer needs to use C# compiler “/platform” command-line switch. This switch allows to set whether the assembly can be executed on x86 machines with 32-bit Windows version or on X64 machines with 64-bit Windows version or on Intel Itanium machines with 64-bit Windows version. But the default value is “anycpu” which makes assembly to execute run on any version of Windows.
Depending on the /platform command line option, the compiler will generate an assembly that contains either a PE32 or PE32+ header, and the compiler will also insert the desired CPU architecture information into the header. MS ships two tools with the SDK i.e. DumpBin.exe and CorFlags,exe which can be used to examine the header information contained in a managed module.
When executing the assembly, windows determines using the file header whether to execute the application in 32-bit or 64-bit address space. An executable file with a PE32 header can run in a 32-bit or 54-bit address space, and a executable with PE32+ header requires 64-bit address space Windows also verifies the CPU architecture to confirm that the machine has the required CPU. Lastly 64-bit Windows version has a feature called WOW64 – Windows on Windows64 that allows 32-bit applications to run on it.
Table 1-2 Runtime State of Modules based on /platform switch
|Type of Managed Module||x86Windows||x64Windows||IA64 Windows|
|any-cpu||PE32/agnostic||Runs as a 32-bit application||Run as a 64-bit application||Runs as a 64-bit application|
|x86||PE32/x86||Runs as a 32-bit application||Runs as a WOW64 application||Runs as a WOW64 application|
|x64||PE32+/x64||Doesn’t run||Run as a 64-bit application||Doesn’t run|
|Itanium||PE32+/Itanium||Doesn’t run||Doesn’t run||Runs as a 64-bit application|
After Windows has examined the assembly header to determine whether to create a 32-bit process, a64-bit process, or a WOW64 process, Windows loads the x86, x64 or IA64 version of MSCorEE.dll into the process’s address space. Then process’s primary thread calls a method defined inside MSCorEE.dll. This method initializes the CLR, loads the EXE assembly and then calls its entry point method (Main). When a unmanaged application loads a managed assembly, Windows loads and initialize the CLR in order to process the code contained within the assembly.
IL is a much higher language when compared to most CPU m/c languages. It can access and manipulate object types and has instructions to create and initialize objects, call virtual methods on objects and manipulate array elements directly. LI can be written in assembly language using IL Assembler, ILAsm.exe. Microsoft also provides an IL Disassembler, ILDasm.exe
The IL assembly language allows a developer to access all of the CLR’s facilities which is hidden by other programming language which you would really wanted to use. In this scenario you can use multiple languages which CLR supports to utilize the otherwise the hidden CLR facilities, in-fact level of integration between .NET programming languages inside CLR makes mixed-language programming a biggest advantage for the developer.
To execute a method its IL code is initially converted to native CPU instructions. This is the job of the CLR’s JIT compiler.
Fig shows what happens when the first time a method is called
Just before the main method executes, the CLR detects all of the types that are reference by Main code. This causes the CLR to allocate an internal data structure that is used to manage access to the referenced types. This internal data structure contains an entry for each method defined by the Console type. Each entry holds the address where the method’s implementation can be found. When initializing this structure the CLR sets each entry to an internal, undocumented function contained inside the CLR itself I call this function JITCompiler
When Main makes its first call to WriteLine, the JITCompiler function is called. The JIT Compiler function is responsible for compiling a method’s IL code into native CPU instructions. Because the IL is being compiled “just in time” this component of the CLR is referred to as a JITter or a JIT Compiler.
The JIT Compiler function then searches the defining assembly’s metadata for the called method’s IL. JITCompiler next verifies and compiles the IL code into native CPU instructions. The native CPU instructions are saved in a dynamically allocated block of memory. Then, JITCompiler goes back to the entry for the called method in the type’s internal data structure created by the CLR and replaces the reference that called it in the first place with the address of the block of memory containing the native CPU instructions it just compiled. Finally, the JITCompiler function jumps to the code in the memory block. When this code returns, it returns to the code in Main which continues execution as normal.
Main now calls WriteLine a second time. This time, the code for WriteLine has already been verified and compiled. so the call goes directly to the block of memory, skipping the JITCompiler function entirely. After the WriteLine method executes, it returns to main.
A performance hit is incurred only the first time a method is called. All subsequent calls to method execute at the full speed of the native code because verification and compilation to native code don’t need to be performed again.
The native CPU instructions in dynamic memory the compiled code is discarded when the application terminates. So if you run the application again the JIT compiler will have to compile the IL to native instructions again. It’s also likely that more time is spent inside the method then calling the method. The CLR’s JIT compiler optimizes the native code, it may take more time to produce the optimized code but the code will execute in less time with better performance compared to non-optimized code.
The two C# compiler switches that impact code optimization /optimize and /debug. The following table shows the impact of code performance based the two switches.
- Compiler Switch Settings C# IL Code Quality JIT Native Code Quality
- /optimize- /debug- Unoptimized Optimized
- /optimize- /debug(+/full/pdbonly) Unoptimized Unoptimized
- /optimize+ /debug(-/+/full/pdbonly) Optimized Optimized
The unoptimized IL code contains many no-operation instructions and also branches that jump to the next line of code, these unoptimized code instructions are generated to enable edit-and-continue feature of Visual Studio while debugging and enable applying break points to the code.
When producing optimized IL code the C# compiler will remove these extraneous NOP and branch instructions, making the code harder to single-step through in a debugger as control flow will be optimized. Furthermore, the compiler produces a Program Database (PDB) file only if specify the /debug(+/full/pdbonly) switch. The PDB file helps the debugger find local variables and map the IL instructions to source code. The /debug:full switch tells the JIT compiler will track what native code came from each IL instruction. This allows developer to use JIT Debugger of Visual studio to connect a debugger to an already running process and debug the code easily. Without the /debug:full switch, the JIT compiler does not track the IL to native code information which makes the JIT compiler run a little faster and also uses a little less memory. If you start a process with the Visual Studio debugger, it forces the JIT Compiler to track the IL to native code information unless you off the suppress JIT Optimization On Module Load (Managed Only) option in Visual Studio. In this managed environment, compiling the code is accomplished in two phases. Initially the compiler parses over the source code, doing as much work as possible in producing IL. But IL itself must be compiled into native CPU instructions at runtime,requiring more memory and more CPU time to be allocated to complete the task.
The following are difference or comparison of managed code to unmanaged code:
- A JIT compiler can determine if the application is running on an Intel Pentium 4 CPU and produce native code that takes advantage of any special instructions offered by the Pentium 4. Usually, unmanaged applications are compiled for the lowest-common-denominator CPU and avoid using special instructions that would give the application a performance boost.
- A JIT compiler can determine when a certain test is always false on the machine that it is running on. In those cases, the native code would be fine-tuned for the host machine; the resulting code is smaller and executes faster.
- The CLR could profile the code’s execution and recompile the IL into native code while the application runs. The recompiled code could be reorganized to reduce incorrect branch predictions depending on the observed execution patterns.
NGen.exe tool compiles all of an assembly’s IL code into native code and saves the resulting native code to a file on disk. At runtime, when an assembly is loaded, the CLR automatically checks to see whether a precompiled code so that no compilation is required at runtime. the code produced by NGen.exe will not be as highly optimized as the JIT compiler-produced code.
IL and Verification:
While compiling IL into native CPU instructions, the CLR performs a process called verification. Verification examines the high-level IL code and ensures that everything the code does is safe. For e.g. verification checks that every method is called with the correct number of parameters. The managed module’s metadata includes all of the method and type information used by the verification process.
In Windows, each process has its own virtual address space. Separate address spaces are necessary because you can’t trust an application’s code. It is entirely possible that an application will read from or write to an invalid memory address. By placing each windows process in a separate address space, you gain robustness and stability;
You can run multiple managed applications in a single Windows virtual address space. Reducing the number of processes by running multiple applications in a single OS process can improve performance, require fewer resources and be just as robust as if each application had its own process.
The CLR does offer the ability to execute multiple managed applications in a single OS process. Each managed application executes in an AppDomain. Every managed EXE file will run in its own separate address space that has just the one AppDomain. A process hosting the CLR can decide to run AppDomain in a single OS process.
Safe code is code that is verifiably safe. Unsafe code is allowed to work directly with memory addresses and manipulate bytes at these addresses. This is a very powerful feature and is typically useful when interoperating with unmanaged code or when you want to improve the performance of a time-critical algorithm.
The C# compiler requires that all methods that contain unsafe code be marked with the unsafe keyword. In addition, the C# compiler requires you to compile the source code by using the /unsafe compiler switch.
JIT compiler attempts to compile an unsafe method, it checks to see if the assembly containing the method has been granted the System.Security.Permissions.SecurityPermission with System.Security.Permissions.SecurityPermissionFlag’s SkipVerification flag set. The JIT compiler will compile the unsafe code and allow it to execute. The CLR is trusting this code and is hoping the direct address and byte manipulations do not cause any harm. If the flag is not set, the JIT compiler throws either a System.InvalidProgramException or a System.Security.VerificationException preventing the method from executing. In fact, the whole application will probably terminate at this point, but at least no harm can be done.
PEVerify.exe tool examines all of an assembly’s methods and notifies you of any methods that contain unsafe code. So when you use PEVerify to check an assembly, it must be able to locate and load all referenced assemblies. Because PEVerify uses the CLR to locate the dependent assemblies, the assemblies are located using the same binding and probing rules that would normally be used when executing the assembly.
The NGen Tool
The NGen.exe tool is inserting machine code during the build process, so it is interesting in two scenarios
- Improving an application startup time: The just-in time compilation is avoided because the code will already be compiled into native code and hence improve the startup time.
- Reducing an application working set: The reason is because the NGen.exe tool compiles the IL to native code and saves the output in a separate file. This file can be memory mapped into multiple-process address spaces simultaneously, allowing the code to be shared;
When a setup program invokes nGen.exe. A new assembly file containing only this native code instead of IL code is created by NGen.exe. This new file is placed in a folder under the directory with a name like C:WindowsAssemblyNativeImages_v4.0.#####_64. The directory name includes the version of the CLR and information denoting whether the native code is compiled for x86, x64 or Itanium.
Whenever the CLR loads an assembly file, the CLR looks to see if a corresponding NGen’d native file exists. There are drawbacks to NGen’d files
- No intellectual property protection: At runtime, the CLR requires that the assemblies that contain IL and metadata be shipped. if the CLR can’t use the NGen’d file for some reason the CLR gracefully goes back to JIT compiling the assembly’s IL code which must be available.
- NGen’d files can get out of sync: When the CLR loads NGen’d file. It compares a number of characteristics about the previously compiled code and the current execution environment. Here is a partial list of characteristics that must match.
- – CLR version: this changes with patches or service packs.
- – CPU type: this changes if you upgrade your processor hardware
- – Windows OS version: this changes with a new service pack update
- – Assembly’s identity module version ID (MVID): this changes when recompiling.
- – Referenced assembly’s version IDs: this changes when you recompile a referenced assembly
- – Security : this changes when you revoke permission such as SkipVerification or UnmanagedCode that were once granted.
- Whenever an end user installs a new service pack of the .NET framework the service pack’s installation program will run NGen.exe in update mode automatically so that NGen’d files are kept in sync with the version of the CLR installed.
- Inferior execution-time performance: NGen can’t make as many assumptions about the execution environment as the JIT compiler can. This causes NGen.exe to produce inferior code. Some NGen’d applications actually perform about 5% slower when compared to their JIT-compiled counterpart. So, if you’re considering using NGen.exe you should compare NGen’d and non-NGen’d versions to be sure that the NGen’d version doesn’t actually run slower. the reduction in working set size improves performance so using NGen can be net win.
- NGen.exe makes little or no sense because only the first client request experiences a performance hit; future client requests run at higher speed. In addition for most server applications only one instance of the code is required, so there is no working set benefit . NGen’d images cannot be shared across AppDomains so there is no benefit to NGen’ing an assembly that will be used in a cross-AppDomain scenario.
The Framework Class Library
- The Framework Class library (FCL) – is a set of DLL assemblies that contain several thousand type definition in which each type exposes some functionality
- Following are the different types of application that can be created/developed using FCL:
- Web Services
- Web Forms HTML-based applications (Web sites)
- Rich Windows GUI applications
- Rich internet Applications (RIAs)
- Windows console applications
- Windows services
- Database stored procedures
- Component Library
Below are the General Framework Class Library namespaces
Namespace Description of Contents
- System All of the basic types used by every application
- System.Data Types for communicating with database & processing data.
- System.IO Types for doing stream I/O and walking directories and files
- System.Net Types that allows for low-level network communications.
- System.Runtime.InteropServices Types that allow managed code to access unmanaged OS platform facilities such as DCOM and Win32 functions.
- System.Security Types used for protecting data and resources
- System.Text Types to work on text in different encodings.
- System.Threading Types used for asynchronous operations & synchronizing access to resources.
- System.Xml Types used for processing Extensible Markup Language schemas & data.
The Common Type System
The types are at the root of the CLR so Microsoft created a format specification – The Common Type System (CTS) that describes how types are defined and how they behave. The CTS specification states that a type can contain zero or more members
- Field: A data variable that is part of the object’s state. Fields are identified by their name and type
- Method A function that performs an operation on the object, often changing the object’s state. Methods have a name a signature and modifiers
- Property: Properties allow an implementer to validate input parameters and object state before accessing the value and/or calculating a value only when necessary. They also allow a user of the type to have simplified syntax. Finally properties allow you to create read-only or write only fields.
- Event: An event allows a notification mechanism between an object and other interested objects
The CTS also specifies the rules for type visibility and access to the members of a type. thus the CTS establishes the rules by which assemblies form a boundary of visibility for a type and the CLR enforces the visibility rules
A type that is visible to a caller can further restrict the ability of the caller to access the type’s members. The following list shows the valid options for controlling access to a member:
Private : The member is accessible only by other members in the same class type
Family : The member is accessible by derived types regardless of whether they are within the same assembly.
Family and assembly The member is accessible by derived types but only if the derived type is defined in the same assembly.
Assembly: The member is accessible by any code in the same assembly Many languages refer to assembly as internal.
Family or assembly: The member is accessible by derived types in any assembly. C# refers to family or assembly as protected internal.
Public : The member is accessible by any code in any assembly.
The CTS defines the rules governing type inheritance, virtual methods, object lifetime and so on. And it will map the language specific syntax into IL, the “language” of the CLR, when it emits the assembly during compilation. The CTS allows a type to derive from only one base class. To help the developer Microsoft’s C++/CLI compiler reports an error if it detects that you are attempting to create managed code that includes a type deriving from multiple base types.
All types must inherit from a predefined type: System Object. This object is the root of all other types and therefore guarantees that every type instance has a minimum set of behaviours. Specifically the System.Object type allows you do the following:
– compare two instances for equality
– Obtain a hash code for the instance
– Query the true type of an instance
– Perform a shallow copy of the instance
– Obtain a string representation of the instance object’s current state.
The Common Language Specification:
Microsoft has defined a Common Language Specification (CLS) that details for compiler vendors the minimum set of features their compiler must support if these compilers are to generate types compatible with other components written by other CLS-compliant languages on top of the CLR.
The CLS defines rules that externally visible types and methods must adhere to if they are to be accessible from any CLS-compliant programming language. Note that the CLS rules don’t apply to code that is accessible only within the defining assembly. Most other languages, such as C#, Visual Basic and Fortran expose a subset of the CLR/CTS features to the programmer. THE CLS defines the minimum set of features that all languages must support. you shouldn’t take advantage of any features that are outside of the CLS in its public and protected members. Doing so would mean that your type’s members might not be accessible by programmers writing code in other programming languages.
The [assembly:CLSCompliant(true)] attribute is applied to the assembly. This attribute tells the compiler to ensure that any publicly exposed type doesn’t have any construct that would prevent the type from being accessed from any other programming language. The reason is that the SomeLibraryTypeXX type would default to internal and would therefore no logner be exposed outside of the assembly
The table below show s how the programming language constructs got mapped to the equivalent CLR fields and methods
|Type member||Member Type||Equivalent Programming Language Construct|
|AnEvent||Field||Event the name of the field is AnEvent and its type is System.EventHandler|
|add_AnEvent||Method||Event add accessor method|
|get_Aproperty||Method||Property get accessor method|
|get_Item||Method||Indexer get accessor method|
|remove_Anevent||Method||Event_remove accessor method|
|set_Aproperty||Method||Property set accessor method|
|set_Item||Method||Indexer set accessor method.|
Interoperability with Unmanaged Code: CLR supports 3 interoperability scenarios
- – Managed code can call an unmanaged function in a DLL
- – Managed code can use an existing COM component (server)
- – Unmanaged code can use a managed type (server).