Sunday, September 30, 2007

Reverse Engineering To Learn .NET Better

Introduction

The Microsoft .NET Framework is a new, exciting programming platform for Windows application developers (and potentially for developers on other Operating Systems, as we shall see). Right from the outset .NET has offered a large volume of functionality, both in terms of the underlying CLR and also the accompanying class library that is used by all .NET applications.

You can become familiar with .NET by working through the mass of documentation in the .NET Framework SDK. There is also plenty of third-party documentation covering various aspects of the .NET Framework, both in books and online.

However it is sometimes said that a programmer can become most familiar with a system if they know exactly how it works. This can perhaps be best achieved if you have access to the underlying source of the system. Take, for example, C and C++ compilers and the Delphi compiler. These 3GL programming languages come supplied with the full source to their entire run-time library (RTL) as well as any class libraries they may use. Having full source code means any question as to the behaviour or implementation of any library feature can be readily resolved by looking at the pertinent source files.

Of course .NET does not ship with its source, but there are various tricks we can use in an attempt to overcome that hurdle, and get the same results as if we did have access to it. This is the remit of reverse engineering and this paper looks at various approaches that we can take in reverse engineering aspects of .NET, simply to understand its behaviour and operation better.

The Options Available To Us

Most reverse engineering options that are open to us stem from one of the key features of managed executable files: metadata. We'll look at the list of reverse engineering options and then look in detail at each of them. But first we'll go off on a slight tangent.

CLI Implementations

Did you notice that I used the phrase managed executable files, rather than .NET executable files? This was an intentional choice, intended to emphasise that the Microsoft .NET Framework is not the only platform that can execute managed executables. The Microsoft .NET Framework is one particular implementation of an ECMA (European Computer Manufacturers Association) standard.

ECMA-335 is the standard for the Common Language Infrastructure, or CLI. The Microsoft .NET Framework is one example of an implementation of the CLI (it implements the CLI as well as a whole host of additional tools and classes).

The CLI specifies that managed executables use the Portable Executable (PE) file format, as used by Win32 executables. A PE file contains a standard header, followed by metadata and IL (Intermediate Language) code in special sections of the file. The IL code represents the functionality in the file and is compiled into native machine instructions before it executes, usually by the JIT (just-in-time) compiler. A module is an example of such a file. An assembly is one or more modules combined with additional metadata called a manifest, which names and describes the assembly and lists the assemblies it depends upon.

It therefore follows that any implementation of the CLI supports assemblies in the PE format.
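To make the idea of a manifest concrete, here is a minimal C# sketch that asks the runtime for the manifest details of the running assembly: its full name, its modules and the assemblies it depends upon. The class name ManifestDump is invented for illustration; the calls themselves are standard System.Reflection ones.

```csharp
using System;
using System.Reflection;

public static class ManifestDump
{
    // Returns the simple names of the assemblies that the manifest of
    // the given assembly lists as dependencies.
    public static string[] ReferencedNames(Assembly assembly)
    {
        AssemblyName[] references = assembly.GetReferencedAssemblies();
        string[] names = new string[references.Length];
        for (int i = 0; i < references.Length; i++)
            names[i] = references[i].Name;
        return names;
    }

    public static void Main()
    {
        Assembly assembly = Assembly.GetExecutingAssembly();

        // The manifest names and describes the assembly...
        Console.WriteLine("Assembly: " + assembly.GetName().FullName);

        // ...identifies the modules that make it up...
        foreach (Module module in assembly.GetModules())
            Console.WriteLine("  module:     " + module.Name);

        // ...and lists the assemblies it depends upon.
        foreach (string name in ReferencedNames(assembly))
            Console.WriteLine("  depends on: " + name);
    }
}
```

The exact names printed depend on the CLI implementation and version you run it on.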

At the time of writing there are five implementations of the CLI in existence or under development:

  1. Microsoft .NET Framework. This was the initial CLI implementation and it supports Windows platforms (Windows 98, Windows Me, Windows 2000, Windows XP, Windows Server 2003). It is freely downloadable in binary form (source code is not supplied). You can get just the redistributable version (suitable for deploying to machines to execute managed executables against) or the SDK (which includes additional tools, documentation and developer support). At the time of writing the current version of the .NET Framework is v1.1, which supersedes version v1.0 SP2.

  2. Microsoft .NET Compact Framework. This is a CLI implementation for small devices that run Windows CE .NET. The implementation is much more lightweight than the full desktop .NET Framework and is tuned for the limited memory and storage of small devices. At the time of writing the current version of the .NET Compact Framework is v1.0 and it can be used to develop applications in conjunction with Visual Studio .NET 2003.

  3. Microsoft Shared Source CLI or SSCLI, codenamed Rotor. This is an implementation of the CLI (along with other parts of the Microsoft .NET Framework) that can run on multiple platforms. The supported platforms are Windows XP, FreeBSD 4.7 and Mac OS X 10.2, but it should also work fine on Windows 2000 and earlier versions of FreeBSD. You can freely download the entire source for SSCLI (over 3,000,000 lines of code) for non-commercial purposes.

  4. Mono, which is a project sponsored by and mainly developed by Ximian and runs on Linux and Windows. This is an implementation of the CLI, but which also endeavours to implement various other parts of Microsoft's .NET Framework such as ASP.NET, ADO.NET and VB.NET (called Basic.NET). It will also endeavour to get some level of support for WinForms for GUI applications. At the time of writing the current version is Mono 0.24, but version 1.0 is expected in Q4, 2003.

  5. Portable.NET from the DotGNU project. While the initial target platform was GNU/Linux, it is also known to run under Windows, Solaris, NetBSD, FreeBSD, and MacOS X. The runtime engine has been tested on the x86, PowerPC, ARM, Sparc, PARISC, s390, Alpha, and IA-64 processors. At the time of writing Portable.NET 0.5.6 was the current version.

Self-describing Assemblies

Getting back to the point made at the start of the previous section, assemblies are rich with metadata, making it easy to peer into them and identify the structure of what's inside. You can identify all the namespaces, classes, their methods, fields and properties, structures, enumerations and so on using a mechanism in .NET called reflection.

There are a variety of utilities that use reflection to show the structure and content of .NET assemblies and we'll bump into them as we look at the various reverse engineering options.
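As a taste of what these utilities do internally, the following sketch walks every type in the current assembly and lists its declared members. It is only an illustration: the ReflectionDump class and its Describe helper are made-up names, and a real browser would present far more detail.

```csharp
using System;
using System.Collections;
using System.Reflection;

public class ReflectionDump
{
    // Ask for everything the type itself declares, whatever its visibility.
    const BindingFlags Everything =
        BindingFlags.Public | BindingFlags.NonPublic |
        BindingFlags.Instance | BindingFlags.Static |
        BindingFlags.DeclaredOnly;

    // Returns one line per member, e.g. "Method Main" or "Constructor .ctor".
    public static ArrayList Describe(Type type)
    {
        ArrayList lines = new ArrayList();
        foreach (MemberInfo member in type.GetMembers(Everything))
            lines.Add(member.MemberType + " " + member.Name);
        return lines;
    }

    public static void Main()
    {
        // Reflect over the assembly that contains this very class.
        foreach (Type type in typeof(ReflectionDump).Assembly.GetTypes())
        {
            Console.WriteLine(type.FullName);
            foreach (string line in Describe(type))
                Console.WriteLine("    " + line);
        }
    }
}
```

Pointing Describe at a framework type such as typeof(System.Collections.ArrayList) shows the same kind of information for code you did not write.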

IL Examination

The only thing not directly exposed through metadata is the IL code itself. However this hurdle is easily surmountable with existing code and utilities available to read and display IL opcodes directly from a PE file.

High-level Language Decompilers

As well as being able to look at the low level IL code, which may not be palatable to many programmers, there are various tools which will decompile this IL to a high level language, such as C#, VB.NET or C++ with Managed Extensions. These tools examine the IL to identify patterns that allow them to substitute high level statements in their place.

Source Code

The various non-commercial implementations of the CLI all ship with source code. The most interesting example is Microsoft's own Shared Source CLI. This project started from the commercial .NET code base, although there have been multitudes of changes and some subsystems have been entirely reworked.

That notwithstanding, these source bases often offer a fascinating and educational insight into the working (or potential working) of the Microsoft .NET Framework. In many cases, the use of a high level language decompiler in conjunction with the SSCLI source can be a very productive pairing.

Examining The Assembly IL

Let's start off with (almost) the simplest C# application we can build, the classic Hello World application.


using System;

namespace Hello
{
    public class HelloClass
    {
        public static void Main()
        {
            Console.WriteLine("Hello world");
        }
    }
}

After looking at results we get with this simple example, we'll turn our attention to looking at an already implemented method in the .NET Framework Class Library, namely the System.Collections.ArrayList class's InsertRange method. This method has been selected reasonably arbitrarily, although it is an example of a method that performs a reasonable amount of work.

IL Disassembler

The .NET Framework comes with a utility that takes source files containing IL code and metadata directives and compiles them into either modules or assemblies. The utility, ilasm.exe, is called the IL Assembler.

When you install the .NET Framework SDK (or Microsoft Visual Studio .NET or Borland C#Builder, which both include the Framework SDK) you get another utility called ildasm.exe, the IL Disassembler. As the name suggests this tool takes a compiled module/assembly and shows you the IL/metadata that constitutes it.

So rich is the IL code and metadata combination that you can use this utility to produce IL source code files that can be run through ilasm.exe to recreate a fully functioning binary. This process is called round tripping and emphasises the lack of ambiguity in the IL/metadata found in every managed executable.

The IL Disassembler operates in two modes, GUI and console. You can tell it which assembly to disassemble by passing the name on the command-line. If you tell it to generate IL source files using the /OUT command-line parameter it operates as a console application. It also runs as a console application if you pass the /TEXT parameter.

If you just pass the assembly name (or no parameters at all) it launches as a GUI app, which is often more convenient for browsing. When launched with no parameters you can use the File | Open menu (or Ctrl+O) to choose an assembly to disassemble, or alternatively drag a file onto the UI from Windows Explorer.

Running ildasm.exe on the simple Hello World application produces this.

As you can see, the tree view shows the assembly manifest and the simple namespace at the top level. In the namespace is our single class and within the class you can see its single static method, Main, as well as a reference to some other internal class elements. These include the instance constructor (.ctor), which is never used in this simple case as we do not construct an instance of HelloClass. The first item is some metadata to indicate the class does not require a class constructor (sometimes called a type initialiser), which is a method that automatically executes before the class is used, to initialise static fields.
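If you have not met class constructors before, this small sketch shows one in action. WithInitialiser and Demo are hypothetical names used only here. Because the class declares a static constructor, the compiler emits a .cctor method, and the runtime executes it before the first static member is used.

```csharp
using System;
using System.Collections;

public class WithInitialiser
{
    // Static field initialisers are compiled into the class constructor.
    public static readonly ArrayList Log = new ArrayList();

    // An explicit class constructor (type initialiser); it runs once,
    // automatically, before the class is first used.
    static WithInitialiser()
    {
        Log.Add("class constructor ran");
    }

    public static void Touch()
    {
        Log.Add("static method ran");
    }
}

public static class Demo
{
    public static void Main()
    {
        // The first use of the class triggers the class constructor.
        WithInitialiser.Touch();
        foreach (string entry in WithInitialiser.Log)
            Console.WriteLine(entry);
    }
}
```

The class constructor entry is logged before the entry from Touch, even though Main never calls the constructor explicitly.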

Note that by default ILDasm will display all members of a class. If you only want, for example, public and protected (family) members displayed you can use the appropriate items on the View menu or invoke ildasm.exe with the /VIS=PUB+FAM command-line switch.
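The same public-plus-family filtering can be approximated in your own code with reflection. The sketch below is an assumption about how such a filter might be written, not a description of how ILDasm implements it; Sample and VisibilityFilter are invented names.

```csharp
using System;
using System.Collections;
using System.Reflection;

public class Sample
{
    public void PublicMethod() { }
    protected void FamilyMethod() { }
    internal void AssemblyMethod() { }
    private void PrivateMethod() { }
}

public static class VisibilityFilter
{
    const BindingFlags All =
        BindingFlags.Public | BindingFlags.NonPublic |
        BindingFlags.Instance | BindingFlags.Static |
        BindingFlags.DeclaredOnly;

    // Lists the methods declared by a type. When publicAndFamilyOnly is
    // true, only public and protected (family) methods are kept.
    public static ArrayList MethodNames(Type type, bool publicAndFamilyOnly)
    {
        ArrayList names = new ArrayList();
        foreach (MethodInfo method in type.GetMethods(All))
        {
            bool visible = method.IsPublic || method.IsFamily;
            if (!publicAndFamilyOnly || visible)
                names.Add(method.Name);
        }
        return names;
    }

    public static void Main()
    {
        Console.WriteLine("All members:");
        foreach (string name in MethodNames(typeof(Sample), false))
            Console.WriteLine("  " + name);

        Console.WriteLine("Public and family only:");
        foreach (string name in MethodNames(typeof(Sample), true))
            Console.WriteLine("  " + name);
    }
}
```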

Double-clicking on any item in the tree view (or selecting it and pressing Enter) shows you the implementation of the item. The Main method shows up like this:

Since the original C# code was trivial, this should be readily understandable. The Hello world string is loaded onto the stack and the System.Console.WriteLine method is called, using the string on the stack as its parameter.
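To get a feel for this stack-based model without leaving C#, you can emit the same instructions yourself. This sketch assumes a framework version that offers System.Reflection.Emit.DynamicMethod (it arrived after the .NET 1.x releases discussed here), and EmitHello is a made-up name.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class EmitHello
{
    // Builds a method at run time whose body mirrors the IL described
    // above: load the string, call Console.WriteLine, return.
    public static Action Build()
    {
        MethodInfo writeLine = typeof(Console).GetMethod(
            "WriteLine", new Type[] { typeof(string) });

        DynamicMethod method =
            new DynamicMethod("Main", typeof(void), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldstr, "Hello world"); // push the string onto the stack
        il.Emit(OpCodes.Call, writeLine);      // pop it as the call's parameter
        il.Emit(OpCodes.Ret);                  // return; the stack is empty again

        return (Action)method.CreateDelegate(typeof(Action));
    }

    public static void Main()
    {
        Action hello = Build();
        hello();
    }
}
```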

So we see that simple code is evidently readable enough. However a nice touch we can add is to get the disassembly to include the corresponding source code lines as comments just before the disassembled IL, assuming the assembly was compiled with debug information and the source is available. The View | Source menu item does this or you can use the /SOU command-line switch to do the job.

Now what about something a little more taxing, such as the InsertRange method of the ArrayList class? You can locate this method by running ildasm.exe on the mscorlib.dll assembly. You can find this file in the $(windir)\Microsoft.NET\Framework\$(version) directory, where windir is an environment variable that points to your Windows installation directory and version is a fictitious environment variable that equates to v1.0.3705 for .NET 1.0 and v1.1.4322 for .NET 1.1.
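Rather than working out the directory by hand, you can ask the runtime where it lives. A minimal sketch follows (LocateCoreLibrary is an invented name); on runtimes other than the Microsoft .NET Framework the file that houses ArrayList may not be called mscorlib.dll.

```csharp
using System;
using System.Collections;
using System.Runtime.InteropServices;

public static class LocateCoreLibrary
{
    // The directory the running CLR was loaded from.
    public static string RuntimeDirectory()
    {
        return RuntimeEnvironment.GetRuntimeDirectory();
    }

    public static void Main()
    {
        Console.WriteLine("Runtime directory:  " + RuntimeDirectory());
        Console.WriteLine("Runtime version:    " +
            RuntimeEnvironment.GetSystemVersion());

        // The assembly that implements ArrayList, wherever it is installed.
        Console.WriteLine("ArrayList lives in: " +
            typeof(ArrayList).Assembly.Location);
    }
}
```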

This screenshot shows how to locate the right class within the namespaces. First look in the System namespace, then in the nested System.Collections namespace. Then scroll down the methods until you bump into InsertRange.

Because there are several statements in this method, the IL code is quite verbose. The following listing shows the code:






.method public hidebysig newslot virtual 
        instance void  InsertRange(int32 index,
                                   class System.Collections.ICollection c) cil managed
{
  // Code size       228 (0xe4)
  .maxstack  6
  .locals (int32 V_0)
  IL_0000:  ldarg.2
  IL_0001:  brtrue.s   IL_0018
  IL_0003:  ldstr      "c"
  IL_0008:  ldstr      "ArgumentNull_Collection"
  IL_000d:  call       string System.Environment::GetResourceString(string)
  IL_0012:  newobj     instance void System.ArgumentNullException::.ctor(string,
                                                                         string)
  IL_0017:  throw
  IL_0018:  ldarg.1
  IL_0019:  ldc.i4.0
  IL_001a:  blt.s      IL_0025
  IL_001c:  ldarg.1
  IL_001d:  ldarg.0
  IL_001e:  ldfld      int32 System.Collections.ArrayList::_size
  IL_0023:  ble.s      IL_003a
  IL_0025:  ldstr      "index"
  IL_002a:  ldstr      "ArgumentOutOfRange_Index"
  IL_002f:  call       string System.Environment::GetResourceString(string)
  IL_0034:  newobj     instance void System.ArgumentOutOfRangeException::.ctor(string,
                                                                               string)
  IL_0039:  throw
  IL_003a:  ldarg.2
  IL_003b:  callvirt   instance int32 System.Collections.ICollection::get_Count()
  IL_0040:  stloc.0
  IL_0041:  ldloc.0
  IL_0042:  ldc.i4.0
  IL_0043:  ble        IL_00e3
  IL_0048:  ldarg.0
  IL_0049:  ldarg.0
  IL_004a:  ldfld      int32 System.Collections.ArrayList::_size
  IL_004f:  ldloc.0
  IL_0050:  add
  IL_0051:  call       instance void System.Collections.ArrayList::EnsureCapacity(int32)
  IL_0056:  ldarg.1
  IL_0057:  ldarg.0
  IL_0058:  ldfld      int32 System.Collections.ArrayList::_size
  IL_005d:  bge.s      IL_007c
  IL_005f:  ldarg.0
  IL_0060:  ldfld      object[] System.Collections.ArrayList::_items
  IL_0065:  ldarg.1
  IL_0066:  ldarg.0
  IL_0067:  ldfld      object[] System.Collections.ArrayList::_items
  IL_006c:  ldarg.1
  IL_006d:  ldloc.0
  IL_006e:  add
  IL_006f:  ldarg.0
  IL_0070:  ldfld      int32 System.Collections.ArrayList::_size
  IL_0075:  ldarg.1
  IL_0076:  sub
  IL_0077:  call       void System.Array::Copy(class System.Array,
                                               int32,
                                               class System.Array,
                                               int32,
                                               int32)
  IL_007c:  ldarg.0
  IL_007d:  ldarg.2
  IL_007e:  callvirt   instance object System.Collections.ICollection::get_SyncRoot()
  IL_0083:  bne.un.s   IL_00ba
  IL_0085:  ldarg.0
  IL_0086:  ldfld      object[] System.Collections.ArrayList::_items
  IL_008b:  ldc.i4.0
  IL_008c:  ldarg.0
  IL_008d:  ldfld      object[] System.Collections.ArrayList::_items
  IL_0092:  ldarg.1
  IL_0093:  ldarg.1
  IL_0094:  call       void System.Array::Copy(class System.Array,
                                               int32,
                                               class System.Array,
                                               int32,
                                               int32)
  IL_0099:  ldarg.0
  IL_009a:  ldfld      object[] System.Collections.ArrayList::_items
  IL_009f:  ldarg.1
  IL_00a0:  ldloc.0
  IL_00a1:  add
  IL_00a2:  ldarg.0
  IL_00a3:  ldfld      object[] System.Collections.ArrayList::_items
  IL_00a8:  ldarg.1
  IL_00a9:  ldc.i4.2
  IL_00aa:  mul
  IL_00ab:  ldarg.0
  IL_00ac:  ldfld      int32 System.Collections.ArrayList::_size
  IL_00b1:  ldarg.1
  IL_00b2:  sub
  IL_00b3:  call       void System.Array::Copy(class System.Array,
                                               int32,
                                               class System.Array,
                                               int32,
                                               int32)
  IL_00b8:  br.s       IL_00c7
  IL_00ba:  ldarg.2
  IL_00bb:  ldarg.0
  IL_00bc:  ldfld      object[] System.Collections.ArrayList::_items
  IL_00c1:  ldarg.1
  IL_00c2:  callvirt   instance void System.Collections.ICollection::CopyTo(class System.Array,
                                                                            int32)
  IL_00c7:  ldarg.0
  IL_00c8:  dup
  IL_00c9:  ldfld      int32 System.Collections.ArrayList::_size
  IL_00ce:  ldloc.0
  IL_00cf:  add
  IL_00d0:  stfld      int32 System.Collections.ArrayList::_size
  IL_00d5:  ldarg.0
  IL_00d6:  dup
  IL_00d7:  ldfld      int32 System.Collections.ArrayList::_version
  IL_00dc:  ldc.i4.1
  IL_00dd:  add
  IL_00de:  stfld      int32 System.Collections.ArrayList::_version
  IL_00e3:  ret
} // end of method ArrayList::InsertRange


Of course it's much less intelligible now that we have more code. With the metadata and IL documentation, which can be found in Partitions II and III of the CLI specification respectively, or in the book Inside Microsoft .NET IL Assembler, we could still work it out. However, it would take rather longer than most of us would be prepared to spend.

Reflector


Lutz Roeder is a developer working at Microsoft and he has his own personal Web site. There you can find a popular tool called Reflector, which is at version 3.0.0.1 at the time of writing. You can load assemblies into Reflector either using the File | Open... menu (or Ctrl+O) or by dragging them onto the UI from Windows Explorer.

Reflector does a similar job to the IL Disassembler in that it displays information about an assembly by reflecting across the metadata. However one key difference is that it only shows you public and protected items by default (you can change this in the options: View | Options...).

When you select a method it will display the method signature at the bottom of the main window in C# syntax by default, although you can switch it to show Visual Basic.NET syntax in the Languages menu. You can disassemble a method by selecting it and then choosing Tools | Disassembler (or by pressing Enter). Since the IL code in a given assembly is fixed, you will get much the same results from any tool:



However as you might be able to see, there are additional facilities in Reflector over ildasm.exe, such as the lists of base types and descendant types, a list of dependencies for the assembly, the ability to search for types in an assembly and a call tree. Also, the disassembly window explains the IL instructions using tooltips when you pause your mouse over them. These all make Reflector a useful tool to have available.

Borland Type Browser


Borland's new C#Builder development tool also includes a reflection browser and IL disassembler called Reflection.exe. By default the tool appears to simply be a reflection browser but it can be enticed into offering IL disassembly by adding a simple registry entry.

In the registry key HKEY_CURRENT_USER\Software\Borland\BDS\1.0\Globals a string value called ShowILDissassembly with a value of 1 will mean an extra Code page is displayed when looking at a method. Note carefully the spelling of ShowILDissassembly before creating the value in the registry.



The SSCLI IL Disassembler Source


The Microsoft SSCLI download is supplied as a massive source tree. It contains the source to the class libraries, the C# compiler, the IL assembler, the IL disassembler and many other interesting bits and pieces. The IL disassembler is functionally identical to the commercial .NET Framework version except that it does not offer a GUI (SSCLI has no GUI support; it is just for building console applications).

If the SSCLI installation directory is referred to as $(ROTOR_DIR) then the ildasm.exe source is located in $(ROTOR_DIR)\clr\src\ildasm. You can peruse and learn from this source base to see how IL disassembly can be performed. If you need to build a custom IL viewer and can read C++, this would be a good place to start.

You can use the Visual Studio .NET debugger to step through the SSCLI IL disassembler in order to understand how it operates, assuming you have built the SSCLI source base appropriately. If not, you will need to follow the SSCLI instructions in order to get a checked (full debug info, optimisations off) or fast checked (full debug info, optimisations on) build of SSCLI first. Your best bet for debugging would be the checked build.

In principle, building SSCLI is actually straightforward thanks to the excellent configuration tools and files used in the source tree. You need to first install a Perl 5.6 implementation, for example ActivePerl (see Reference 11). You also need Visual Studio.NET installed before you can proceed. Installing SSCLI itself is simply a matter of extracting the files from the compressed tarball (i.e. a gzip compressed tar file) they are supplied in. WinZip or WinRar should do the trick.

In a command prompt window change to the SSCLI root installation directory and set up the SSCLI environment. The env.bat batch file will do this, and it takes parameters to set up for checked, fast checked or free mode. Once set up, your command prompt environment will have a variety of environment variables pointing to parts of the SSCLI directory tree, including ROTOR_DIR, which points to the main installation directory.

To set up the environment for checked mode execute:


env checked


Next you invoke the build process by executing:


buildall.cmd


As you might expect, the build process can be rather lengthy, depending on your hardware, but it should get there in the end.

With SSCLI built you will find ildasm.exe in $(TARGETCOMPLUSSDK)\bin. TARGETCOMPLUSSDK is another environment variable, which is equivalent to $(ROTOR_DIR)\build\v1.x86chk.rotor\sdk in the checked environment.

You can set up a Visual Studio.NET solution for ildasm.exe as follows:

  1. Choose File | Open Solution...

  2. Change the Files of type: entry to Executable files (*.exe)

  3. Locate the SSCLI ildasm.exe in $(TARGETCOMPLUSSDK)\bin

  4. Select the Solution Explorer with View | Solution Explorer (or Ctrl+Alt+L)

  5. Right-click the solution node and choose Properties

  6. Select the Common Properties node, then the Debug Source Files node, and add in the path to the ildasm source and also to the common SSCLI include files

  7. View the project properties with Project | Properties

  8. In the Configuration Properties, Debugging node use the Command Arguments entry to set up the ildasm.exe command-line arguments to refer to the assembly you wish to disassemble

  9. Optionally set the Working Directory option to point to the directory housing the specified assembly

  10. Save the solution using File | Save ildasm.sln (or Ctrl+S)

  11. Close Visual Studio.NET

  12. Create a batch file in the directory containing the ildasm.sln file that looks like this, substituting your SSCLI directory path as appropriate:


    REM Modify this line depending on where SSCLI is installed
    call c:\Tools\sscli\env checked

    REM Modify this line depending on where Visual Studio.NET is installed
    start C:\Tools\VS.NET\Common7\IDE\devenv.exe ildasm.sln


Now you can double-click the batch file and Visual Studio.NET will be launched in an SSCLI checked mode environment and will load your solution. You can then start debugging it as you would with your normal applications:



Lutz Roeder's IL Reader Class


Another option for writing your own IL displaying utility would be to use a helper class made available by Lutz Roeder. The ILReader class and its selection of helper classes are supplied in a C# source file accompanied by an example program that shows them in use.

The ILReader class relies on the calling program having access to the type whose methods require disassembling. The constructor takes two parameters: the target type's module and a class implementing the locally defined IAssemblyLoader interface. This interface defines two methods that the ILReader calls in order to load the assembly that implements the target type, Load and LoadFrom. The example program defines the trivial AssemblyLoader class with simple implementations.

Having constructed an ILReader you call its GetMethodBody method for any method you need to disassemble. GetMethodBody takes a MethodBase descendant (such as MethodInfo), which can be accessed through Type.GetMethod or Type.GetMethods, and returns a MethodBody object whose members provide all the information you require.



The sample program disassembles the System.Object class.
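ILReader itself is not reproduced here. To give a rough idea of the work any such reader has to do, the sketch below decodes a method body using only framework classes. It assumes a later framework version than the .NET 1.x releases discussed in this article, one that offers MethodBase.GetMethodBody and Module.ResolveString. TinyDisassembler is an invented name and the code has nothing to do with Lutz Roeder's implementation; it only resolves string and method operands and skips over everything else.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

public static class TinyDisassembler
{
    // Every opcode the framework knows about, keyed by its encoded value.
    static readonly Dictionary<ushort, OpCode> Table = BuildTable();

    static Dictionary<ushort, OpCode> BuildTable()
    {
        Dictionary<ushort, OpCode> table = new Dictionary<ushort, OpCode>();
        foreach (FieldInfo field in typeof(OpCodes).GetFields(
                     BindingFlags.Public | BindingFlags.Static))
        {
            if (field.FieldType != typeof(OpCode)) continue;
            OpCode op = (OpCode)field.GetValue(null);
            table[unchecked((ushort)op.Value)] = op;
        }
        return table;
    }

    // Number of operand bytes that follow the opcode.
    static int OperandSize(OpCode op, byte[] il, int operandStart)
    {
        switch (op.OperandType)
        {
            case OperandType.InlineNone: return 0;
            case OperandType.ShortInlineBrTarget:
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar: return 1;
            case OperandType.InlineVar: return 2;
            case OperandType.InlineI8:
            case OperandType.InlineR: return 8;
            case OperandType.InlineSwitch: // a count, then that many targets
                return 4 + 4 * BitConverter.ToInt32(il, operandStart);
            default: return 4; // tokens, int32, float32, long branch targets
        }
    }

    // Sample method to disassemble.
    public static void Hello()
    {
        Console.WriteLine("Hello world");
    }

    public static List<string> Disassemble(MethodBase method)
    {
        List<string> lines = new List<string>();
        MethodBody body = method.GetMethodBody();
        if (body == null) return lines; // abstract or native method
        byte[] il = body.GetILAsByteArray();

        int pos = 0;
        while (pos < il.Length)
        {
            int start = pos;
            ushort code = il[pos++];
            if (code == 0xFE) // prefix byte of the two-byte opcodes
                code = (ushort)(0xFE00 | il[pos++]);

            OpCode op;
            if (!Table.TryGetValue(code, out op))
                throw new InvalidOperationException("Unknown opcode");

            // Operands are little-endian; BitConverter is assumed to run
            // on a little-endian machine here.
            string operand = "";
            if (op.OperandType == OperandType.InlineString)
                operand = " \"" + method.Module.ResolveString(
                    BitConverter.ToInt32(il, pos)) + "\"";
            else if (op.OperandType == OperandType.InlineMethod)
            {
                MethodBase target = method.Module.ResolveMethod(
                    BitConverter.ToInt32(il, pos));
                operand = " " + target.DeclaringType.FullName + "::" + target.Name;
            }

            lines.Add(string.Format("IL_{0:x4}:  {1}{2}", start, op.Name, operand));
            pos += OperandSize(op, il, pos);
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Disassemble(typeof(TinyDisassembler).GetMethod("Hello")))
            Console.WriteLine(line);
    }
}
```

The listing produced for Hello depends on the compiler settings; a debug build typically contains extra nop instructions around the ldstr and call.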

Obfuscators


There is an argument that programmers entering the .NET world are opening their applications up to the eyes of the world. The rich metadata, along with the type-rich IL code, allows any managed code in an assembly to be disassembled back to readable IL, and there is a worry about intellectual property issues with your algorithms on public display.

This is true enough; managed code can indeed be disassembled. However this is not a new problem. Java byte code can also be decompiled in much the same way and this did not impede the acceptance of the Java programming language. JavaScript code in Web pages that produces nice effects can be directly read. Even standard Win32 applications can be readily disassembled by various utilities. Of course, Intel x86 machine code is not as well structured and is more troublesome to correctly and completely disassemble, but the principle holds true.

IL is easier to disassemble because of the richness of the metadata in the assembly, which is there because it makes various internal operations much easier, not because it has to be. The general advice if this is an issue to you is to make use of an obfuscator.

Microsoft Visual Studio.NET 2003 ships with an obfuscator called Dotfuscator from PreEmptive Solutions (installed into PreEmptive Solutions\Dotfuscator Community Edition directory under the main Visual Studio installation directory). Dotfuscator ships in two versions, the Professional Edition and the Community Edition. The latter is the cut-down version that ships with Visual Studio.NET 2003 and basically renames your classes and methods with unhelpful choices of names, whilst the former has many additional features and comes with a price tag. You can find out more about Dotfuscator from the links at Reference 12.

Borland C#Builder ships with a version of another obfuscator called Demeanor for .NET from Wise Owl. This product holds a good reputation in obfuscation circles and you can find out more from the link in the references. Demeanor is available in an Enterprise Edition, which you pay money for, or the Personal Edition, as supplied with C#Builder. The Personal Edition does simple renaming of methods and classes whilst the Enterprise Edition has additional features that make it an attractive purchase.

Reconstituting The Original Source



Viewing IL is all well and good, but despite its completeness, it is still not very readable. Various tools have surfaced which allow you to reconstitute high level statements from the IL in the assembly. They analyse the IL instruction sequences and use this information to build up a representation of the original source code in a given high-level language (some tools support this decompiling process with various high level language targets).

Anakrino / Exemplar


One of the first products to offer decompilation support was Exemplar by Jay Freeman (aka saurik). Exemplar is a command-line tool, which has been superseded by Anakrino, his GUI version (anakrino is a Greek word meaning to examine). You can find these tools at the URL listed in the references. The current version at the time of writing is 1.0.0.1.

Anakrino doesn't (currently) support drag and drop from Windows Explorer so you must open assemblies with the File | Open... menu item or by pressing Ctrl+O. Drilling down to a method allows you to see its high-level representation in either C# (by default) or managed C++ (if you select Dialect | MC++). Here is the Main method in our trivial test case displayed in C#.



By contrast, this is what it looks like in Managed C++:



This utility is very helpful, but has a number of "quirks". If you switch language (dialect) whilst a method is displayed, the decompiled version is not updated. You must select another method and then reselect the previous method to see it decompiled into the new language. Also, as you can see in the screenshots above, the bottom pane (coloured light yellow) seems to be unused at present. The equivalent panel in Reflector displays the method's signature/prototype.

When you download Anakrino, it includes the command-line Exemplar tool. You can use this to disassemble our test case assembly as shown here.



Moving onto the ArrayList method, we get much more interesting results here. Anakrino produces this listing as the C# source for the InsertRange method:


public virtual void InsertRange(int index, ICollection c) {
    int local0;

    if (c == null)
        throw new ArgumentNullException("c", Environment.GetResourceString("ArgumentNull_Collection"));
    if (index < 0 || index > this._size)
        throw new ArgumentOutOfRangeException("index", Environment.GetResourceString("ArgumentOutOfRange_Index"));
    local0 = c.Count;
    if (local0 > 0) {
        this.EnsureCapacity(this._size + local0);
        if (index < this._size)
            Array.Copy(this._items, index, this._items, index + local0, this._size - index);
        if (this == c.SyncRoot) {
            Array.Copy(this._items, 0, this._items, index, index);
            Array.Copy(this._items, index + local0, this._items, index * 2, this._size - index);
        }
        else
            c.CopyTo(this._items, index);
        this._size = this._size + local0;
        this._version = this._version + 1;
    }
}


As you can see this is much more meaningful than the pure IL code behind it. All we lack here is sensible names for the local variables, along with any comments that might have helped us understand the intent of the code. However, even without those you can see this is substantially easier to work with now.
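To convince yourself that the slightly puzzling self-insertion branch is right, it can help to rebuild the logic in a small class of your own and give the variables meaningful names. The following is a simplified stand-in written for illustration, not the framework source: MiniList is a hypothetical class, it leaves out the _version bookkeeping, and its growth policy is an assumption.

```csharp
using System;
using System.Collections;
using System.Text;

public class MiniList : ICollection
{
    private object[] _items = new object[4];
    private int _size;

    public int Count { get { return _size; } }
    public bool IsSynchronized { get { return false; } }
    public object SyncRoot { get { return this; } }

    public void CopyTo(Array array, int index)
    {
        Array.Copy(_items, 0, array, index, _size);
    }

    public IEnumerator GetEnumerator()
    {
        for (int i = 0; i < _size; i++) yield return _items[i];
    }

    private void EnsureCapacity(int min)
    {
        if (_items.Length >= min) return;
        object[] bigger = new object[Math.Max(min, _items.Length * 2)];
        Array.Copy(_items, 0, bigger, 0, _size);
        _items = bigger;
    }

    public void Add(object value)
    {
        EnsureCapacity(_size + 1);
        _items[_size++] = value;
    }

    public void InsertRange(int index, ICollection c)
    {
        if (c == null) throw new ArgumentNullException("c");
        if (index < 0 || index > _size)
            throw new ArgumentOutOfRangeException("index");

        int count = c.Count;
        if (count > 0)
        {
            EnsureCapacity(_size + count);

            // Open a gap of 'count' slots by moving the tail to the right.
            if (index < _size)
                Array.Copy(_items, index, _items, index + count, _size - index);

            if (this == c.SyncRoot)
            {
                // Inserting the list into itself. The head [0, index) is
                // still in place, the old tail now starts at index + count.
                // Fill the gap with a copy of the head, then of the tail.
                Array.Copy(_items, 0, _items, index, index);
                Array.Copy(_items, index + count, _items, index * 2, _size - index);
            }
            else
            {
                // Any other collection can copy itself straight into the gap.
                c.CopyTo(_items, index);
            }
            _size += count;
        }
    }

    public override string ToString()
    {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < _size; i++)
        {
            if (i > 0) text.Append(",");
            text.Append(_items[i]);
        }
        return text.ToString();
    }

    public static void Main()
    {
        MiniList list = new MiniList();
        list.Add("a"); list.Add("b"); list.Add("c");
        list.InsertRange(1, list);
        Console.WriteLine(list); // a,a,b,c,b,c
    }
}
```

Inserting the list a, b, c into itself at index 1 leaves the first element where it was, places a full copy of the original list after it, and then continues with the original tail.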

Reflector


We have already looked at Reflector in the context of an IL disassembler but from version 3 it also offers high-level language decompilation support. The currently supported languages are C# and VB.NET (selectable in the Languages menu). Decompiling a selected method is much the same as disassembling it. Disassembly is attained by Tools | Disassembler (or Enter), whilst you decompile a method with Tools | Decompiler (or Space).

The results when decompiling the Hello World method are the same as with Anakrino. The following listing is what we get when decompiling ArrayList.InsertRange.


public virtual void InsertRange(int index, ICollection c)
{
    int num1;
    if (c == null)
    {
        throw new ArgumentNullException("c", Environment.GetResourceString("ArgumentNull_Collection"));
    }
    if ((index < 0) || (index > this._size))
    {
        throw new ArgumentOutOfRangeException("index", Environment.GetResourceString("ArgumentOutOfRange_Index"));
    }
    num1 = c.Count;
    if (num1 > 0)
    {
        this.EnsureCapacity((this._size + num1));
        if (index < this._size)
        {
            Array.Copy(this._items, index, this._items, (index + num1), (this._size - index));
        }
        if (this == c.SyncRoot)
        {
            Array.Copy(this._items, 0, this._items, index, index);
            Array.Copy(this._items, (index + num1),
                       this._items, (index * 2), (this._size - index));
        }
        else
        {
            c.CopyTo(this._items, index);
        }
        this._size = (this._size + num1);
        this._version = (this._version + 1);
    }
}


Again, much the same as Anakrino; the only differences are subtle and include:

  • a different local variable name (num1 instead of local0)

  • different brace layout conventions; Anakrino puts the opening brace at the end of a line of code whilst Reflector puts it on its own line, in the same column as the previous line of code started

  • explicit brackets around each condition, such as index < 0



LSW DotNet-Reflection-Browser


DotNet-Reflection-Browser is a commercial utility from Lesser-Software that offers reflection browsing and decompilation support in the style of a Smalltalk System Browser (see Reference 15). It has various other features that are not relevant here; for our purposes you should know that it produces either C# or MSIL (i.e. it both decompiles and disassembles).



Running it against the InsertRange method produces this code:


public virtual void InsertRange (int index, ICollection c)
{
if (c == null)
{
throw new ArgumentNullException ("c", Environment.GetResourceString ("ArgumentNull_Collection"));
}

if ((index < 0) || (index > this._size))
{
throw new ArgumentOutOfRangeException ("index", Environment.GetResourceString ("ArgumentOutOfRange_Index"));
}

int i = c.Count;
if (i > 0)
{
this.EnsureCapacity ((this._size + i));
if (index < this._size)
{
Array.Copy (this._items, index, this._items, (index + i), (this._size - index));
}

if (this == c.SyncRoot)
{
Array.Copy (this._items, 0, this._items, index, index);
Array.Copy (this._items, (index + i), this._items, (index * 2), (this._size - index));
}
else
{
c.CopyTo (this._items, index);
}

this._size += i;
this._version++;
}

return;
}


Again it is much the same as we have seen before, although there are a few slight differences:

  • local variables are declared at the latest point (just before they are required) instead of right at the start
  • variable incrementing is represented with the ++ and += operators
  • explicit return statements are always used to exit methods, even at the end of the method implementation
  • explicit brackets around each condition, such as (index < 0)



Alternative (And Sometimes Identical) Source Implementations



As detailed earlier, there are various implementations of the CLI, of which Microsoft's .NET Framework is the best known and most successful. However, Microsoft's Shared Source CLI (SSCLI) is worth paying close attention to. This implementation started life from the same code base that .NET has grown from and, whilst parts of it have been changed for various reasons (such as to aid porting to other platforms or to help keep certain algorithms in the domain of the Microsoft engineers), much of it is still a direct representation of code in the commercial platform.

This is, of course, not true of the other non-Microsoft implementations, but it can still be very fruitful delving into their source trees.

SSCLI


Assuming the SSCLI has been installed into a directory referred to as $(ROTOR_DIR) the source for the ArrayList class can be found in $(ROTOR_DIR)\clr\src\bcl\system\collections\arraylist.cs. The method we are looking at is InsertRange, and the source code for it looks like this:


// Inserts the elements of the given collection at a given index. If
// required, the capacity of the list is increased to twice the previous
// capacity or the new size, whichever is larger. Ranges may be added
// to the end of the list by setting index to the ArrayList's size.
//
/// <include file='doc\ArrayList.uex' path='docs/doc[@for="ArrayList.InsertRange"]/*' />
public virtual void InsertRange(int index, ICollection c) {
if (c==null)
throw new ArgumentNullException("c", Environment.GetResourceString("ArgumentNull_Collection"));
if (index < 0 || index > _size) throw new ArgumentOutOfRangeException("index", Environment.GetResourceString("ArgumentOutOfRange_Index"));
int count = c.Count;
if (count > 0) {
EnsureCapacity(_size + count);
if (index < _size) {
Array.Copy(_items, index, _items, index + count, _size - index);
}
// Hack hack hack
// If we're inserting a ArrayList into itself, we want to be able to deal with that.
if (this == c.SyncRoot) {
// Copy first part of _items to insert location
Array.Copy(_items, 0, _items, index, index);
// Copy last part of _items back to inserted location
Array.Copy(_items, index+count, _items, index*2, _size-index);
}
else
c.CopyTo(_items, index);
_size += count;
_version++;
}
}


Some key things to note about this code are:

  • it is exactly the same logic as decompiled by Anakrino, Reflector and LSW-DNRB
  • the local variable has a descriptive name
  • there are comments describing what goes on in the code



In cases where the SSCLI still uses the same code as .NET, any effort in working out the gaps left by the decompilers is completely avoided. It is usually worth decompiling with one of the tools, then comparing with the equivalent source in SSCLI to see if the shared source file can be used as a reference to the .NET implementation.
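The source comments are what make the "Hack hack hack" branch understandable: it exists for the case where a list is inserted into itself. A small test program makes that behaviour easy to observe. The following is only a sketch; the class name SelfInsertDemo and the helper InsertIntoSelf are made up for illustration and are not part of .NET or the SSCLI.

```csharp
using System;
using System.Collections;

// Illustrative only: SelfInsertDemo and InsertIntoSelf are hypothetical names.
class SelfInsertDemo
{
    // Inserts the list into itself at 'index' and returns its contents
    // joined with spaces. Assumes the list holds only strings.
    public static string InsertIntoSelf(ArrayList list, int index)
    {
        // ArrayList.SyncRoot returns the list itself, so this call takes
        // the (this == c.SyncRoot) branch shown in the listing above.
        list.InsertRange(index, list);
        return string.Join(" ", (string[])list.ToArray(typeof(string)));
    }

    static void Main()
    {
        ArrayList list = new ArrayList(new object[] { "a", "b", "c" });
        Console.WriteLine(InsertIntoSelf(list, 1)); // prints "a a b c b c"
    }
}
```

Tracing the two Array.Copy calls in the SSCLI listing against this input (index 1, count 3) gives the same result, which is a useful cross-check that the decompiled logic has been read correctly.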

Mono


The Mono implementation is based on reading the CLI specification and implementing the described behaviour. You therefore don't learn anything about the inner workings of .NET by browsing the Mono source, but it does let you see alternative ways to implement the behaviour outlined in the ECMA CLI specification.

If the Mono 0.24 source files are installed in $(MONO) then the ArrayList source will be found in $(MONO)\mcs-0.24\class\corlib\System.Collections\ArrayList.cs. The InsertRange method implementation looks like this:


public override void InsertRange (int index, ICollection col) {
if (col == null)
throw new ArgumentNullException ();
if (index < 0 || index > Count)
throw new ArgumentOutOfRangeException ();
if (IsReadOnly || IsFixedSize)
throw new NotSupportedException ();

if (index == Count) {
foreach (object element in col)
list.Insert (index++, element);
}
}


As you can see, this is somewhat different in its approach, but does the same thing as far as the CLI class library specification goes.

Portable.NET


The situation with Portable.NET is much the same as with Mono with regard to its implementation. If Portable.NET 0.5.6 is installed in a directory referred to as $(PNET) then you can find the ArrayList source in $(PNET)\pnetlib-0.5.6\runtime\System\Collections\ArrayList.cs. The InsertRange method looks like this:


// Insert the contents of a collection as a range.
public virtual void InsertRange(int index, ICollection c)
{
int cCount;
IEnumerator enumerator;
if(c == null)
{
throw new ArgumentNullException("c");
}
if(index < 0 || index > count)
{
throw new ArgumentOutOfRangeException
("index", _("ArgRange_Array"));
}
cCount = c.Count;
Realloc(cCount, index);
enumerator = c.GetEnumerator();
while(enumerator.MoveNext())
{
store[index++] = enumerator.Current;
}
count += cCount;
++generation;
}
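To experiment with the enumerator-based approach outside the Portable.NET source tree, the same idea can be reproduced in a few lines. This is a minimal sketch under stated assumptions: MiniList and OpenGap are hypothetical names invented here, and OpenGap only stands in for what Realloc appears to do in the listing above (make room for the new elements at the insertion point); it is not Portable.NET code.

```csharp
using System;
using System.Collections;

// Hypothetical array-backed list, for illustration only.
class MiniList
{
    private object[] store = new object[4];
    private int count;

    public int Count { get { return count; } }

    public void Add(object value)
    {
        InsertRange(count, new object[] { value });
    }

    // Assumed stand-in for Realloc: grow the array if needed, then shift
    // the elements from 'index' onwards up by 'n' slots to open a gap.
    private void OpenGap(int n, int index)
    {
        if (count + n > store.Length)
        {
            object[] bigger = new object[Math.Max(store.Length * 2, count + n)];
            Array.Copy(store, 0, bigger, 0, count);
            store = bigger;
        }
        Array.Copy(store, index, store, index + n, count - index);
    }

    // Fill the gap by walking the source collection with its enumerator.
    public void InsertRange(int index, ICollection c)
    {
        if (c == null) throw new ArgumentNullException("c");
        if (index < 0 || index > count) throw new ArgumentOutOfRangeException("index");
        int cCount = c.Count;
        OpenGap(cCount, index);
        IEnumerator enumerator = c.GetEnumerator();
        while (enumerator.MoveNext())
        {
            store[index++] = enumerator.Current;
        }
        count += cCount;
    }

    public override string ToString()
    {
        string[] parts = new string[count];
        for (int i = 0; i < count; i++) parts[i] = Convert.ToString(store[i]);
        return string.Join(" ", parts);
    }

    static void Main()
    {
        MiniList list = new MiniList();
        list.Add(1);
        list.Add(2);
        list.Add(5);
        list.InsertRange(2, new object[] { 3, 4 });
        Console.WriteLine(list); // prints "1 2 3 4 5"
    }
}
```

Unlike the SSCLI version, this sketch makes no attempt to handle a list being inserted into itself.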



