Obfuscation: Cloaking your Code from Prying Eyes

Prevent customers from stealing your algorithms, and crackers from changing your code, with the obfuscator in VS.NET 2003. Let's take a look at some of the ingenious techniques it uses to mask your program's intent.

by Andrew Binstock

March 6, 2003

Semi-compiled languages such as Java and the Microsoft Intermediate Language (MSIL) are particularly easy to disassemble or reverse engineer. Unlike native code, the intermediate byte codes contain complete variable names, such that disassembly generates almost the exact source code of the original program. The only notable absence is the comments from the original source code. Everything else is there.

For ISVs and other commercial developers who want to protect their intellectual property, this ease of disassembly poses a significant and well-known problem: Algorithms can be reconstructed and studied, and program code can be reconstituted and customized. (Even in-house, noncommercial applications are vulnerable to source-code access made possible by disassembly. For example, database passwords, including those embedded in SQL statements, are now easily accessible to users. Likewise, sites that use outside Web hosts are at risk if they upload their ASP.NET code, because staff at the hosting facility can reconstruct all the programs should they wish to.)

Moreover, the tools that hackers or even curious users might need to reverse engineer code are widely available. Microsoft offers its own MSIL disassembler, called ILDASM, at no cost. The Anakrino tool is an open-source disassembler for .NET (go to http://www.saurik.com/net/exemplar/); and various other companies offer equivalent tools on a commercial basis.

Protecting Your Code
The most effective way to protect your code from these forms of reverse engineering and snooping is to obfuscate it. ("Obfuscate" means "...to make opaque (so) as to be difficult to perceive or understand," according to the American Heritage Dictionary, 3rd Ed.) Tools today perform this trick by various means that primarily focus on making the variable names meaningless, encrypting strings and literals, and inserting misleading directives that render disassembled code uncompilable.

The upcoming release of Visual Studio (called VS.NET 2003 and code-named Everett) sports an integrated obfuscating tool that Microsoft suggests running as a final pass on .NET intermediate code. The obfuscator is the so-called "lite" version of a more robust obfuscating utility, Dotfuscator, sold by Preemptive Solutions, a Cleveland-based company that got its start obfuscating Java code. Dotfuscator uses a remarkable variety of techniques to make disassembly futile or, at least, very difficult.

Overload induction is Preemptive Solutions' name for its patented technique of changing variable names in the intermediate code. (Obfuscators never touch source code, nor even need to reference it.) It takes advantage of the fact that the same identifier name can be used for classes and methods with different signatures. And within different namespaces, variables can use the same name without colliding. Dotfuscator exploits these lexical features to rename as many items as possible to the letter 'A.' The company claims that on some code 33% of references can be renamed to A and another 10% to B. This transformation makes disassembled code extremely hard to understand. Consider the following example:

Disassembled code without obfuscation:


private void CalcPayroll(SpecialList employeeGroup) {
    while (employeeGroup.HasMore()) {
        employee = employeeGroup.GetNext(true);
        employee.updateSalary();
        DistributeCheck(employee);
    }
}

Same code with obfuscation:

private void a(a b) {
    while (b.a()) {
        a = b.a(true);
        a.a();
        a(a);
    }
}

It is clear that both snippets perform the same logic. However, it is extraordinarily difficult to determine what the second snippet is doing, much less which fields and methods exactly are being accessed.

This renaming feature can be configured so that if you're building a DLL, for example, the APIs are left untouched. Interestingly, this feature alone visibly shrinks code, simply because so many variable names are reduced to a single character.
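The renaming principle can be sketched in a few lines of Python. This is only an illustration of the idea that members with different signatures may share a name; it is not Preemptive Solutions' patented algorithm. The function names `short_names` and `assign_names`, and the sample member list, are hypothetical.

```python
from itertools import count, product
from string import ascii_lowercase


def short_names():
    """Yield 'a', 'b', ..., 'z', 'aa', 'ab', ... in order."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def assign_names(members):
    """Map each (scope, name, signature) member to the shortest free name.

    Assumption for this sketch: two members may share a new name unless
    they live in the same scope AND have the same parameter signature.
    """
    taken = set()   # (scope, signature, new_name) triples already in use
    renamed = {}
    for scope, name, signature in members:
        for candidate in short_names():
            key = (scope, signature, candidate)
            if key not in taken:
                taken.add(key)
                renamed[(scope, name, signature)] = candidate
                break
    return renamed


# Hypothetical members, loosely modeled on the payroll example above.
members = [
    ("Payroll", "CalcPayroll", ("SpecialList",)),
    ("Payroll", "DistributeCheck", ("Employee",)),
    ("Payroll", "PrintSummary", ()),
    ("Payroll", "ResetTotals", ()),   # same signature as PrintSummary
    ("SpecialList", "HasMore", ()),
    ("SpecialList", "GetNext", ("bool",)),
]

for member, new_name in assign_names(members).items():
    print(member[0], member[1], "->", new_name)
```

In this toy input, five of the six members end up named `a`; only `ResetTotals` needs `b`, because it collides with `PrintSummary` in both scope and signature. A real obfuscator also has to respect return types, inheritance, and externally visible APIs, all of which this sketch ignores.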

String encryption gets around a security problem that exists even in native code: String literals are easy to extract from binaries. For example, running the UNIX strings utility on any binary will generate a list of all ASCII literals in the file. In its most benign form, this list reveals only copyright information and whose libraries are included in the executable. However, if the program accesses databases, strings will reveal all the SQL commands. And if passwords are buried in the module, they are revealed as well.
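To see how little effort this takes, here is a rough Python approximation of what a strings-style scan does: it looks for runs of printable ASCII bytes of a minimum length. The function name `extract_strings` and the sample bytes are made up for illustration; the real utility has more options and handles other encodings.

```python
import re


def extract_strings(data: bytes, min_length: int = 4):
    """Return runs of printable ASCII at least min_length bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# A made-up "binary": a few non-text bytes with two literals buried inside.
fake_binary = (
    b"\x7fELF\x02\x01\x00\x00"
    b"SELECT name, salary FROM employees\x00"
    b"\x90\x90\xc3"
    b"pwd=s3cret!\x00\x01\x02"
)

print(extract_strings(fake_binary))
```

On this sample the scan reports the SQL statement and the password string and nothing else, since the remaining printable runs are shorter than four characters.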

With intermediate code, there are additional dangers. By examining the references to a given string, a cracker can figure out where password-protected code begins, and then can patch the file to jump there. To solve the problem of literals as human-readable text, most obfuscators encrypt strings. A small runtime penalty is incurred when the string is accessed, due to the decryption overhead. Interestingly, native code is at a disadvantage here because to achieve the same effect, developers must encrypt and decrypt each string manually, whereas an obfuscator performs this operation automatically.
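The shape of the transformation can be shown with a deliberately weak toy in Python. It assumes a single-byte XOR key, which is trivial to reverse and is certainly not the scheme any commercial obfuscator uses; the names `KEY`, `obfuscate`, and `reveal` are hypothetical. The point is only that the literal is stored in unreadable form and decoded at the moment of use.

```python
KEY = 0x5A  # hypothetical single-byte key; real tools use something stronger


def obfuscate(text: str) -> bytes:
    """Build-time step: turn a literal into non-readable bytes."""
    return bytes(b ^ KEY for b in text.encode("utf-8"))


def reveal(blob: bytes) -> str:
    """Run-time step: recover the literal just before it is used."""
    return bytes(b ^ KEY for b in blob).decode("utf-8")


stored = obfuscate("SELECT * FROM payroll")
print(stored)          # what a scan of the binary would find
print(reveal(stored))  # what the program sees at run time
```

The call to `reveal` on every access is where the small runtime penalty comes from.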

Control-flow obfuscation is a technique designed to mislead disassemblers. It inserts goto statements in the code that end up performing the original sequence of instructions, but in a roundabout way that makes the logic flow hard to follow. Here is an example.

Disassembled intermediate code before control-flow obfuscation:

// Code Snippet copyright 2000, Microsoft Corp, from WordCount.cs
// sample app
public int CompareTo(Object o) {
  int n = occurrences - ((WordOccurrence)o).occurrences;
  if (n == 0) {
    n = String.Compare(word, ((WordOccurrence)o).word);
  }
  return (n);
}

Same code after control-flow obfuscation:

public virtual int a(object A_0) {
  int local0;
  int local1;

  local0 = this.a - (c) A_0.a;
  if (local0 != 0)
    goto i0;
  goto i1;
  while (true) {
    return local1;
    i0: local1 = local0;
  }
  i1: local0 = System.String.Compare(this.b, (c) A_0.b);
  goto i0;
}

As can be seen, the original test is inverted and followed by a pair of goto statements. At the goto destination, the original statement (in obfuscated form) is executed, then another goto statement returns control to the original branch in the logic flow. Notice the misleading while() loop: it never iterates in any meaningful sense and exists only to route control from the i0 label back to the return statement. In this small snippet, after close comparison with the original, it's possible to figure out what's real and what's not. However, in a large routine, without the benefit of the source code, these misdirecting interpositions make reconstruction hugely time-consuming. The idea is to make recovering the original coding intent so demanding that hackers will move on to other, perhaps simpler, challenges. This particular technique adds small amounts of code to the binaries and so creates some overhead for the obfuscated portions. If this is a problem, apply the technique only to routines that need this extra level of protection.
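For readers who want to experiment with the idea, here is a small Python sketch. Python has no goto, so the sketch swaps in a related trick, a numbered-state dispatcher loop, as a stand-in for the goto-based transformation; it is not what Dotfuscator emits. The class and method names are hypothetical, and the plain ordinal string comparison only approximates String.Compare.

```python
from dataclasses import dataclass


@dataclass
class WordOccurrence:
    word: str
    occurrences: int

    def compare_to(self, other):
        """Straightforward version, mirroring the original CompareTo."""
        n = self.occurrences - other.occurrences
        if n == 0:
            # Ordinal comparison yielding -1, 0, or 1 (an approximation).
            n = (self.word > other.word) - (self.word < other.word)
        return n

    def compare_to_scrambled(self, other):
        """Same result, but every step is routed through numbered states."""
        state, local0, local1 = 0, 0, 0
        while True:
            if state == 0:
                local0 = self.occurrences - other.occurrences
                state = 2 if local0 != 0 else 3   # inverted test
            elif state == 1:
                return local1
            elif state == 2:                      # plays the role of i0
                local1 = local0
                state = 1
            elif state == 3:                      # plays the role of i1
                local0 = (self.word > other.word) - (self.word < other.word)
                state = 2


first = WordOccurrence("apple", 3)
second = WordOccurrence("pear", 3)
print(first.compare_to(second), first.compare_to_scrambled(second))
```

Both methods return the same value for any pair of inputs, but the second gives a reader no straight-line path to follow, which is the whole point of the technique.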

Getting Your Own Obfuscator for .NET
As indicated previously, the upcoming VS.NET 2003 environment contains an obfuscator. It applies only the overload induction transform. For developers who are not using VS.NET, but still want access to this tool, it can be downloaded from Preemptive Solutions. To get the full complement of techniques described here, the complete professional version is available as a paid commercial product for $1495, with discounted pricing for two or more copies. Several other obfuscators for .NET MSIL are listed here.

Additional Resource

An interesting survey of all kinds of code-obfuscation techniques.

Andrew Binstock is the principal analyst at Pacific Data Works LLC and a frequent contributor to this site. Previously he was the director of PricewaterhouseCoopers' Global Technology Forecasts. His book Practical Algorithms for Programmers, co-written with John Rex, is in its 12th printing at Addison-Wesley and in use at more than 30 university computer-science programs.
