TCSE IEEE Technical Council on Software Engineering

January 2002
Home Up Contents

WCRE 2004 call for papers - see Events

 

Message from the Chair

The reverse engineering and reengineering community is spread out throughout the world. We meet yearly or bi-yearly at various conferences, yet we do not have other means to communicate other than via private email messages.  Thanks to Arie van Deursen from CWI, this is no longer the case.  Arie has set up a Wiki web site for Reengineering and some of us have already tried out the benefits of this approach. 

The Reenginering Wiki site is a repository of information of interest to researchers and practitioners in this area. The site provides a self-managed way of allowing users to keep the site up-to-date. The OneMinuteWiki link states the simple rules for learning Wiki way of thinking; that way, you too can create YourOwnPersonalWikiPage and add content to the existing Wiki pages or create new ones. OneMinuteWiki also tells you how to get started with adding new Wiki pages to this site. 

Statistics on the number of times each page is accessed throughout a month are available through the WebStatistics link. The honors for the months of November and December, with over 1000 hits each, go to the DeCompilation Wiki pages, which were ported by Mike Van Emmerik to this new home. Having been the former maintainer of the Decompilation Site and having become the bottleneck for changes required to the page, I saw the benefits of the Wiki approach very quickly -- now users get to benefit from up-to-date information in DeCompilation and there is no bottleneck anywhere, not even a maintainer. Another popular page is the ReengineeringWiki. 

The Reengineering Wiki site is integrated with a wiki devoted to program transformation which is hosted by the University of Utrecht (The Netherlands), under supervision of Eelco Visser.

The TCSE Reverse Engineering and Reengineering committee has adopted the Reengineering Wiki as one of its projects, and has included a link to the Wiki from its web site.

The Reengineering Wiki may well serve as a communication channel for the other TCSE projects, such as the resource repository and taxonomy projects. Eventually, the Wiki may replace our existing resource repository and taxonomy project pages, as this and more information is already part of the Wiki pages.

My thanks to Arie van Deursen for his time and effort in putting together the initial set of Wiki pages and for providing a means for us in the community to get together and share information. 

I encourage you all to check this site and contribute information whenever possible. The Reengineering Wiki site will become the de facto site for finding information about reengineering and affiliated areas of endeavor. For more information on The Reengineering Wiki, check out the full paper as found in the proceedings of CSMR 2002, the European Conference on Software Maintenance and Reengineering, published by IEEE.

Cristina Cifuentes
Sun Microsystems Laboratories
17 December 2001

Editorial

The USAF Software Technology Support Center (STSC) is an organization that I once worked for, and was the place that I first got interested in software maintenance issues and the applicability of software reengineering techniques/tools.  I'm not sure if there are any full time personnel at the STSC still working on software reengineering (or even software maintenance) issues.  But they still put out a pretty good monthly periodical called CrossTalk - The Journal of Defense Software Engineering (not to mention a pretty good annual conference too!)  The December 2001 issue of CrossTalk was completely devoted to articles about maintaining and upgrading software legacy systems.  Therefore, most of the articles are about software reengineering:

bullet"Reengineering: An Affordable Approach for Embedded Software Upgrade" by Michael V. Del Principe, Jonathan D. Preston, and Dr. Ben A. Calloni (Lockheed Martin Aeronautics Company)
bullet "The IULS Approach to Software Wrapper Technology for Upgrading Legacy Systems" by Dr. David Corman (The Boeing Company)
bullet"A COTS-Based Replacement Strategy for Aging Avionics Computers" by Jahn A. Luke (USAF Research Laboratory/Information Directorate), Douglas G. Haldeman and William J. Cannon (TRW Avionics System Division/Avionics Engineering Centers)
bullet"Automated Transformation of Legacy Systems" by Philip Newcomb and Randy A. Doblar (The Software Revolution, Inc.)

Notice how all of these articles come from companies that manufacture airplanes.  Even the last article (which is primarily a reprint of the article published in the September 2001 issue of this website) comes from authors that used to work for The Boeing Company.  At least the people responsible for keeping our aircraft reliable are seriously interested in how software reengineering can assist in the maintenance of their software.

Workshop on Decompilation Techniques
Oct 3, 2001 in Stuttgart, Germany

By: Cristina Cifuentes, Sun Microsytems Laboratories


The half-day Workshop on Decompilation Techniques (DECOMP), sponsored by the IEEE Computer Society and co-located with the Working Conference on Reverse Engineering, was organized by Mike van Emmerik (University of Queensland) and myself. The workshop brought together 20 researchers/developers from academia and industry interested in two main areas: computer security and migration of legacy (assembly) code.

Keynote

Applications of Reverse-Engineering in the Analysis of Malicious Code (Viruses, Worms, Trojans), by Professor Klaus Brunnstein, University of Hamburg, Germany

Prof.Dr.Klaus Brunnstein is the director of the Virus Test Center at the University of Hamburg, and IFIP Vice President. He also represents the Anti-Malware community at international meetings.

The Virus Test Center researches techniques to prevent or determine viruses, as well as provide warnings/patches to the general community, not just to technology people. The Anti-Malware community deals with development of techniques and solutions to "malware" software. Malware is a generic term that describes the malicious side effects caused by computer programs, whether caused by intentionally introduced malicious code or by the malicious exploitation of benign code to the detriment of the user.

In his talk, Prof Brunnstein talked about malicous code (viruses/ worms, trojans) and mentioned that in 2001, there is an average of 1 virus in every 300 email messages; if the trend continues at the pace of the last 3 years, by 20013 there would be 1 virus in every 2 email messages (data from the Message Labs, 7/2001). He explained how the "I love you" virus, the Code-Red worm, the Nimda worm and the Vote/WTCWurm work. Simple Visual-Basic viruses are now being written by teenagers and distributed around the world, but there are other viruses that are getting very sophisticated as they use some form of encryption/obfuscation and self-modifying code.

The methods that are currently used by the anti-malware community include use of a local area network to contain the spread of the programs, use of VMWare to contain malicious events, and use of Intel-based workstations (as most viruses are in this platform at present). The types of analyses includes:
bulletHardware-based analysis: logic analyser, system tracing, using special hardware periscope to observe and control dynamic behaviour and performance monitoring,
bulletSoftware-based analysis: event tracing, interrupt observation (IntSpy), and make essential details (interrupts, BIOS, OS structures, etc) available,
bulletCode tracing: debugging (breakpoints), use of tools like SoftIce,
bulletCode analysis: use of disassemblers to understand the code, documentation of such code, manual decompilation of the code.

At the University of Hamburg there is a Reverse Engineering subject/course where students study/survey malware, practice reverse engineering of code and teaches students about the ethical and legal aspects of reverse engineering. In terms of the ethical aspects, they have the following code of ethics at VTC: Handle malware with extreme care in order to avoid any unwished side-effects. Never write a piece of malware. Never cooperate with authors or distributors of malware.

Type Recovery Session

A Simple Type System for Program Reengineering, by Ilfak Guilfanov, Data Rescue, Belgium

Ilfak Guilfanov is the designer and developer of the most widely used disassembler in the world, IDA Pro. This disassembler supports over 32 different processors, including all the main stream machines as well as .NET, it is fully interactive, allowing you to comment the disassembled code and drive the disassembly process if required, and provides graphs of programs in VCG format (an x86 port of VCG is available on their site).

Ilfak's paper describes a simple C-based type system for disassembly/decompilation purposes. The choice of C was done because most customers are interested to disassemble programs originally written in the C language as most security-related problems are found in this language. By adding type information to a disassembled program, the quality of the disassembled output is increased significantly, which enhances understanding of the program.

The type system consists of the following base types: unknown scalar type, void, integer, bool, floating point, and enumeration; and the following derived types: pointer to , array [n..m] of , function, structure, union, and typedef is .

Complex types like the function type have attributes to fully qualify the signature for the function; the attributes are: calling convention, return type, number of arguments, array of argument types, array of argument names, and argument locations.

The aim of the type system is to be able to support type information available in standard C libraries, and determine arguments to functions based on the expected type at a library call site. Type information is propagated so that some user-level functions also benefit from the type information. In Ilfak's experience, at least 30% of the program is annotated with type information from library calls and an additional 10-20% of user code due to type propagation.

Comparing Type-Based and Proof-Directed Decompilation, by Alan Mycroft, University of Cambridge, UK; Atsushi Ohori, Japan Advanced Institute of Science and Technology, Japan; and Shin-ya Katsumata, University of Edinburgh, UK

Alan Mycroft's 1999 Type-Based Decompilation paper was the first paper to address type recovery of RTL/assembler code in a formal way, by starting from an untyped language and inferring types to give descriptive types as an answer. His method used SSA to make use of unique register declarations, it then associated one or more types to each machine instruction, and used type inference to identify overall types for functions.

Atsushi Ohori and Shin-ya Katsumata's 2001 Proof-Directed Decompilation paper looked at the problem of recovering types in a Java Virtual Machine (JVM) environment, where types are prescribed by the Java language at procedure boundaries. That work is based on Ohori's sequential sequent calculus. Starting with JVM code, a system of rules describes when a sequence of instructions ending in return can be typed, each basic block ending in return is translated into a function, other basic blocks are mapped to functions ending in tailcalls to one or more functions, proofs for the inference system correspond to programs.

The comparison of these two approaches was done using a register machine variant of the Katsumata-Ohori system. The two approaches are shown to be the same, though Katsumata-Ohori's approach is more elegant from a theoretical point of view, for code with static jumps; that is, Mycroft's SSA transformations and Katsumata-Ohori's proof transformations are the same. Katsumata-Ohori's approach is more general as it can deal also with dynamic transfers of control, such as pointers and higher-order functions. Mycroft's approach provides a way to reconstruct descriptive types, which Katsumata-Ohori's approach does not.

The paper proposes how to merge both approaches to use the strengths of both, avoid doing SSA transformations, and recover types in untypes machine-code systems. An example on JVM bytecode code is also given where it is shown that the right type is not always recoverable in JVM code.

Decompilation Tools

Decompiling Java Using Staged Encapsulation, by Jerome Miecznikowski and Laurie Hendren, Mc Gill University, Canada

This paper describes the program structuring algorithm used in the Dava decompiler, which has been built at Mc Gill University based on the Soot framework. The aim of this JVM class file decompiler is to be able to recover code produced by not only Java compilers but also compilers from other languages, as these other languages present code sequences that are non "standard". They work with verifiable Java bytecode programs. The bytecode may:
bulletleave results on the Java stack between basic blocks,
bulletcontain dead code and useless branches,
bulletcontain irreducible control flow,
bullethave non-nested synchronization area, and
bulletunstructured use of exceptions.

In Dava, a program is represented by three intermediate representations: a list of typed statements, a control flow graph, and a structure encapsulation tree (SET). The SET representation is the new representation provided by this paper, where nodes represent structured Java constructs (e.g. if-then-else or while) and edges represent the strict subset relations between the body sets of each SET node. For example, the "top" node of a SET will include all nodes in the graph, the second level of the tree will contain links to SET nodes that contain sequences of nodes (instructions) or known control features (e.g. a while node with all its subnodes). The decomposition of the nodes continues until the leaves of the tree contain only sequences of nodes.

The structuring algorithm has 6 stages, a benefit of their approach is that nodes get added to the SET independently of the nesting of the control flow structures. Some of the stages are Java language specific, but the algorithm in general is generic for many other languages:
bulletSynchronized blocks: any unstructured constructs are solved by node duplication,
bulletLoop detection: recover while and do-while constructs. Some loops may be equivalent to a "while(true)" loop. Irreducible multi-entry loops are solved by node duplication,
bulletConditional jumps,
bulletExceptions,
bulletStatement sequences, and
bulletContinues, breaks and arbitrary labeled blocks.
After the SET is constructed, the tree is traversed to build an AST and do code generation of Java source code.

Computer Security Analysis through Decompilation and High-Level Debugging, by Cristina Cifuentes (SunLabs, USA), Trent Waddington and Mike Van Emmerik (University of Queensland, Australia)

This paper presents our ideas on how to build a tool that aids computer emergency teams in understanding malware code, whether it is a virus or a security-related vulnerability in a system. The state of the art involves a lot of experimentation as well as static disassembly of the code and bug tracking. Problems with static disassembly are evident these days with two techniques: those that make you disassemble tons of code (literally over 100 pages for simple programs) by use of obfuscated techniques, and those that are self-modifying/self-encrypting and so cannot be determined/seen in the code at static disassembly time.

The technique we propose is a high-level debugger of binary/executable programs which incorporates a dynamic decompiler, so that both, an assembly view and a high-level (C-like) view of a program can be seen at any point in time in the program. By dynamically disassemblying and decompiling code that reaches a particular point in a program, you do not need to deal with all the code that is never executed in that program (lots in the case of the first technique mentioned above) and you can clearly get the trace of instructions that perform the self-modification of the code in the second case mentioned above. The advantage of viewing both assembly and C-like versions of the code is that you get an order of magnitude less code in the C-like version, and it is far more readable than the assembly version.

In our proposal we, of course, propose this type of analysis to be done in a retargetable way, in fact, we have a draft prototype based on the UQBT binary translation system that branches onto higher-level analysis to produce C-like code. A suitable GUI interface with breakpoints and the like complete the picture.

Discussion

At the workshop, there were two main groups, those interested in anti-virus/anti-malware techniques, and those interested in migration of legacy assembly code to some C-like representation. Clearly, both camps can make use of the same techniques, so the discussions at the workshop were really interesting. Two companies are now starting to do work on implementing their own decompiler to provide services to particular groups, as well as people in academia. Few people were concerned about copyright law restrictions on decompilation, as both these uses of decompilation should be legal, i.e. one is for the benefit of the community and the other is translation of your own dusty assembler code to some new representation.

8th Working Conference on Reverse Engineering (WCRE)

2-5 October 2001 Stuttgart, Germany

By Peter Aiken, Virginia Commonwealth University

Marking the fourth non-North American location and the 8th conference since its founding in the early 1990's, the IEEE Working Conference on Reverse Engineering (WCRE) 2001 was held on the campus of the University of Stuttgart in the mild weather of the German fall. Most participants were able to attend the conference in spite of the terror attacks of September. Co-sponsored by the IEEE-CS TCSE Committee on Reverse Engineering and Reengineering and the industrial association Reengineering Forum, the working conference drew an international group of researchers. WCRE stayed true to its tradition of short, intense, and interactive sessions. For those who haven't attended, presentations are typically limited to two thirds of the total time - leaving a third of each session for working discussion of the issues. Many participants remarked that the high speaker/audience interaction and productivity of the discussion make this is is one of the best conference features. The inclusion of two workshops -- on Analysis, Slicing, And Transformation (AST) and On Decompilation Techniques (DECOMP) -- permitted some participants to explore topics of interest in more depth. The keynote talk was given by Dr. Peter Brössler SDS Austria titled "Software Reengineering: Experiences and Some Lessons Learned," The remainder of the program was varied, always interesting, and of consistent high caliber.

Conference attendees were able to pack much into each day including a visit to the local folks festival for lessons in bench dancing - grin! The next WCRE will be held in Richmond, VA. Information on the WCRE is available by sending email to wcre@computer.org and the WCRE proceedings are available from the IEEE Computer Society Press (cs.books@computer.org; phone +1-714-821-8380 or +1-800-CS-BOOKS).

WCRE 01 - Conference at a Glance

bulletFolding: An Approach to Support Program Understanding of Preprocessed Languages (B. Kullbach, V. Riediger)
bulletGenerating Robust Parsers using Island Grammars (L. Moonen)
bulletNode Coarsening Calculi for Program Slicing (M. Harman, R. Hierons, S. Danicic, J. Howroyd, M. Laurence, C. Fox)
bulletReverse Program Calculation Supported by Code Slicing (G. Villavicencio, J. Oliveira)
bulletTowards a standard schema for C/C++ (R. Ferenc, T. Gyimóthy, S. Sim, R. Holt, R. Koschke)
bulletUnion Schemas as a Basis for a C++ Extractor (T. Dean, A. Malton, R. Holt)
bulletInvited Presentation: Status Report on GXL (S. Sim)
bulletInvited Presentation: Status Report on DMM (T. Lethbridge)
bulletRequirements-Driven Software Re-engineering (L. Tahvildari, K. Kontogiannis, J. Mylopoulos)
bulletApplication of UML Associations and Their Adornments in Design Recovery (R. Kollmann, M. Gogolla)
bulletWrapping Legacy COBOL Programs behind an XML-Interface (H. Sneed)
bulletWrapper Development for Legacy Data Reuse (P. Thiran, J. Hainaut)
bulletModeling the System-User Dialog Using Interaction Traces (M. El-Ramly, P. Iglinski, E. Stroulia, P. Sorenson, B. Matichuk)
bulletLessons Learned in Data Reverse Engineering (K. Davis)
bulletInteractive Migration of Legacy Databases to Net-Centric Technologies (Y. Bychkov, J. Jahnke)
bulletAssisting the Comprehension of Legacy Transactions (S. Embury, J. Shao)
bulletMaximizing Functional Cohesion of Comprehension Environments by Integrating User and Task Knowledge (J. Rilling)
bulletCharacterizing the Knowledge Contained in Documentation Sources (N. Anquetil)
bulletProgram Comprehension Risks and Opportunities in Extreme Programming A. Van Deursen)
bulletCRAFT: A Framework for Evaluating Software Clustering Results in the Absence of Benchmark Decompositions (B. Mitchell, S. Mancoridis)
bulletComponent Clustering Based on Maximal Association (K. Sartapi, K. Kontogiannis)
bulletAn Investigation into the Connectivity Properties of Source-Header Dependency Graphs (G. Gannod, B. Gannod)
bulletFrom an Informal Textual Lexicon to a Well-Structured Lexical Database: A Successful Experiment in Data Reverse Engineering (G. Huet)
bulletA Retrospective on Industrial Database Reverse Engineering Projects (M. Blaha)
bulletReengineering Relational Databases to Object-Oriented: Constructing the Class Hierarchy and Migrating the Data (R. Alhajj)
bulletREportal: A Web-based Portal Site for Reverse Engineering (S. Mancoridis, T. Souder, Y. Chen, E. Gansmer, J. Korn)
bulletAn Approach for Reverse Engineering of Web-Based Applications (M. Di Penta, G. Antoniol, G. A. Di Lucca, G. Casazza)
bulletFlexible Reverse Engineering of Web Pages with VAQUISTA (J. Vanderdonckt, L. Bouillon, N. Souchon)
bulletReverse Engineering to Achieve Maintainable WWW sites (C. Boldyreff, R. Kewish)

The WCRE Charter: The Working Conference on Reverse Engineering (WCRE) is the premier research-oriented conference dedicated to the theory and practice of recovering information from existing software and systems. The purpose of WCRE is to explore innovative methods of extracting the many kinds of information that can be recovered from software, software engineering documents, and systems artifacts, and to examine innovative ways of using this information.

 

Send mail to olsemm@tacom.army.mil with questions or comments about this web site.
Last modified: May 25, 2004