|
|
|
Message from the Chair
|
| "Reengineering: An Affordable Approach for Embedded Software Upgrade" by Michael V. Del Principe, Jonathan D. Preston, and Dr. Ben A. Calloni (Lockheed Martin Aeronautics Company) | |
| "The IULS Approach to Software Wrapper Technology for Upgrading Legacy Systems" by Dr. David Corman (The Boeing Company) | |
| "A COTS-Based Replacement Strategy for Aging Avionics Computers" by Jahn A. Luke (USAF Research Laboratory/Information Directorate), Douglas G. Haldeman and William J. Cannon (TRW Avionics System Division/Avionics Engineering Centers) | |
| "Automated Transformation of Legacy Systems" by Philip Newcomb and Randy A. Doblar (The Software Revolution, Inc.) |
Notice how all of these articles come from companies that manufacture airplanes. Even the last article (which is primarily a reprint of the article published in the September 2001 issue of this website) comes from authors that used to work for The Boeing Company. At least the people responsible for keeping our aircraft reliable are seriously interested in how software reengineering can assist in the maintenance of their software.
By: Cristina Cifuentes, Sun Microsytems Laboratories
The half-day Workshop on
Decompilation Techniques (DECOMP), sponsored by the IEEE Computer Society
and co-located with the Working
Conference on Reverse Engineering, was organized by Mike van Emmerik
(University of Queensland) and myself. The workshop brought together 20
researchers/developers from academia and industry interested in two main areas:
computer security and migration of legacy (assembly) code.
The Virus Test Center researches techniques to prevent or determine viruses, as well as provide warnings/patches to the general community, not just to technology people. The Anti-Malware community deals with development of techniques and solutions to "malware" software. Malware is a generic term that describes the malicious side effects caused by computer programs, whether caused by intentionally introduced malicious code or by the malicious exploitation of benign code to the detriment of the user.
In his talk, Prof Brunnstein talked about malicous code (viruses/ worms, trojans) and mentioned that in 2001, there is an average of 1 virus in every 300 email messages; if the trend continues at the pace of the last 3 years, by 20013 there would be 1 virus in every 2 email messages (data from the Message Labs, 7/2001). He explained how the "I love you" virus, the Code-Red worm, the Nimda worm and the Vote/WTCWurm work. Simple Visual-Basic viruses are now being written by teenagers and distributed around the world, but there are other viruses that are getting very sophisticated as they use some form of encryption/obfuscation and self-modifying code.
The methods that are currently used by the anti-malware community include use of a local area network to contain the spread of the programs, use of VMWare to contain malicious events, and use of Intel-based workstations (as most viruses are in this platform at present). The types of analyses includes:
| Hardware-based analysis: logic analyser, system tracing, using special hardware periscope to observe and control dynamic behaviour and performance monitoring, | |
| Software-based analysis: event tracing, interrupt observation (IntSpy), and make essential details (interrupts, BIOS, OS structures, etc) available, | |
| Code tracing: debugging (breakpoints), use of tools like SoftIce, | |
| Code analysis: use of disassemblers to understand the code, documentation of such code, manual decompilation of the code. |
At the University of Hamburg there is a Reverse Engineering subject/course
where students study/survey malware, practice reverse engineering of code and
teaches students about the ethical and legal aspects of reverse engineering. In
terms of the ethical aspects, they have the following code of ethics at VTC:
Handle malware with extreme care in order to avoid any unwished side-effects.
Never write a piece of malware. Never cooperate with authors or distributors of
malware.
Ilfak's paper describes a simple C-based type system for disassembly/decompilation purposes. The choice of C was done because most customers are interested to disassemble programs originally written in the C language as most security-related problems are found in this language. By adding type information to a disassembled program, the quality of the disassembled output is increased significantly, which enhances understanding of the program.
The type system consists of the following base types: unknown scalar type,
void, integer, bool, floating point, and enumeration; and the following derived
types: pointer to Complex types like the function type have attributes to fully qualify the
signature for the function; the attributes are: calling convention, return type,
number of arguments, array of argument types, array of argument names, and
argument locations.
The aim of the type system is to be able to support type information
available in standard C libraries, and determine arguments to functions based on
the expected type at a library call site. Type information is propagated so that
some user-level functions also benefit from the type information. In Ilfak's
experience, at least 30% of the program is annotated with type information from
library calls and an additional 10-20% of user code due to type propagation.
Atsushi Ohori and Shin-ya Katsumata's 2001 Proof-Directed Decompilation paper
looked at the problem of recovering types in a Java Virtual Machine (JVM)
environment, where types are prescribed by the Java language at procedure
boundaries. That work is based on Ohori's sequential sequent calculus. Starting
with JVM code, a system of rules describes when a sequence of instructions
ending in return can be typed, each basic block ending in return is translated
into a function, other basic blocks are mapped to functions ending in tailcalls
to one or more functions, proofs for the inference system correspond to
programs.
The comparison of these two approaches was done using a register machine
variant of the Katsumata-Ohori system. The two approaches are shown to be the
same, though Katsumata-Ohori's approach is more elegant from a theoretical point
of view, for code with static jumps; that is, Mycroft's SSA transformations and
Katsumata-Ohori's proof transformations are the same. Katsumata-Ohori's approach
is more general as it can deal also with dynamic transfers of control, such as
pointers and higher-order functions. Mycroft's approach provides a way to
reconstruct descriptive types, which Katsumata-Ohori's approach does not.
The paper proposes how to merge both approaches to use the strengths of both,
avoid doing SSA transformations, and recover types in untypes machine-code
systems. An example on JVM bytecode code is also given where it is shown that
the right type is not always recoverable in JVM code.
Comparing Type-Based and Proof-Directed Decompilation, by Alan Mycroft,
University of Cambridge, UK; Atsushi Ohori, Japan Advanced Institute of Science
and Technology, Japan; and Shin-ya Katsumata, University of Edinburgh, UK
Alan Mycroft's 1999 Type-Based
Decompilation paper was the first paper to address type recovery of RTL/assembler
code in a formal way, by starting from an untyped language and inferring types
to give descriptive types as an answer. His method used SSA to make use of
unique register declarations, it then associated one or more types to each
machine instruction, and used type inference to identify overall types for
functions.
Decompilation Tools
Decompiling Java Using Staged Encapsulation, by Jerome Miecznikowski and
Laurie Hendren, Mc Gill University, Canada
This paper describes the program structuring algorithm used in the Dava
decompiler, which has been built at Mc Gill University based on the Soot
framework. The aim of this JVM class file decompiler is to be able to recover
code produced by not only Java compilers but also compilers from other
languages, as these other languages present code sequences that are non
"standard". They work with verifiable Java bytecode programs. The
bytecode may:
| leave results on the Java stack between basic blocks, | |
| contain dead code and useless branches, | |
| contain irreducible control flow, | |
| have non-nested synchronization area, and | |
| unstructured use of exceptions. |
In Dava, a program is represented by three intermediate representations: a list of typed statements, a control flow graph, and a structure encapsulation tree (SET). The SET representation is the new representation provided by this paper, where nodes represent structured Java constructs (e.g. if-then-else or while) and edges represent the strict subset relations between the body sets of each SET node. For example, the "top" node of a SET will include all nodes in the graph, the second level of the tree will contain links to SET nodes that contain sequences of nodes (instructions) or known control features (e.g. a while node with all its subnodes). The decomposition of the nodes continues until the leaves of the tree contain only sequences of nodes.
The structuring algorithm has 6 stages, a benefit of their approach is that nodes get added to the SET independently of the nesting of the control flow structures. Some of the stages are Java language specific, but the algorithm in general is generic for many other languages:
| Synchronized blocks: any unstructured constructs are solved by node duplication, | |
| Loop detection: recover while and do-while constructs. Some loops may be equivalent to a "while(true)" loop. Irreducible multi-entry loops are solved by node duplication, | |
| Conditional jumps, | |
| Exceptions, | |
| Statement sequences, and | |
| Continues, breaks and arbitrary labeled blocks. |
The technique we propose is a high-level debugger of binary/executable programs which incorporates a dynamic decompiler, so that both, an assembly view and a high-level (C-like) view of a program can be seen at any point in time in the program. By dynamically disassemblying and decompiling code that reaches a particular point in a program, you do not need to deal with all the code that is never executed in that program (lots in the case of the first technique mentioned above) and you can clearly get the trace of instructions that perform the self-modification of the code in the second case mentioned above. The advantage of viewing both assembly and C-like versions of the code is that you get an order of magnitude less code in the C-like version, and it is far more readable than the assembly version.
In our proposal we, of course, propose this type of analysis to be done in a retargetable way, in fact, we have a draft prototype based on the UQBT binary translation system that branches onto higher-level analysis to produce C-like code. A suitable GUI interface with breakpoints and the like complete the picture.
By Peter Aiken, Virginia Commonwealth University
Marking the fourth non-North American location and the 8th conference since its founding in the early 1990's, the IEEE Working Conference on Reverse Engineering (WCRE) 2001 was held on the campus of the University of Stuttgart in the mild weather of the German fall. Most participants were able to attend the conference in spite of the terror attacks of September. Co-sponsored by the IEEE-CS TCSE Committee on Reverse Engineering and Reengineering and the industrial association Reengineering Forum, the working conference drew an international group of researchers. WCRE stayed true to its tradition of short, intense, and interactive sessions. For those who haven't attended, presentations are typically limited to two thirds of the total time - leaving a third of each session for working discussion of the issues. Many participants remarked that the high speaker/audience interaction and productivity of the discussion make this is is one of the best conference features. The inclusion of two workshops -- on Analysis, Slicing, And Transformation (AST) and On Decompilation Techniques (DECOMP) -- permitted some participants to explore topics of interest in more depth. The keynote talk was given by Dr. Peter Brössler SDS Austria titled "Software Reengineering: Experiences and Some Lessons Learned," The remainder of the program was varied, always interesting, and of consistent high caliber.
Conference attendees were able to pack much into each day including a visit to the local folks festival for lessons in bench dancing - grin! The next WCRE will be held in Richmond, VA. Information on the WCRE is available by sending email to wcre@computer.org and the WCRE proceedings are available from the IEEE Computer Society Press (cs.books@computer.org; phone +1-714-821-8380 or +1-800-CS-BOOKS).
| Folding: An Approach to Support Program Understanding of Preprocessed Languages (B. Kullbach, V. Riediger) | |
| Generating Robust Parsers using Island Grammars (L. Moonen) | |
| Node Coarsening Calculi for Program Slicing (M. Harman, R. Hierons, S. Danicic, J. Howroyd, M. Laurence, C. Fox) | |
| Reverse Program Calculation Supported by Code Slicing (G. Villavicencio, J. Oliveira) | |
| Towards a standard schema for C/C++ (R. Ferenc, T. Gyimóthy, S. Sim, R. Holt, R. Koschke) | |
| Union Schemas as a Basis for a C++ Extractor (T. Dean, A. Malton, R. Holt) | |
| Invited Presentation: Status Report on GXL (S. Sim) | |
| Invited Presentation: Status Report on DMM (T. Lethbridge) | |
| Requirements-Driven Software Re-engineering (L. Tahvildari, K. Kontogiannis, J. Mylopoulos) | |
| Application of UML Associations and Their Adornments in Design Recovery (R. Kollmann, M. Gogolla) | |
| Wrapping Legacy COBOL Programs behind an XML-Interface (H. Sneed) | |
| Wrapper Development for Legacy Data Reuse (P. Thiran, J. Hainaut) | |
| Modeling the System-User Dialog Using Interaction Traces (M. El-Ramly, P. Iglinski, E. Stroulia, P. Sorenson, B. Matichuk) | |
| Lessons Learned in Data Reverse Engineering (K. Davis) | |
| Interactive Migration of Legacy Databases to Net-Centric Technologies (Y. Bychkov, J. Jahnke) | |
| Assisting the Comprehension of Legacy Transactions (S. Embury, J. Shao) | |
| Maximizing Functional Cohesion of Comprehension Environments by Integrating User and Task Knowledge (J. Rilling) | |
| Characterizing the Knowledge Contained in Documentation Sources (N. Anquetil) | |
| Program Comprehension Risks and Opportunities in Extreme Programming A. Van Deursen) | |
| CRAFT: A Framework for Evaluating Software Clustering Results in the Absence of Benchmark Decompositions (B. Mitchell, S. Mancoridis) | |
| Component Clustering Based on Maximal Association (K. Sartapi, K. Kontogiannis) | |
| An Investigation into the Connectivity Properties of Source-Header Dependency Graphs (G. Gannod, B. Gannod) | |
| From an Informal Textual Lexicon to a Well-Structured Lexical Database: A Successful Experiment in Data Reverse Engineering (G. Huet) | |
| A Retrospective on Industrial Database Reverse Engineering Projects (M. Blaha) | |
| Reengineering Relational Databases to Object-Oriented: Constructing the Class Hierarchy and Migrating the Data (R. Alhajj) | |
| REportal: A Web-based Portal Site for Reverse Engineering (S. Mancoridis, T. Souder, Y. Chen, E. Gansmer, J. Korn) | |
| An Approach for Reverse Engineering of Web-Based Applications (M. Di Penta, G. Antoniol, G. A. Di Lucca, G. Casazza) | |
| Flexible Reverse Engineering of Web Pages with VAQUISTA (J. Vanderdonckt, L. Bouillon, N. Souchon) | |
| Reverse Engineering to Achieve Maintainable WWW sites (C. Boldyreff, R. Kewish) |
The WCRE Charter: The Working Conference on Reverse Engineering (WCRE) is the premier research-oriented conference dedicated to the theory and practice of recovering information from existing software and systems. The purpose of WCRE is to explore innovative methods of extracting the many kinds of information that can be recovered from software, software engineering documents, and systems artifacts, and to examine innovative ways of using this information.
|
Send mail to
olsemm@tacom.army.mil with
questions or comments about this web site.
|